
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Survey Track

Graph Neural Networks Meet Neural-Symbolic Computing:
A Survey and Perspective

Luís C. Lamb1, Artur d'Avila Garcez2, Marco Gori3,4, Marcelo O.R. Prates1, Pedro H.C. Avelar1,3 and Moshe Y. Vardi5
1 UFRGS, Federal University of Rio Grande do Sul, Brazil
2 City, University of London, UK
3 University of Siena, Italy
4 Université Côte d'Azur, 3IA, France
5 Rice University, Houston, USA
{lamb, morprates, phcavelar}@inf.ufrgs.br, [email protected], [email protected], [email protected]

Abstract

Neural-symbolic computing has now become the subject of interest of both academic and industry research laboratories. Graph Neural Networks (GNNs) have been widely used in relational and symbolic domains, with widespread application of GNNs in combinatorial optimization, constraint satisfaction, relational reasoning and other scientific domains. The need for improved explainability, interpretability and trust of AI systems in general demands principled methodologies, as suggested by neural-symbolic computing. In this paper, we review the state-of-the-art on the use of GNNs as a model of neural-symbolic computing. This includes the application of GNNs in several domains as well as their relationship to current developments in neural-symbolic computing.

1 Introduction

Over the last decade Artificial Intelligence in general, and deep learning in particular, have been the focus of intensive research endeavors, gathered media attention and led to debates on their impacts both in academia and industry [Marcus, 2020; Raghavan, 2019]. The recent AI Debate in Montreal with Yoshua Bengio and Gary Marcus [Marcus, 2020], and the AAAI-2020 fireside conversation with Nobel Laureate Daniel Kahneman and the 2018 Turing Award winners and deep learning pioneers Geoff Hinton, Yoshua Bengio and Yann LeCun have led to new perspectives on the future of AI. It has now been argued that if one aims to build richer AI systems, i.e. semantically sound, explainable, and reliable, one has to add a sound reasoning layer to deep learning [Marcus, 2020]. Kahneman has made this point clear when he stated at AAAI-2020 that "...so far as I'm concerned, System 1 certainly knows language... System 2... does involve certain manipulation of symbols." [Kahneman et al., 2020].

Kahneman's comments address recent parallels made by researchers between "Thinking, Fast and Slow" and the so-called "AI's systems 1 and 2", which could, in principle, be modelled by deep learning and symbolic reasoning, respectively.1 In this paper, we present a survey and relate recent research results on: (1) Neural-Symbolic Computing, by summarizing the main approaches to rich knowledge representation and reasoning within deep learning, and (2) the approach pioneered by the authors and others of Graph Neural Networks (GNNs) for learning and reasoning about problems that require relational structures or symbolic learning. Although recent papers have surveyed GNNs, including [Battaglia et al., 2018; Chami et al., 2020; Wu et al., 2019; Zhang et al., 2018], they have not focused on the relationship between GNNs and neural-symbolic computing (NSC). [Bengio et al., 2018] also touches on particular topics related to some we discuss here, in particular to do with meta-transfer learning. Recent surveys in neural-symbolic computing [d'Avila Garcez et al., 2015; d'Avila Garcez et al., 2019; Townsend et al., 2019] have not exploited the highly relevant applications of GNNs in symbolic and relational learning, or the relationship between the two approaches.

Our Contribution. As mentioned above, recent work has surveyed graph neural networks and neural-symbolic computing, but to the best of our knowledge, no survey has reviewed and analysed the recent results on the specific relationship between GNNs and NSC. We also outline promising directions for research and applications combining GNNs and NSC from the perspective of symbolic reasoning tasks. The above-referenced surveys on GNNs, although comprehensive, all describe other application domains. The remainder of the paper is organized as follows. In Section 2, we present an overview and taxonomy of neural-symbolic computing. In Section 3, we discuss the main GNN models and their relationship to neural-symbolic computing. We then outline the main GNN architectures and their use in relational and symbolic learning. Finally, we conclude and point out directions for further research. We shall assume familiarity with neural learning and symbolic AI.

1 "Thinking, Fast and Slow", by Daniel Kahneman: New York, FSG, 2011, describes the author's "... current understanding of judgment and decision making, which has been shaped by psychological discoveries of recent decades."


2 Neural-Symbolic Computing Taxonomy

At this year's Robert S. Engelmore Memorial Lecture, at the AAAI Conference on Artificial Intelligence, New York, February 10th, 2020, Henry Kautz introduced a taxonomy for neural-symbolic computing as part of a talk entitled The Third AI Summer. Six types of neural-symbolic systems are outlined: (1) SYMBOLIC NEURO SYMBOLIC, (2) SYMBOLIC[NEURO], (3) NEURO;SYMBOLIC, (4) NEURO:SYMBOLIC → NEURO, (5) NEURO_SYMBOLIC and (6) NEURO[SYMBOLIC].

The origin of GNNs [Scarselli et al., 2008] can be traced back to neural-symbolic computing (NSC) in that both sought to enrich the vector representations in the inputs of neural networks, first by accepting tree structures and then graphs more generally. In this sense, according to Kautz's taxonomy, GNNs are a TYPE 1 neural-symbolic system. GNNs [Battaglia et al., 2018] were recently combined with convolutional networks in novel ways which have produced impressive results on data efficiency. In parallel, NSC has focused on the learning of adequate embeddings for the purpose of symbolic computation. This branch of neural-symbolic computing, which includes Logic Tensor Networks [Serafini and d'Avila Garcez, 2016] and Tensor Product Representations [Huang et al., 2017], has been called tensorization methods in [d'Avila Garcez et al., 2019] and draws similarities with [Diligenti et al., 2017], which uses fuzzy methods to represent first-order logic. These have been classified by Kautz as TYPE 5 neural-symbolic systems, as also discussed in what follows. A natural point of contact between GNNs and NSC is the provision of rich embeddings and attention mechanisms towards structured reasoning and efficient learning.

TYPE 1 neural-symbolic integration is standard deep learning, which some may argue is a stretch to refer to as neural-symbolic, but which is included here to note that the input and output of a neural network can be made of symbols, e.g. in the case of language translation or question answering applications. TYPE 2 are hybrid systems such as DeepMind's AlphaGo and other systems, where the core neural network is loosely-coupled with a symbolic problem solver such as Monte Carlo tree search. TYPE 3 is also a hybrid system whereby a neural network focusing on one task (e.g. object detection) interacts via input/output with a symbolic system specialising in a complementary task (e.g. query answering). Examples include the neuro-symbolic concept learner [Mao et al., 2019] and DeepProbLog [Manhaeve et al., 2018; Galassi et al., 2020].

In a TYPE 4 neural-symbolic system, symbolic knowledge is compiled into the training set of a neural network. Kautz offers [Lample and Charton, 2020] as an example. Here, we would also include other tightly-coupled neural-symbolic systems where various forms of symbolic knowledge, not restricted to if-then rules only, can be translated into the initial architecture and set of weights of a neural network [d'Avila Garcez et al., 2009], in some cases with guarantees of correctness. We should also mention [Arabshahi et al., 2018], which learns and reasons over mathematical constructions, as well as [Arabshahi et al., 2019], which proposes a learning architecture that extrapolates to much harder symbolic maths reasoning problems than what was seen during training. TYPE 5 are those tightly-coupled neural-symbolic systems where a symbolic logic rule is mapped onto a distributed representation (an embedding) and acts as a soft-constraint (a regularizer) on the network's loss function. Examples of these are [Huang et al., 2017] and [Serafini and d'Avila Garcez, 2016].

Finally, TYPE 6 systems should be capable, according to Kautz, of true symbolic reasoning inside a neural engine. It is what one could refer to as a fully-integrated system. Early work in neural-symbolic computing has achieved this: see [d'Avila Garcez et al., 2009] for a historical overview; and some TYPE 4 systems are also capable of it [d'Avila Garcez et al., 2009; d'Avila Garcez et al., 2015; Hitzler et al., 2004], but in a localist rather than a distributed architecture and using simpler forms of embedding than TYPE 5 systems. Kautz adds that TYPE 6 systems should be capable of combinatorial reasoning, suggesting the use of an attention schema to achieve it effectively. In fact, attention mechanisms can be used to solve graph problems, for example with pointer networks [Vinyals et al., 2015]. It should be noted that the same problems can be solved through other NSC architectures, such as GNNs [Prates et al., 2019]. This idea resonates with the recent proposal outlined by Bengio in the AI debate of December 2019.

Concerning neural-symbolic computing theory, the study of TYPE 6 systems is highly relevant. In practical terms, a tension exists between effective learning and sound reasoning, which may prescribe the use of a more hybrid approach (TYPES 3 to 5) or variations thereof, such as the use of attention with tensorization. Orthogonal to the above taxonomy, but mostly associated so far with TYPE 4, is the study of the limits of reasoning within neural networks w.r.t. full first-order, higher-order and non-classical logic theorem proving [d'Avila Garcez and Lamb, 2003; d'Avila Garcez et al., 2015]. In this paper, as we revisit the use of rich logic embeddings in TYPE 5 systems, notably Logic Tensor Networks [Serafini and d'Avila Garcez, 2016], alongside the use of attention mechanisms or convolutions in GNNs, we will seek to propose a research agenda and specific applications of symbolic reasoning and statistical learning towards the sound development of TYPE 6 systems.

3 Graph Neural Networks Meet Neural-Symbolic Computing

One of the key concepts in machine learning is that of priors or inductive biases – the set of assumptions that a learner uses to compute predictions on test data. In the context of deep learning (DL), the design of neural building blocks that enforce strong priors has been a major source of breakthroughs. For instance, the priors obtained through feedforward layers encourage the learner to combine features additively, while the ones obtained through dropout discourage it from overfitting and the ones obtained through multi-task learning encourage it to prefer sets of parameters that explain more than one task.


One of the most influential neural building blocks, having helped pave the way for the DL revolution, is the convolutional layer [LeCun et al., 2015]. Convolutional architectures are successful for tasks defined over Euclidean signals because they enforce equivariance to spatial translation. This is a useful property to have when learning representations for objects regardless of their position in a scene.

Analogously, recurrent layers enforce equivariance in time, which is useful for learning over sequential data. Recently, attention mechanisms, through the advent of transformer networks, have enabled advances in the state-of-the-art in many sequential tasks, notably in natural language processing [Devlin et al., 2019; Goyal et al., 2019] and symbolic reasoning tasks such as solving math equations and integrals2 [Lample and Charton, 2020]. Attention encourages the learner to combine representations additively, while also enforcing permutation invariance. All three architectures take advantage of sparse connectivity – another important design in DL which is key to enabling the training of larger models. Sparse connectivity and neural building blocks with strong priors usually go hand in hand, as the latter leverage symmetries in the input space to cut down parameters through invariance to different types of transformations. NSC architectures often combine the key design concepts from convolutional networks and attention-based architectures to enforce permutation invariance over the elements of a set or the nodes of a graph (Fig. 1). Some NSC architectures such as Pointer Networks [Vinyals et al., 2015] implement attention directly over a set of inputs X = {x1, ..., xn} coupled with a decoder that outputs a sequence (i1, i2, ..., im) ∈ [1, n]^m of "pointers" to the input elements (hence the name). Note that both formalizations are defined over set inputs rather than sequential ones.

Figure 1: f(x) = (x1 ∨ ¬x5 ∨ x2 ∨ x3 ∨ ¬x4). Due to permutation invariance, literals ¬x5 and x3 can exchange places with no effect on the boolean function f(x). There are 5! = 120 such permutations.

2 It is advisable to read [Lample and Charton, 2020] alongside this critique of its limitations [Davis, 2019].
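As a toy illustration of the permutation-invariance prior depicted in Fig. 1, the NumPy sketch below (our own illustrative example, not code from any of the surveyed systems) contrasts a sum-based set readout with a position-weighted sequential readout over literal embeddings; only the former is unaffected by reordering the literals.

```python
import numpy as np

rng = np.random.default_rng(0)
literals = rng.normal(size=(5, 8))      # embeddings for the 5 literals of f(x)
W = rng.normal(size=(8, 4))             # a toy projection standing in for learned weights

def set_readout(x):
    # permutation-invariant: transform each element, then sum over the set
    return np.tanh(x @ W).sum(axis=0)

def seq_readout(x):
    # order-sensitive: position-dependent weights break the symmetry
    positions = np.arange(1, len(x) + 1)[:, None]
    return np.tanh((positions * x) @ W).sum(axis=0)

perm = np.array([0, 3, 2, 1, 4])        # swap ¬x5 and x3, as in Fig. 1
print(np.allclose(set_readout(literals), set_readout(literals[perm])))   # True
print(np.allclose(seq_readout(literals), seq_readout(literals[perm])))   # False in general
```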
3.1 Logic Tensor Networks

Tensorisation is a class of approaches that embeds first-order logic (FOL) symbols such as constants, facts and rules into real-valued tensors. Normally, constants are represented as one-hot vectors (first-order tensors). Predicates and functions are matrices (second-order tensors) or higher-order tensors.

In early work, embedding techniques were proposed to transform symbolic representations into vector spaces where reasoning can be done through matrix computation [Bordes et al., 2011; Serafini and d'Avila Garcez, 2016; Santoro et al., 2017]. Training embedding systems can be carried out as distance learning using backpropagation. Most research in this direction focuses on representing relational predicates in a neural network. This is known as "relational embedding" [Sutskever and Hinton, 2008]. For the representation of more complex logical structures, i.e. FOL formulas, a system named Logic Tensor Networks (LTN) [Serafini and d'Avila Garcez, 2016] was proposed by extending Neural Tensor Networks (NTN), a state-of-the-art relational embedding method. LTNs effectively implement learning using symbolic information as a prior, as pointed out by [van Harmelen and ten Teije, 2019]. Related ideas are discussed formally in the context of constraint-based learning and reasoning [d'Avila Garcez et al., 2019]. Recent research in first-order logic programs has successfully exploited the advantages of distributed representations of logic symbols for efficient reasoning, inductive programming [Evans and Grefenstette, 2018] and differentiable theorem proving [Rocktäschel and Riedel, 2016].
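To make the tensorisation idea concrete, the sketch below is a deliberately simplified grounding of our own (not the LTN implementation of [Serafini and d'Avila Garcez, 2016]): constants are grounded as one-hot vectors, a binary predicate as a matrix, and the predicate's truth degree is a squashed bilinear score that could in principle be trained by backpropagation against known facts.

```python
import numpy as np

constants = ["alice", "bob", "carol"]
onehot = {c: np.eye(len(constants))[i] for i, c in enumerate(constants)}  # first-order tensors

rng = np.random.default_rng(1)
knows = rng.normal(size=(3, 3))          # grounding of a binary predicate as a matrix

def truth_degree(predicate, a, b):
    # sigmoid of a bilinear form yields a differentiable truth value in (0, 1)
    score = onehot[a] @ predicate @ onehot[b]
    return 1.0 / (1.0 + np.exp(-score))

print(truth_degree(knows, "alice", "bob"))
```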
3.2 Pointer Networks

The Pointer Network (PN) formalization [Vinyals et al., 2015] is a neural architecture meant for computing an m-sized sequence (i1, i2, ..., im) ∈ [1, n]^m over the elements of an input set X = {x1, ..., xn}. PN implements a simple modification over the traditional seq2seq model, augmenting it with a simplified variant of the attention mechanism whose outputs are interpreted as "pointers" to the input elements. Traditional seq2seq models implement an encoder-decoder architecture in which the elements of the input sequence are consumed in order and used to update the encoder's hidden state at each step. Finally, a decoder consumes the encoder's hidden state and is used to yield a sequence of outputs, one at a time.

It is known that seq2seq models tend to exhibit improved performance when augmented with an attention mechanism, a phenomenon noticeable from the perspective of Natural Language Processing [Devlin et al., 2019]. Traditional models however yield sequences of outputs over a fixed-length dictionary (for instance a dictionary of tokens for language models), which is not useful for tasks whose output is defined over the input set and hence requires a variable-length dictionary. PNs tackle this problem by encoding the n-sized input set P with a traditional encoding architecture and decoding a probability distribution p(Ci | C1, ..., Ci−1, P) over the set {1, ..., n} of indices at each step i, by computing a softmax over an attention layer parameterized by matrices W1, W2 and vector v feeding on the decoder state d_i and the encoder states e_1, ..., e_n:

u_ij = v^T tanh( W1 e_j + W2 d_i ),   j ∈ (1, ..., n)
p(Ci | C1, ..., Ci−1, P) = softmax(u_i)                (1)

The output pointers can then be used to compute loss functions over combinatorial optimization problems. In the original paper the authors define a PN to solve the Traveling Salesperson Problem (TSP), in which a beam search procedure is used to select cities given the probability distributions computed at each step, and finally a loss function can be computed for the output tour by adding the corresponding city distances. Given their discrete nature, PNs are naturally suitable for many combinatorial problems (the original paper evaluates PNs on Delaunay Triangulation, TSP and Convex Hull problems). Unfortunately, even though PNs can solve problems over sets, they cannot be directly applied to general (non-complete) graphs.
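A minimal NumPy sketch of a single decoding step of Equation 1 is given below; the randomly initialised matrices W1, W2 and vector v stand in for trained parameters, so this illustrates only the shape of the computation, not the published model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 16                              # 6 input elements, hidden size 16
E = rng.normal(size=(n, d))               # encoder states e_1, ..., e_n
d_i = rng.normal(size=d)                  # current decoder state d_i
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

u = np.tanh(E @ W1 + d_i @ W2) @ v        # u_ij = v^T tanh(W1 e_j + W2 d_i), one score per j
p = np.exp(u - u.max()); p /= p.sum()     # softmax over the n input positions
print(p.argmax(), p.round(3))             # the "pointer": index of the most likely element
```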


3.3 Convolutions as Self-Attention

The core building block of models in the GNN family is the graph convolution operation, which is a neural building block that enables one to perform learning over graph inputs. Empowering DL architectures with the capacity of feeding on graph-based data is particularly suitable for neural-symbolic reasoning, as symbolic expressions are easily represented as graphs (Fig. 2). Furthermore, graph representations have useful properties such as permutation invariance and flexibility for generalization over the input size (models in the graph neural network family can be fed with graphs regardless of their size in terms of number of vertices). Graph convolutions can be seen as a variation of the well-known attention mechanism [Garcia and Bruna, 2018]. A graph convolution is essentially an attention layer with two key differences:

1. There is no dot-product for computing weights: encodings are simply added together with unit weights.3

2. The sum is masked with an adjacency mask, or in other words the graph convolution generalizes attention for non-complete graphs.

Figure 2: CNF formula F = (x1 ∨ ¬x2) ∧ (x3 ∨ x4 ∨ x5) represented as a graph: clauses and literals correspond to nodes, edges between clauses and literals are painted gray and edges between literals and their complements are painted black.

3 The Graph Attention Network (GAT) however generalizes graph convolutions with dot-product attention [Veličković et al., 2018].
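The encoding in Fig. 2 can be written down in a few lines. The sketch below (our own illustration, with arbitrary node-naming conventions) builds the clause-literal graph for F = (x1 ∨ ¬x2) ∧ (x3 ∨ x4 ∨ x5) as a plain edge set, which is the kind of structured input a GNN-based SAT model would consume.

```python
# Clause nodes c0, c1; literal nodes x1..x5 and ~x1..~x5.
clauses = [["x1", "~x2"], ["x3", "x4", "x5"]]
variables = ["x1", "x2", "x3", "x4", "x5"]

edges = set()
for idx, clause in enumerate(clauses):
    for literal in clause:
        edges.add((f"c{idx}", literal))       # "gray" edges: clause-literal membership
for var in variables:
    edges.add((var, f"~{var}"))               # "black" edges: literal-complement pairs

print(sorted(edges))
```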
All models in the GNN family learn continuous representations for graphs by embedding nodes into hyper-dimensional spaces, an insight motivated by graph embedding algorithms. A graph embedding corresponds to a function f : V → R^n mapping from the set of vertices V of a graph G = (V, E) to n-dimensional vectors. In the context of GNNs, we are interested in learning the parameters θ of a function f : G × θ → (V → R^n). That is, a parameterized function f(G, θ) over the set of graphs G whose outputs are mappings V → R^n from vertices to n-dimensional vectors. In other words, GNNs learn functions to encode vertices in a generalized way. Note that since the output from a GNN is itself a function, there are no limitations on the number of vertices in the input graph. This useful property stems from the modular architecture of GNNs, which will be discussed at length in the sequel. We argue that this should be interesting to explore in the context of neural-symbolic computing for the representation and manipulation of variables within neural networks.

Generally, instead of synthesizing a vertex embedding function from the ground up, GNNs choose an initial, simpler vertex embedding, such as mapping each vertex to the same (learned) vector representation or sampling vectors from a multivariate normal distribution, and then learn to refine this representation by iteratively updating the representations of all vertices. The refinement process, in which each vertex aggregates information from its direct neighbors to update its own embedding, is at the core of how GNNs learn properties over graphs. Over many refinement steps, vertices can aggregate structural information about progressively larger reachable subsets of the input graph. However, we rely on a well-suited transformation at each step to enable vertices to make use of this structural information to solve problems over graphs. The graph convolution layer, described next in Section 3.4, implements such a transformation.

3.4 Graph Convolutional Networks

Graph convolutions are defined in analogy to convolutional layers over Euclidean data. Both architectures compute weighted sums over a neighborhood. For CNNs, this neighborhood is the well known 9-connected or 25-connected neighborhood defined over pixels. One can think of the set of pixels of an image as a graph with a grid topology in which each vertex is associated with a vector representation corresponding to the Red/Green/Blue channels. The internal activations of a CNN can also be thought of as graphs with grid topologies, but the vector representations for each pixel are generally embedded in spaces of higher dimensionality (corresponding to the number of convolutional kernels learned at each layer). In this context, Graph Convolutional Networks (GCNs) [Kipf and Welling, 2017] can be thought of as a generalization of CNNs to non-grid topologies. Generalizing CNNs this way is tricky because one cannot rely anymore on learning 3 × 3 or 5 × 5 kernels, for two reasons:

1. In grid topologies pixels are embedded in 2-dimensional Euclidean space, which enables one to learn a specific weight for each neighbor on the basis of its relative position (left, right, central, top-right, etc.). This is not true for general graphs, and hence weights indexed by relative position, such as W_{1,0}, W_{1,1}, W_{0,1}, do not always have a clear interpretation.

2. In grid topologies each vertex has a fixed number of neighbors and weight sharing, but there is no such constraint for general graphs. Thus we cannot hope to learn a specific weight for each neighbor, as the required number of such weights will vary with the input graph.

GCNs tackle this problem in the following way: instead of learning kernels corresponding to matrices of weights, they learn transformations for the vector representations (embeddings) of graph vertices. Concretely, given a graph G = (V, E) and a matrix x^(k) ∈ R^{|V|×d} of vertex representations (i.e. x_i^(k) is the vector representation of vertex i at the k-th layer), a GCN computes the representation x_i^(k+1) of vertex i in the next layer as:

x_i^(k+1) = σ( Σ_{j ∈ N(i) ∪ {i}}  θ_k · x_j^(k) / ( √deg(i) √deg(j) ) )        (2)


In other words, we linearly transform the vector representation of each neighbor j by multiplying it with a learned matrix of weights θ_k, normalize it by the square roots of the degrees deg(i), deg(j) of both vertices, aggregate all results additively and finally apply a non-linearity σ. Note that θ_k denotes the learned weight matrix for GCN layer k: in general one will stack n different GCN layers together and hence learn the parameters of n such matrices. Also note that one iterates over an extended neighborhood N(i) ∪ {i}, which includes i itself. This is done to prevent "forgetting" the representation of the vertex being updated. Equation 2 can be summarized as x^(k+1) = D̃^(−1/2) Ã D̃^(−1/2) x^(k) θ^(k), where Ã = A + I is the adjacency matrix A plus self-loops (I is the identity matrix) and D̃ is the degree matrix of Ã.
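The matrix form above fits in a few lines of NumPy. The sketch below, a simplification with random untrained weights rather than the reference implementation of [Kipf and Welling, 2017], applies one GCN layer to a small path graph.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0, 1, 0, 0],               # adjacency matrix of a 4-vertex path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 8))               # x^(k): one 8-dimensional embedding per vertex
theta = rng.normal(size=(8, 8))           # θ^(k): the layer's learned weight matrix

A_tilde = A + np.eye(4)                   # Ã = A + I (add self-loops)
deg = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)         # D̃^(-1/2)

X_next = np.tanh(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ theta)   # σ = tanh here
print(X_next.shape)                       # (4, 8): refined vertex embeddings
```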
3.5 The Graph Neural Network Model

Although GCNs are conceptually simpler, the GNN model predates them by a decade, having been originally proposed by [Scarselli et al., 2008]. The model is similar to GCNs, with two key differences:
(1) One does not stack multiple independent layers as in GCNs. A single parameterized function is iterated many times, in analogy to recurrent networks, until convergence.
(2) The transformations applied to neighbor vertex representations are not necessarily linear, and can be implemented by deep neural networks (e.g. by a multilayer perceptron).

Concretely, the graph neural network model defines parameterized functions h : N × N × N × R^d → R^d and g : N × R^d → R^o, named the transition function and the output function. In analogy to a graph convolution layer, the transition function defines a rule for updating vertex representations by aggregating transformations over representations of neighbor vertices. The vertex representation x_i^(t+1) for vertex i at time (t + 1) is computed as:

x_i^(t+1) = Σ_{j ∈ N(i)} h( l_i, l_j, l_ij, x_j^(t) )        (3)

where l_i, l_j and l_ij are respectively the labels for nodes i and j and edge ij, and R^d, R^o are respectively the space of vertex representations and the output space. The model is defined over labelled graphs, but can still be implemented for unlabelled ones by suppressing l_i, l_j, l_ij from the transition function. After a certain number of iterations one should expect that the vertex embeddings x_i^(t+1) are enriched with structural information about the input graph. At this point, the output function g can be used to compute an output for each vertex, given its final representation: o_i = g(l_i, x_i).

In other words, the output at the end of the process is a set of |V| vectors ∈ R^o. This is useful for node classification tasks, in which one can have o equal the number of node classes and enforce o_i to encode a probability distribution by incorporating a softmax layer into the output function g. If one would like to learn a function over the entire graph instead of over individual vertices, there are many possibilities, one of which is to compute the output on an aggregation over all final vertex representations: o = g( Σ_{i ∈ V} x_i ).
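The following NumPy sketch (our own illustration, with untrained random parameters rather than the trained functions of [Scarselli et al., 2008]) iterates a simple transition function h of the form in Equation 3 over a small labelled graph and then applies an output function g with a per-vertex softmax.

```python
import numpy as np

rng = np.random.default_rng(4)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
node_labels = rng.normal(size=(4, 2))                     # l_i
edge_labels = {(i, j): rng.normal(size=2) for i in neighbors for j in neighbors[i]}
d, n_classes = 8, 3
W_h = rng.normal(size=(2 + 2 + 2 + d, d)) * 0.1           # small weights so the iteration settles
W_g = rng.normal(size=(2 + d, n_classes))                 # output function g

def h(l_i, l_j, l_ij, x_j):
    return np.tanh(np.concatenate([l_i, l_j, l_ij, x_j]) @ W_h)

x = np.zeros((4, d))
for _ in range(20):                                       # iterate the transition function
    x = np.stack([sum(h(node_labels[i], node_labels[j], edge_labels[(i, j)], x[j])
                      for j in neighbors[i]) for i in neighbors])

logits = np.concatenate([node_labels, x], axis=1) @ W_g   # o_i = g(l_i, x_i)
o = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(o.round(2))                                         # one class distribution per vertex
```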
3.6 Message-passing Neural Networks

Message-passing neural networks implement a slight modification over the original GNN model, which is to define a specialized update function u : R^d × R^d → R^d to update the representation of vertex i given its current representation and an aggregation m_i over transformations of neighbor vertex embeddings (which are referred to as "messages", hence message-passing neural networks), as an example:

x_i^(t+1) = u( x_i^(t), Σ_{j ∈ N(i)} h( l_i, l_j, l_ij, x_j^(t) ) )

Also, the update procedure is run over a fixed number of steps, and it is usual to implement u using some type of recurrent neural network (RNN), such as Long Short-Term Memory (LSTM) cells [Hochreiter and Schmidhuber, 1997; Selsam et al., 2019], or Gated Recurrent Units.
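A minimal sketch of this message-passing variant is given below; a plain tanh recurrence stands in for the LSTM/GRU update cells mentioned above, and all parameters are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(5)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
d = 8
W_msg = rng.normal(size=(d, d)) * 0.1      # h: a linear transformation of neighbor states
W_x = rng.normal(size=(d, d)) * 0.1
W_m = rng.normal(size=(d, d)) * 0.1

def u(x_i, m_i):
    return np.tanh(x_i @ W_x + m_i @ W_m)  # RNN-style update from state and aggregated message

x = rng.normal(size=(3, d))
for _ in range(8):                         # fixed number of message-passing steps
    m = np.stack([sum(x[j] @ W_msg for j in neighbors[i]) for i in neighbors])
    x = np.stack([u(x[i], m[i]) for i in neighbors])
print(x.round(2))
```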
3.7 Graph Attention Networks

Graph Attention Networks (GAT) [Veličković et al., 2018] augment models in the graph neural network family with an attention mechanism enabling vertices to weigh neighbor representations during their aggregation. As with other types of attention, a parameterized function is used to compute the weights dynamically, which enables the model to learn to weigh representations wisely. The goal of the GAT is to compute a coefficient e_ij ∈ R for each neighbor j of a given vertex i, so that the aggregation in Equation 3 becomes:

x_i^(t+1) = Σ_{j ∈ N(i)} e_ij · h( l_i, l_j, l_ij, x_j^(t) )

To compute e_ij, the GAT introduces a weight matrix W ∈ R^{d×d}, used to multiply the vertex embeddings of i and j, which are concatenated and multiplied by a parameterized weight vector a. Finally, a non-linearity is applied to this computation and a softmax over the set of neighbors N(i) is applied over the exponential of the result, yielding:

e_ij = softmax_j( σ( a · (W x_i || W x_j) ) )

GATs are known to outperform typical GCN architectures on node classification tasks, as shown in the original paper [Veličković et al., 2018].
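The sketch below (our own simplification with random parameters, not the reference GAT implementation) computes the attention coefficients e_ij for one vertex: project the embeddings with W, concatenate pairwise, score with the vector a, apply a leaky-ReLU non-linearity and normalise with a softmax over the neighborhood.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8
neighbors = {0: [1, 2], 1: [0], 2: [0]}
x = rng.normal(size=(3, d))               # current vertex embeddings
W = rng.normal(size=(d, d))
a = rng.normal(size=2 * d)

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def attention_coefficients(i):
    scores = np.array([leaky_relu(a @ np.concatenate([W @ x[i], W @ x[j]]))
                       for j in neighbors[i]])
    scores = np.exp(scores - scores.max())
    return scores / scores.sum()          # e_ij, normalised over j in N(i)

print(attention_coefficients(0))          # weights vertex 0 assigns to its neighbors
```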
4 Perspectives and Applications of GNNs to Neural-Symbolic Computing

In this paper, we have seen that GNNs endowed with attention mechanisms are a promising direction of research towards the provision of rich reasoning and learning in TYPE 6 neural-symbolic systems. Future work includes, of course, the application to and systematic evaluation on relevant specific tasks and data sets. These include what John McCarthy described as drosophila tasks for Computer Science: basic problems that can illustrate the value of a computational model.

Examples in the case of GNNs and NSC could be: (1) extrapolation of a learned classification of graphs as Hamiltonian to graphs of arbitrary size, (2) reasoning about a learned graph structure to generalise beyond the distribution of the training data, (3) reasoning about the partOf(X, Y) relation to make sense of handwritten MNIST digits and non-digits, and (4) using an adequate self-attention mechanism to make combinatorial reasoning computationally efficient. This last task relates to satisfiability, including work on using GNNs to solve the TSP problem. The other tasks are related to meta-transfer learning across domains, extrapolation and causality. In terms of domains of application, the following are relevant.


4.1 Relational Learning and Reasoning

GNN models have been successfully applied to a number of relational reasoning tasks. Despite the success of convolutional networks, visual scene understanding is still out of reach for pure CNN models, and hence is a fertile ground for GNN-based models. Hybrid CNN + GNN models in particular have been successful in these tasks, having been applied to understanding human-object interactions, localising objects, and challenging visual question answering problems [Santoro et al., 2017]. Relational reasoning has also been applied to physics, with models for extracting objects and relations in unsupervised fashion [van Steenkiste et al., 2018]. GNNs coupled with differentiable ODE solvers have been used to learn the Hamiltonian dynamics of physical systems given their interactions modelled as a dynamic graph [Greydanus et al., 2019]. The application of NSC models to life sciences is very promising, as graphs are natural representations for molecules, including proteins. In this context, [Stokes et al., 2020] have generated the first machine-learning-discovered antibiotic ("halicin") by training a GNN to predict the probability that a given input molecule has a growth inhibition effect on the bacterium E. coli and using it to rank randomly-generated molecules. Protein Structure Prediction, which is concerned with predicting the three-dimensional structure of proteins given their molecular description, is another promising problem for graph-based and NSC models such as DeepMind's AlphaFold and its variations [Wei, 2019].

In Natural Language Processing, tasks are usually defined over sequential data, but modelling textual data with graphs offers a number of advantages. Several approaches have defined graph neural networks over graphs of text co-occurrences, showing that these architectures improve upon the state-of-the-art for seq2seq models [Yao et al., 2019]. GNN models have also been successfully applied to relational tasks over knowledge bases, such as link prediction [Schlichtkrull et al., 2018]. As previously mentioned, attention mechanisms, which can be seen as a variation of models in the GNN family, have enabled substantial improvements in several NLP tasks through transfer learning over pretrained transformer language models [Devlin et al., 2019]. The extent to which language models pretrained over huge amounts of data can perform language understanding, however, is substantially debated, as pointed out by both Marcus [Marcus, 2020] and Kahneman [Kahneman et al., 2020].

Graph-based neural network models have also found a fertile field of application in software engineering: due to the structured and unambiguous nature of code, it can be represented naturally with graphs that are derived unambiguously via parsing. Several works have then utilised GNNs to perform analysis over graph representations of programs and obtained significant results. More specifically, Microsoft's "Deep Program Understanding" research program has used a GNN variant called Gated Graph Sequence Neural Networks [Li et al., 2016] in a large number of applications, including spotting errors, suggesting variable names, code completion [Brockschmidt et al., 2019], as well as edit representation and automatically applying edits to programs [Yin et al., 2019].

4.2 Combinatorial Optimization and Constraint Satisfaction Problems

Many combinatorial optimization problems are relational in structure and are thus prime application targets for GNN-based models [Bengio et al., 2018]. For instance, [Khalil et al., 2017] use a GNN-like model to embed graphs and use these embeddings in their heuristic search for the Minimum Vertex Cover (MVC), Maximum Cut and Traveling Salesperson (TSP) problems. Regarding end-to-end models, [Kool et al., 2019] trained a transformer-based GNN model to embed TSP answers and extract solutions with an attention-based decoder, obtaining better performance than previous work. [Li et al., 2018] used a GCN as a heuristic for a search algorithm, applying this method to four canonical NP-complete problems, namely Maximal Independent Set, MVC, Maximal Clique, and the Boolean Satisfiability Problem (SAT). [Palm et al., 2018] achieved convergent algorithms over relational problems. The expressiveness of GNNs has also been the focus of recent research [Sato, 2020].

Regarding NP-Hard problems, neural-symbolic models with an underlying GNN formalization have been proposed to train solvers for the decision variants of the SAT, TSP and graph colouring problems, respectively [Selsam et al., 2019; Prates et al., 2019; Lemos et al., 2019]. This allowed these models to be trained with a single bit of supervision on each instance, with [Selsam et al., 2019; Cameron et al., 2020] being able to extract assignments from the trained model and [Prates et al., 2019] performing a binary search on the prediction probability to estimate the optimal route cost. [Toenshoff et al., 2019] built an end-to-end framework for dealing with (boolean) constraint satisfaction problems in general, extending the previous works and providing comparisons and performance increases, and [Abboud et al., 2020] have proposed a GNN-based architecture that learns to approximate DNF counting. There has also been work on generative models for combinatorial optimization, such as [You et al., 2019], which generates SAT instances using a graph-based approach.
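As an illustration of the binary-search procedure attributed to [Prates et al., 2019] above, the sketch below shows how a solver trained only to answer the decision question "does a route cheaper than C exist?" can be queried repeatedly to bracket the optimal cost. The probability function here is a stub standing in for a trained decision GNN, so this is a hedged outline of the idea, not their implementation.

```python
def route_exists_probability(instance, cost_budget):
    # Stub: a trained decision GNN would return P(a route with cost <= cost_budget exists).
    return 1.0 if cost_budget >= instance["optimal_cost"] else 0.0

def estimate_optimal_cost(instance, lo, hi, steps=20, threshold=0.5):
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if route_exists_probability(instance, mid) >= threshold:
            hi = mid          # predicted satisfiable: tighten the upper bound
        else:
            lo = mid          # predicted unsatisfiable: raise the lower bound
    return hi

toy_instance = {"optimal_cost": 7.3}
print(estimate_optimal_cost(toy_instance, lo=0.0, hi=20.0))   # converges near 7.3
```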
5 Conclusions

We presented a review of the relationship between Graph Neural Network (GNN) models and architectures and Neural-Symbolic Computing (NSC). In order to do so, we presented the main recent research results that highlight the potential applications of these related fields both in foundational and applied AI and Computer Science problems. The interplay between the two fields is beneficial to several areas. These range from combinatorial optimization/constraint satisfaction to relational reasoning, which has been of increasing industrial relevance in natural language processing, life sciences, and computer vision and image understanding [Raghavan, 2019; Marcus, 2020]. This is largely due to the fact that many learning tasks can be easily and naturally captured using graph representations, which can be seen as a generalization over the traditional sequential (RNN) and grid-based (CNN) representations in the family of deep learning building blocks. Finally, it is worth mentioning that the principled integration of both methodologies (GNNs and NSC) offers a richer alternative to the construction of trustworthy, explainable and robust AI systems, which is clearly an invaluable research endeavor.

Acknowledgements

This work is partly supported by CNPq and CAPES, Brazil - Finance Code 001.


References

[Abboud et al., 2020] Ralph Abboud, Ismail I. Ceylan, and Thomas Lukasiewicz. Learning to reason: Leveraging neural networks for approximate DNF counting. In AAAI, 2020.

[Arabshahi et al., 2018] Forough Arabshahi, Sameer Singh, and Animashree Anandkumar. Combining symbolic expressions and black-box function evaluations in neural programs. In ICLR, 2018.

[Arabshahi et al., 2019] Forough Arabshahi, Zhichu Lu, Sameer Singh, and Animashree Anandkumar. Memory augmented recursive neural networks. CoRR, abs/1911.01545, 2019.

[Battaglia et al., 2018] Peter Battaglia, Jessica Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Zambaldi, et al. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.

[Bengio et al., 2018] Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d'horizon. CoRR, abs/1811.06128, 2018.

[Bordes et al., 2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. Learning structured embeddings of knowledge bases. In AAAI, 2011.

[Brockschmidt et al., 2019] Marc Brockschmidt, Miltiadis Allamanis, Alexander Gaunt, and Oleksandr Polozov. Generative code modeling with graphs. In ICLR, 2019.

[Cameron et al., 2020] Chris Cameron, Rex Chen, Jason Hartford, and Kevin Leyton-Brown. Predicting propositional satisfiability via end-to-end learning. In AAAI, 2020.

[Chami et al., 2020] Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, and Kevin Murphy. Machine learning on graphs: A model and comprehensive taxonomy. CoRR, abs/2005.03675, 2020.

[d'Avila Garcez and Lamb, 2003] Artur d'Avila Garcez and Luís Lamb. Reasoning about time and knowledge in neural symbolic learning systems. In NIPS, pages 921–928, 2003.

[d'Avila Garcez et al., 2009] Artur d'Avila Garcez, Luís C. Lamb, and Dov M. Gabbay. Neural-Symbolic Cognitive Reasoning. Springer, 2009.

[d'Avila Garcez et al., 2015] Artur d'Avila Garcez, Tarek Besold, Luc de Raedt, Peter Földiák, Pascal Hitzler, Thomas Icard, Kai-Uwe Kühnberger, Luís C. Lamb, Risto Miikkulainen, and Daniel Silver. Neural-symbolic learning and reasoning: Contributions and challenges. In AAAI Spring Symposia, 2015.

[d'Avila Garcez et al., 2019] Artur d'Avila Garcez, Marco Gori, Luís C. Lamb, Luciano Serafini, Michael Spranger, and Son Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. FLAP, 6(4):611–632, 2019.

[Davis, 2019] Ernest Davis. The use of deep learning for symbolic integration: A review of (Lample and Charton, 2019). CoRR, abs/1912.05752, 2019.

[Devlin et al., 2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186, 2019.

[Diligenti et al., 2017] Michelangelo Diligenti, Marco Gori, and Claudio Saccà. Semantic-based regularization for learning and inference. Artif. Intell., 244:143–165, 2017.

[Evans and Grefenstette, 2018] Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. JAIR, 61:1–64, 2018.

[Galassi et al., 2020] Andrea Galassi, Kristian Kersting, Marco Lippi, Xiaoting Shao, and Paolo Torroni. Neural-symbolic argumentation mining: An argument in favor of deep learning and reasoning. Front. Big Data, 2:52, 2020.

[Garcia and Bruna, 2018] Victor Garcia and Joan Bruna. Few-shot learning with graph neural networks. In ICLR, pages 1–13, 2018.

[Goyal et al., 2019] Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf. Recurrent independent mechanisms. CoRR, abs/1909.10893, 2019.

[Greydanus et al., 2019] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. In NeurIPS, pages 15353–15363, 2019.

[Hitzler et al., 2004] Pascal Hitzler, Steffen Hölldobler, and Anthony K. Seda. Logic programs and connectionist networks. J. Appl. Log., 2(3):245–272, 2004.

[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[Huang et al., 2017] Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, and Dapeng Oliver Wu. A neural-symbolic approach to natural language tasks. CoRR, abs/1710.11475, 2017.

[Kahneman et al., 2020] Daniel Kahneman, Francesca Rossi, Geoffrey Hinton, Yoshua Bengio, and Yann LeCun. AAAI-20 fireside chat with Daniel Kahneman. https://vimeo.com/390814190?ref=tw-share, 2020. Accessed 23/02/2020.

[Khalil et al., 2017] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In NIPS, pages 6348–6358, 2017.

[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

[Kool et al., 2019] Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In ICLR, 2019.

[Lample and Charton, 2020] Guillaume Lample and François Charton. Deep learning for symbolic mathematics. In ICLR, 2020.


[LeCun et al., 2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[Lemos et al., 2019] Henrique Lemos, Marcelo Prates, Pedro Avelar, and Luís C. Lamb. Graph colouring meets deep learning: Effective graph neural network models for combinatorial problems. In ICTAI, pages 879–885, 2019.

[Li et al., 2016] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In ICLR, 2016.

[Li et al., 2018] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In NeurIPS, 2018.

[Manhaeve et al., 2018] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. DeepProbLog: Neural probabilistic logic programming. In NeurIPS, 2018.

[Mao et al., 2019] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua Tenenbaum, and Jiajun Wu. The Neuro-Symbolic Concept Learner: Interpreting scenes, words, and sentences from natural supervision. In ICLR, 2019.

[Marcus, 2020] Gary Marcus. The next decade in AI: Four steps towards robust artificial intelligence. CoRR, abs/1801.00631, 2020.

[Palm et al., 2018] Rasmus Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In NeurIPS, pages 3372–3382, 2018.

[Prates et al., 2019] Marcelo Prates, Pedro Avelar, Henrique Lemos, Luís Lamb, and Moshe Vardi. Learning to solve NP-complete problems: A graph neural network for decision TSP. In AAAI, pages 4731–4738, 2019.

[Raghavan, 2019] Sriram Raghavan. 2020 AI predictions from IBM research. https://www.ibm.com/blogs/research/2019/12/2020-ai-predictions, 2019. Accessed 20/02/2020.

[Rocktäschel and Riedel, 2016] Tim Rocktäschel and Sebastian Riedel. Learning knowledge base inference with neural theorem provers. In AKBC@NAACL-HLT, 2016.

[Santoro et al., 2017] Adam Santoro, David Raposo, David Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Tim Lillicrap. A simple neural network module for relational reasoning. In NIPS, pages 4967–4976, 2017.

[Sato, 2020] Ryoma Sato. A survey on the expressive power of graph neural networks. CoRR, abs/2003.04078, 2020.

[Scarselli et al., 2008] Franco Scarselli, Marco Gori, Ah Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2008.

[Schlichtkrull et al., 2018] Michael Schlichtkrull, Thomas Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In ESWC, pages 593–607, 2018.

[Selsam et al., 2019] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. In ICLR, pages 1–11, 2019.

[Serafini and d'Avila Garcez, 2016] Luciano Serafini and Artur d'Avila Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. CoRR, abs/1606.04422, 2016.

[Stokes et al., 2020] Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, et al. A deep learning approach to antibiotic discovery. Cell, 180, 2020.

[Sutskever and Hinton, 2008] Ilya Sutskever and Geoffrey Hinton. Using matrices to model symbolic relationships. In NIPS, pages 1593–1600, 2008.

[Toenshoff et al., 2019] Jan Toenshoff, Martin Ritzert, Hinrikus Wolf, and Martin Grohe. RUN-CSP: Unsupervised learning of message passing networks for binary constraint satisfaction problems. CoRR, abs/1909.08387, 2019.

[Townsend et al., 2019] Joe Townsend, Thomas Chaton, and João Monteiro. Extracting relational explanations from deep neural networks: A survey from a neural-symbolic perspective. IEEE Trans. Neural Netw. Learn. Syst., pages 1–15, 2019.

[van Harmelen and ten Teije, 2019] Frank van Harmelen and Annette ten Teije. A boxology of design patterns for hybrid learning and reasoning systems. J. Web Eng., 18(1):97–124, 2019.

[van Steenkiste et al., 2018] Sjoerd van Steenkiste, Michael Chang, Klaus Greff, and Jürgen Schmidhuber. Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. In ICLR, 2018.

[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.

[Vinyals et al., 2015] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In NIPS, pages 2692–2700, 2015.

[Wei, 2019] Guo-Wei Wei. Protein structure prediction beyond AlphaFold. Nature Mach. Intell., 1:336–337, 2019.

[Wu et al., 2019] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. CoRR, abs/1901.00596, 2019.

[Yao et al., 2019] Liang Yao, Chengsheng Mao, and Yuan Luo. Graph convolutional networks for text classification. In AAAI, pages 7370–7377, 2019.

[Yin et al., 2019] Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt. Learning to represent edits. In ICLR, 2019.

[You et al., 2019] Jiaxuan You, Haoze Wu, Clark Barrett, Raghuram Ramanujan, and Jure Leskovec. G2SAT: Learning to generate SAT formulas. In NeurIPS, pages 10552–10563, 2019.

[Zhang et al., 2018] Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. CoRR, abs/1812.04202, 2018.
