Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective
2 Neural-Symbolic Computing Taxonomy

At this year's Robert S. Engelmore Memorial Lecture, at the AAAI Conference on Artificial Intelligence, New York, February 10th, 2020, Henry Kautz introduced a taxonomy for neural-symbolic computing as part of a talk entitled The Third AI Summer. Six types of neural-symbolic systems are outlined: (1) SYMBOLIC NEURO SYMBOLIC, (2) SYMBOLIC[NEURO], (3) NEURO;SYMBOLIC, (4) NEURO:SYMBOLIC → NEURO, (5) NEURO_SYMBOLIC and (6) NEURO[SYMBOLIC].

The origin of GNNs [Scarselli et al., 2008] can be traced back to neural-symbolic computing (NSC) in that both sought to enrich the vector representations in the inputs of neural networks, first by accepting tree structures and then graphs more generally. In this sense, according to Kautz's taxonomy, GNNs are a TYPE 1 neural-symbolic system. GNNs [Battaglia et al., 2018] were recently combined with convolutional networks in novel ways which have produced impressive results on data efficiency. In parallel, NSC has focused on the learning of adequate embeddings for the purpose of symbolic computation. This branch of neural-symbolic computing, which includes Logic Tensor Networks [Serafini and d'Avila Garcez, 2016] and Tensor Product Representations [Huang et al., 2017], has been called tensorization methods in [d'Avila Garcez et al., 2019] and draws similarities with [Diligenti et al., 2017], which uses fuzzy methods to represent first-order logic. These have been classified by Kautz as TYPE 5 neural-symbolic systems, as also discussed in what follows. A natural point of contact between GNNs and NSC is the provision of rich embeddings and attention mechanisms towards structured reasoning and efficient learning.

TYPE 1 neural-symbolic integration is standard deep learning, which some may argue is a stretch to refer to as neural-symbolic, but which is included here to note that the input and output of a neural network can be made of symbols, e.g. in the case of language translation or question answering applications. TYPE 2 are hybrid systems such as DeepMind's AlphaGo and other systems, where the core neural network is loosely coupled with a symbolic problem solver such as Monte Carlo tree search. TYPE 3 is also a hybrid system whereby a neural network focusing on one task (e.g. object detection) interacts via input/output with a symbolic system specialising in a complementary task (e.g. query answering). Examples include the neuro-symbolic concept learner [Mao et al., 2019] and DeepProbLog [Manhaeve et al., 2018; Galassi et al., 2020].

In a TYPE 4 neural-symbolic system, symbolic knowledge is compiled into the training set of a neural network. Kautz offers [Lample and Charton, 2020] as an example. Here, we would also include other tightly-coupled neural-symbolic systems where various forms of symbolic knowledge, not restricted to if-then rules only, can be translated into the initial architecture and set of weights of a neural network [d'Avila Garcez et al., 2009], in some cases with guarantees of correctness. We should also mention [Arabshahi et al., 2018], which learns and reasons over mathematical constructions, as well as [Arabshahi et al., 2019], which proposes a learning architecture that extrapolates to much harder symbolic maths reasoning problems than what was seen during training. TYPE 5 are those tightly-coupled neural-symbolic systems where a symbolic logic rule is mapped onto a distributed representation (an embedding) and acts as a soft constraint (a regularizer) on the network's loss function. Examples of these are [Huang et al., 2017] and [Serafini and d'Avila Garcez, 2016].

Finally, TYPE 6 systems should be capable, according to Kautz, of true symbolic reasoning inside a neural engine. It is what one could refer to as a fully-integrated system. Early work in neural-symbolic computing has achieved this: see [d'Avila Garcez et al., 2009] for a historical overview; and some TYPE 4 systems are also capable of it [d'Avila Garcez et al., 2009; d'Avila Garcez et al., 2015; Hitzler et al., 2004], but in a localist rather than a distributed architecture and using simpler forms of embedding than TYPE 5 systems. Kautz adds that TYPE 6 systems should be capable of combinatorial reasoning, suggesting the use of an attention schema to achieve it effectively. In fact, attention mechanisms can be used to solve graph problems, for example with pointer networks [Vinyals et al., 2015]. It should be noted that the same problems can be solved through other NSC architectures, such as GNNs [Prates et al., 2019]. This idea resonates with the recent proposal outlined by Bengio in the AI debate of December 2019.

In what concerns neural-symbolic computing theory, the study of TYPE 6 systems is highly relevant. In practical terms, a tension exists between effective learning and sound reasoning, which may prescribe the use of a more hybrid approach (TYPES 3 to 5) or variations thereof, such as the use of attention with tensorization. Orthogonal to the above taxonomy, but mostly associated so far with TYPE 4, is the study of the limits of reasoning within neural networks w.r.t. full first-order, higher-order and non-classical logic theorem proving [d'Avila Garcez and Lamb, 2003; d'Avila Garcez et al., 2015]. In this paper, as we revisit the use of rich logic embeddings in TYPE 5 systems, notably Logic Tensor Networks [Serafini and d'Avila Garcez, 2016], alongside the use of attention mechanisms or convolutions in GNNs, we will seek to propose a research agenda and specific applications of symbolic reasoning and statistical learning towards the sound development of TYPE 6 systems.

3 Graph Neural Networks Meet Neural-Symbolic Computing

One of the key concepts in machine learning is that of priors or inductive biases – the set of assumptions that a learner uses to compute predictions on test data. In the context of deep learning (DL), the design of neural building blocks that enforce strong priors has been a major source of breakthroughs. For instance, the priors obtained through feedforward layers encourage the learner to combine features additively, while the ones obtained through dropout discourage it from overfitting and the ones obtained through multi-task learning encourage it to prefer sets of parameters that explain more than one task.

One of the most influential neural building blocks, having helped pave the way for the DL revolution, is the convolutional layer [LeCun et al., 2015].
Convolutional architectures are successful for tasks defined over Euclidean signals because they enforce equivariance to spatial translation. This is a useful property to have when learning representations for objects regardless of their position in a scene.

Analogously, recurrent layers enforce equivariance in time, which is useful for learning over sequential data. Recently, attention mechanisms, through the advent of transformer networks, have enabled advances to the state of the art in many sequential tasks, notably in natural language processing [Devlin et al., 2019; Goyal et al., 2019] and symbolic reasoning tasks such as solving math equations and integrals² [Lample and Charton, 2020]. Attention encourages the learner to combine representations additively, while also enforcing permutation invariance. All three architectures take advantage of sparse connectivity – another important design concept in DL which is key to enabling the training of larger models. Sparse connectivity and neural building blocks with strong priors usually go hand in hand, as the latter leverage symmetries in the input space to cut down parameters through invariance to different types of transformations. NSC architectures often combine the key design concepts from convolutional networks and attention-based architectures to enforce permutation invariance over the elements of a set or the nodes of a graph (Fig. 1). Some NSC architectures such as Pointer Networks [Vinyals et al., 2015] implement attention directly over a set of inputs X = {x_1, . . . , x_n} coupled with a decoder that outputs a sequence (i_1, i_2, . . . , i_m) ∈ [1, n]^m of "pointers" to the input elements (hence the name). Note that both formalizations are defined over set inputs rather than sequential ones.

² It is advisable to read [Lample and Charton, 2020] alongside this critique of its limitations [Davis, 2019].

Figure 1: f(x) = (x_1 ∨ ¬x_5 ∨ x_2 ∨ x_3 ∨ ¬x_4). Due to permutation invariance, the literals ¬x_5 and x_3 can exchange places with no effect on the boolean function f(x). There are 5! = 120 such permutations.
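As a concrete illustration of the permutation invariance highlighted in Figure 1, the minimal Python sketch below (the encoding and the truth assignment are illustrative, not taken from the paper) checks that every ordering of the clause's literals evaluates to the same truth value:

```python
import itertools

# Clause from Figure 1: f(x) = x1 OR NOT x5 OR x2 OR x3 OR NOT x4,
# encoded as (variable index, polarity) pairs.
clause = ((1, True), (5, False), (2, True), (3, True), (4, False))

def evaluate(literals, assignment):
    """Disjunction of literals: True if at least one literal is satisfied."""
    return any(assignment[var] == polarity for var, polarity in literals)

# An arbitrary (hypothetical) truth assignment to x1..x5.
assignment = {1: False, 2: False, 3: True, 4: True, 5: True}

# All 5! = 120 orderings of the literals yield the same value of f(x).
values = {evaluate(perm, assignment) for perm in itertools.permutations(clause)}
assert len(values) == 1
print(values)  # {True}: x3 is True under this assignment, so the clause is satisfied
```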
3.1 Logic Tensor Networks

Tensorisation is a class of approaches that embeds first-order logic (FOL) symbols such as constants, facts and rules into real-valued tensors. Normally, constants are represented as one-hot vectors (first-order tensors). Predicates and functions are matrices (second-order tensors) or higher-order tensors.

In early work, embedding techniques were proposed to transform symbolic representations into vector spaces where reasoning can be done through matrix computation [Bordes et al., 2011; Serafini and d'Avila Garcez, 2016; Santoro et al., 2017]. Training embedding systems can be carried out as distance learning using backpropagation. Most research in this direction focuses on representing relational predicates in a neural network. This is known as "relational embedding" [Sutskever and Hinton, 2008]. For the representation of more complex logical structures, i.e. FOL formulas, a system named Logic Tensor Networks (LTN) [Serafini and d'Avila Garcez, 2016] was proposed by extending Neural Tensor Networks (NTN), a state-of-the-art relational embedding method. LTNs effectively implement learning using symbolic information as a prior, as pointed out by [van Harmelen and ten Teije, 2019]. Related ideas are discussed formally in the context of constraint-based learning and reasoning [d'Avila Garcez et al., 2019]. Recent research in first-order logic programs has successfully exploited the advantages of distributed representations of logic symbols for efficient reasoning, inductive programming [Evans and Grefenstette, 2018] and differentiable theorem proving [Rocktäschel and Riedel, 2016].
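To make the tensorisation idea concrete, here is a minimal sketch of a bilinear relational embedding (not the actual LTN formulation): two hypothetical constants are embedded as vectors, a binary predicate "knows" is a matrix, and the truth degree of a ground atom is a bilinear score squashed into [0, 1]. All names and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # embedding dimension (illustrative)

# Constants embedded as real-valued vectors (random here; learned in practice).
constants = {"alice": rng.normal(size=d), "bob": rng.normal(size=d)}

# A binary predicate represented as a matrix (a second-order tensor).
M_knows = rng.normal(size=(d, d))

def truth_degree(subject, obj):
    """Bilinear score e_s^T M e_o squashed by a sigmoid, read as a degree of truth."""
    score = constants[subject] @ M_knows @ constants[obj]
    return 1.0 / (1.0 + np.exp(-score))

print(truth_degree("alice", "bob"))          # a value in (0, 1)
```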
3.2 Pointer Networks

The Pointer Network (PN) formalization [Vinyals et al., 2015] is a neural architecture meant for computing an m-sized sequence (i_1, i_2, . . . , i_m) ∈ [1, n]^m over the elements of an input set X = {x_1, . . . , x_n}. PNs implement a simple modification over the traditional seq2seq model, augmenting it with a simplified variant of the attention mechanism whose outputs are interpreted as "pointers" to the input elements. Traditional seq2seq models implement an encoder-decoder architecture in which the elements of the input sequence are consumed in order and used to update the encoder's hidden state at each step. Finally, a decoder consumes the encoder's hidden state and is used to yield a sequence of outputs, one at a time.

It is known that seq2seq models tend to exhibit improved performance when augmented with an attention mechanism, a phenomenon noticeable from the perspective of natural language processing [Devlin et al., 2019]. Traditional models, however, yield sequences of outputs over a fixed-length dictionary (for instance a dictionary of tokens for language models), which is not useful for tasks whose output is defined over the input set and hence require a variable-length dictionary. PNs tackle this problem by encoding the n-sized input set P with a traditional encoding architecture and decoding a probability distribution p(C_i | C_1, . . . , C_{i−1}, P) over the set {1, . . . , n} of indices at each step i by computing a softmax over an attention layer parameterized by matrices W_1, W_2 and a vector v, feeding on the decoder state d_i and the encoder states e_1, . . . , e_n:

u^i_j = v^T tanh(W_1 e_j + W_2 d_i),   j ∈ (1, . . . , n)
p(C_i | C_1, . . . , C_{i−1}, P) = softmax(u^i)        (1)

The output pointers can then be used to compute loss functions over combinatorial optimization problems. In the original paper the authors define a PN to solve the Travelling Salesperson Problem (TSP), in which a beam search procedure is used to select cities given the probability distributions computed at each step, and finally a loss function can be computed for the output tour by adding the corresponding city distances.

Given their discrete nature, PNs are naturally suitable for many combinatorial problems (the original paper evaluates PNs on Delaunay triangulation, TSP and convex hull problems). Unfortunately, even though PNs can solve problems over sets, they cannot be directly applied to general (non-complete) graphs.
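The attention step in Equation 1 is small enough to sketch directly. The following NumPy fragment (with random, purely illustrative encoder and decoder states and weights) computes one pointer distribution over the n input elements:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n, d = 6, 16                             # number of input elements, hidden size (illustrative)
E = rng.normal(size=(n, d))              # encoder states e_1, ..., e_n
d_i = rng.normal(size=d)                 # decoder state at step i
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

# u^i_j = v^T tanh(W1 e_j + W2 d_i), followed by a softmax over the inputs (Equation 1).
u = np.array([v @ np.tanh(W1 @ E[j] + W2 @ d_i) for j in range(n)])
p = softmax(u)                           # p(C_i | C_1, ..., C_{i-1}, P): a "pointer" over indices 1..n
print(p.argmax(), p.round(3))
```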
3.3 Convolutions as Self-Attention

The core building block of models in the GNN family is the graph convolution operation, which is a neural building block that enables one to perform learning over graph inputs. Empowering DL architectures with the capacity of feeding on
includes i itself. This is done to prevent "forgetting" the representation of the vertex being updated. Equation 2 can be summarized as x^{(k+1)} = D̃^{-1/2} Ã D̃^{-1/2} x^{(k)} θ^{(k)}, where Ã = A + I is the adjacency matrix A plus self-loops (I is the identity matrix) and D̃ is the degree matrix of Ã.
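A single layer of this normalized graph convolution is easy to write down. The sketch below (NumPy, with a toy 4-vertex path graph and random weights, all illustrative) computes x^{(k+1)} = D̃^{-1/2} Ã D̃^{-1/2} x^{(k)} θ^{(k)} exactly as summarized above:

```python
import numpy as np

def gcn_layer(A, X, theta):
    """One graph convolution: X' = D^{-1/2} A~ D^{-1/2} X theta, with A~ = A + I (self-loops)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops so each vertex keeps its own features
    deg = A_tilde.sum(axis=1)                 # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D~^{-1/2}
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ theta

# Toy example: a 4-vertex path graph, 3-dimensional features, 2-dimensional output.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
theta = rng.normal(size=(3, 2))
print(gcn_layer(A, X, theta).shape)           # (4, 2)
```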
3.5 The Graph Neural Network Model

Although GCNs are conceptually simpler, the GNN model predates them by a decade, having been originally proposed by [Scarselli et al., 2008]. The model is similar to GCNs, with two key differences:
(1) One does not stack multiple independent layers as in GCNs. A single parameterized function is iterated many times, in analogy to recurrent networks, until convergence.
(2) The transformations applied to neighbor vertex representations are not necessarily linear, and can be implemented by deep neural networks (e.g. by a multilayer perceptron).

Concretely, the graph neural network model defines parameterized functions h : N × N × N × R^d → R^d and g : N × R^d → R^o, named the transition function and the output function. In analogy to a graph convolution layer, the transition function defines a rule for updating vertex representations by aggregating transformations over representations of neighbor vertices. The vertex representation x_i^{(t+1)} for vertex i at time (t + 1) is computed as:

x_i^{(t+1)} = Σ_{j∈N(i)} h(l_i, l_j, l_{ij}, x_j^{(t)})        (3)

where l_i, l_j and l_{ij} are respectively the labels for nodes i and j and edge ij, and R^d, R^o are respectively the space of vertex representations and the output space. The model is defined over labelled graphs, but can still be implemented for unlabelled ones by suppressing l_i, l_j, l_{ij} from the transition function. After a certain number of iterations one should expect that the vertex embeddings x_i^{(t+1)} are enriched with structural information about the input graph. At this point, the output function g can be used to compute an output for each vertex, given its final representation: o_i = g(l_i, x_i).

In other words, the output at the end of the process is a set of |V| vectors ∈ R^o. This is useful for node classification tasks, in which one can have o equal to the number of node classes and enforce o_i to encode a probability distribution by incorporating a softmax layer into the output function g. If one would like to learn a function over the entire graph instead of its vertices, there are many possibilities, of which one is to compute the output on an aggregation over all final vertex representations: o = g(Σ_{i∈V} x_i).
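A minimal sketch of this iteration scheme is given below (plain Python/NumPy, with h and g taken as arbitrary user-supplied functions and a fixed number of iterations standing in for the convergence test of the original model; all names and the toy graph are illustrative):

```python
import numpy as np

def gnn_forward(node_labels, edge_labels, h, g, d, t_max=10):
    """Iterate the transition function h (Equation 3), then apply the output function g.

    node_labels: dict vertex -> label; edge_labels: dict (i, j) -> label of edge ij.
    h(l_i, l_j, l_ij, x_j) must return a vector in R^d; g(l_i, x_i) produces the output o_i.
    """
    neighbours = {i: [j for (a, j) in edge_labels if a == i] for i in node_labels}
    x = {i: np.zeros(d) for i in node_labels}             # initial vertex representations
    for _ in range(t_max):                                # the original model iterates until convergence
        x = {i: sum(h(node_labels[i], node_labels[j], edge_labels[(i, j)], x[j])
                    for j in neighbours[i])
             for i in node_labels}
    return {i: g(node_labels[i], x[i]) for i in node_labels}  # o_i = g(l_i, x_i)

# Illustrative use: a labelled 3-cycle, a small nonlinear h and a toy scalar output g.
labels = {0: 1.0, 1: 2.0, 2: 3.0}
edges = {(i, j): 1.0 for i in labels for j in labels if i != j}
rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d + 3))
h = lambda li, lj, lij, xj: np.tanh(W @ np.concatenate(([li, lj, lij], xj)))
g = lambda li, xi: float(xi.sum())
print(gnn_forward(labels, edges, h, g, d=d, t_max=5))
```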
3.6 Message-passing Neural Networks

Message-passing neural networks implement a slight modification over the original GNN model, which is to define a specialized update function u : R^d × R^d → R^d to update the representation for vertex i given its current representation and an aggregation m_i over transformations of neighbor vertex embeddings (which are referred to as "messages", hence message-passing neural networks), as an example:

x_i^{(t+1)} = u(x_i^{(t)}, Σ_{j∈N(i)} h(l_i, l_j, l_{ij}, x_j^{(t)}))

Also, the update procedure is run over a fixed number of steps, and it is usual to implement u using some type of recurrent neural network (RNN), such as Long Short-Term Memory (LSTM) cells [Hochreiter and Schmidhuber, 1997; Selsam et al., 2019], or Gated Recurrent Units.
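As a hedged illustration of implementing u with a recurrent cell, the sketch below defines a GRU-style update in NumPy in which the aggregated message m_i plays the role of the input and the current vertex state x_i plays the role of the hidden state (weights are random and purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUUpdate:
    """Update function u(x_i, m_i): a GRU cell treating m_i as input and x_i as hidden state."""
    def __init__(self, d, rng):
        self.Wz, self.Uz = rng.normal(size=(d, d)), rng.normal(size=(d, d))
        self.Wr, self.Ur = rng.normal(size=(d, d)), rng.normal(size=(d, d))
        self.Wh, self.Uh = rng.normal(size=(d, d)), rng.normal(size=(d, d))

    def __call__(self, x_i, m_i):
        z = sigmoid(self.Wz @ m_i + self.Uz @ x_i)        # update gate
        r = sigmoid(self.Wr @ m_i + self.Ur @ x_i)        # reset gate
        h = np.tanh(self.Wh @ m_i + self.Uh @ (r * x_i))  # candidate state
        return (1 - z) * x_i + z * h                      # new vertex representation

rng = np.random.default_rng(0)
u = GRUUpdate(d=8, rng=rng)
x_i, m_i = rng.normal(size=8), rng.normal(size=8)
print(u(x_i, m_i).shape)                                  # (8,)
```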
3.7 Graph Attention Networks

Graph Attention Networks (GAT) [Veličković et al., 2018] augment models in the graph neural network family with an attention mechanism enabling vertices to weigh neighbor representations during their aggregation. As with other types of attention, a parameterized function is used to compute the weights dynamically, which enables the model to learn to weigh representations wisely. The goal of the GAT is to compute a coefficient e_{ij} ∈ R for each neighbor j of a given vertex i, so that the aggregation in Equation 3 becomes:

x_i^{(t+1)} = Σ_{j∈N(i)} e_{ij} h(l_i, l_j, l_{ij}, x_j^{(t)})

To compute e_{ij}, the GAT introduces a weight matrix W ∈ R^{d×d}, used to multiply the vertex embeddings for i and j, which are concatenated and multiplied by a parameterized weight vector a. Finally, a non-linearity is applied to this computation and a softmax over the set of neighbors N(i) is then applied to the result, yielding: e_{ij} = softmax_j(σ(a · (W x_i || W x_j))).

GATs are known to outperform typical GCN architectures for graph classification tasks, as shown in the original paper [Veličković et al., 2018].
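The coefficient computation above can be sketched in a few lines of NumPy (random weights; σ taken here to be a LeakyReLU, as in the original GAT paper; the toy neighbourhood is illustrative):

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_coefficients(X, W, a, neighbours, i):
    """e_ij = softmax_j( sigma( a . [W x_i || W x_j] ) ) over the neighbours j of vertex i."""
    scores = np.array([a @ np.concatenate((W @ X[i], W @ X[j])) for j in neighbours])
    scores = leaky_relu(scores)
    scores = scores - scores.max()              # numerical stability before the softmax
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
n, d = 5, 8                                     # toy graph: 5 vertices, 8-dim embeddings
X = rng.normal(size=(n, d))                     # current vertex representations
W = rng.normal(size=(d, d))
a = rng.normal(size=2 * d)                      # attention vector over the concatenation
neighbours_of_0 = [1, 2, 4]                     # illustrative neighbourhood of vertex 0
e = gat_coefficients(X, W, a, neighbours_of_0, i=0)
print(e, e.sum())                               # attention weights summing to 1
```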
4 Perspectives and Applications of GNNs to Neural-Symbolic Computing

In this paper, we have seen that GNNs endowed with attention mechanisms are a promising direction of research towards the provision of rich reasoning and learning in TYPE 6 neural-symbolic systems. Future work includes, of course, application to and systematic evaluation of relevant specific tasks and data sets. These include what John McCarthy described as drosophila tasks for Computer Science: basic problems that can illustrate the value of a computational model.

Examples in the case of GNNs and NSC could be: (1) extrapolation of a learned classification of graphs as Hamiltonian to graphs of arbitrary size, (2) reasoning about a learned graph structure to generalise beyond the distribution of the training data, (3) reasoning about the partOf(X, Y) relation to make sense of handwritten MNIST digits and non-digits, and (4) using an adequate self-attention mechanism to make combinatorial reasoning computationally efficient. This last task relates to satisfiability, including work on using GNNs to solve the TSP problem. The other tasks are related to meta-transfer learning across domains, extrapolation and causality. In terms of domains of application, the following are relevant.

4.1 Relational Learning and Reasoning

GNN models have been successfully applied to a number of relational reasoning tasks. Despite the success of convolutional networks, visual scene understanding is still out of reach for pure CNN models, and hence it is a fertile ground for GNN-based models. Hybrid CNN + GNN models in particular have been successful in these tasks, having been applied to understanding human-object interactions, localising objects, and challenging visual question answering problems [Santoro et al., 2017]. Relational reasoning has also been applied to physics, with models for extracting objects and relations in an unsupervised fashion [van Steenkiste et al., 2018]. GNNs coupled with differentiable ODE solvers have been used to learn the Hamiltonian dynamics of physical systems given their interactions modelled as a dynamic graph [Greydanus et al., 2019]. The application of NSC models to the life sciences is very promising, as graphs are natural representations for molecules, including proteins. In this context, [Stokes et al., 2020] have generated the first machine-learning-discovered antibiotic ("halicin") by training a GNN to predict the probability that a given input molecule has a growth-inhibition effect on the bacterium E. coli and using it to rank randomly generated molecules. Protein structure prediction, which is concerned with predicting the three-dimensional structure of proteins given their molecular description, is another promising problem for graph-based and NSC models such as DeepMind's AlphaFold and its variations [Wei, 2019].

In natural language processing, tasks are usually defined over sequential data, but modelling textual data with graphs offers a number of advantages. Several approaches have defined graph neural networks over graphs of text co-occurrences, showing that these architectures improve upon the state of the art for seq2seq models [Yao et al., 2019]. GNN models have also been successfully applied to relational tasks over knowledge bases, such as link prediction [Schlichtkrull et al., 2018]. As previously mentioned, attention mechanisms, which can be seen as a variation of models in the GNN family, have enabled substantial improvements in several NLP tasks through transfer learning over pretrained transformer language models [Devlin et al., 2019]. The extent to which language models pretrained over huge amounts of data can perform language understanding, however, is substantially debated, as pointed out by both Marcus [Marcus, 2020] and Kahneman [Kahneman et al., 2020].

Graph-based neural network models have also found a fertile field of application in software engineering: due to the structured and unambiguous nature of code, it can be represented naturally with graphs that are derived unambiguously via parsing. Several works have thus utilised GNNs to perform analysis over graph representations of programs and obtained significant results. More specifically, Microsoft's "Deep Program Understanding" research programme has used a GNN variant called Gated Graph Sequence Neural Networks [Li et al., 2016] in a large number of applications, including spotting errors, suggesting variable names, code completion [Brockschmidt et al., 2019], as well as edit representation and automatically applying edits to programs [Yin et al., 2019].

4.2 Combinatorial Optimization and Constraint Satisfaction Problems

Many combinatorial optimization problems are relational in structure and thus are prime application targets for GNN-based models [Bengio et al., 2018]. For instance, [Khalil et al., 2017] use a GNN-like model to embed graphs and use these embeddings in their heuristic search for the Minimum Vertex Cover (MVC), Maximum Cut and Travelling Salesperson (TSP) problems. Regarding end-to-end models, [Kool et al., 2019] trained a transformer-based GNN model to embed TSP answers and extract solutions with an attention-based decoder, obtaining better performance than previous work. [Li et al., 2018] used a GCN as a heuristic for a search algorithm, applying this method to four canonical NP-complete problems, namely Maximal Independent Set, MVC, Maximal Clique, and the Boolean Satisfiability Problem (SAT). [Palm et al., 2018] achieved convergent algorithms over relational problems. The expressiveness of GNNs has also been the focus of recent research [Sato, 2020].

Regarding NP-hard problems, neural-symbolic models with an underlying GNN formalization have been proposed to train solvers for the decision variants of the SAT, TSP and graph colouring problems, respectively [Selsam et al., 2019; Prates et al., 2019; Lemos et al., 2019]. This allowed these models to be trained with a single bit of supervision on each instance, with [Selsam et al., 2019; Cameron et al., 2020] being able to extract assignments from the trained model and [Prates et al., 2019] performing a binary search on the prediction probability to estimate the optimal route cost. [Toenshoff et al., 2019] built an end-to-end framework for dealing with (boolean) constraint satisfaction problems in general, extending the previous works and providing comparisons and performance increases, and [Abboud et al., 2020] have proposed a GNN-based architecture that learns to approximate DNF counting. There has also been work on generative models for combinatorial optimization, such as [You et al., 2019], which generates SAT instances using a graph-based approach.

5 Conclusions

We presented a review of the relationship between Graph Neural Network (GNN) models and architectures and Neural-Symbolic Computing (NSC). In order to do so, we presented the main recent research results that highlight the potential applications of these related fields to both foundational and applied AI and Computer Science problems. The interplay between the two fields is beneficial to several areas. These range from combinatorial optimization/constraint satisfaction to relational reasoning, which has been the subject of increasing industrial relevance in natural language processing, the life sciences, and computer vision and image understanding [Raghavan, 2019; Marcus, 2020]. This is largely due to the fact that many learning tasks can be easily and naturally captured using graph representations, which can be seen as a generalization over the traditional sequential (RNN) and grid-based (CNN) representations in the family of deep learning building blocks. Finally, it is worth mentioning that the principled integration of both methodologies (GNNs and NSC) offers a richer alternative to the construction of trustful, explainable and robust AI systems, which is clearly an invaluable research endeavor.

Acknowledgements

This work is partly supported by CNPq and CAPES, Brazil – Finance Code 001.
References

[LeCun et al., 2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[Lemos et al., 2019] Henrique Lemos, Marcelo Prates, Pedro Avelar, and Luís C. Lamb. Graph colouring meets deep learning: Effective graph neural network models for combinatorial problems. In ICTAI, pages 879–885, 2019.

[Li et al., 2016] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In ICLR, 2016.

[Li et al., 2018] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In NeurIPS, 2018.

[Manhaeve et al., 2018] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. DeepProbLog: Neural probabilistic logic programming. In NeurIPS, 2018.

[Mao et al., 2019] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua Tenenbaum, and Jiajun Wu. The Neuro-Symbolic Concept Learner: Interpreting scenes, words, and sentences from natural supervision. In ICLR, 2019.

[Marcus, 2020] Gary Marcus. The next decade in AI: Four steps towards robust artificial intelligence. CoRR, abs/1801.00631, 2020.

[Palm et al., 2018] Rasmus Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In NeurIPS, pages 3372–3382, 2018.

[Prates et al., 2019] Marcelo Prates, Pedro Avelar, Henrique Lemos, Luís Lamb, and Moshe Vardi. Learning to solve NP-complete problems: A graph neural network for decision TSP. In AAAI, pages 4731–4738, 2019.

[Raghavan, 2019] Sriram Raghavan. 2020 AI predictions from IBM Research. https://www.ibm.com/blogs/research/2019/12/2020-ai-predictions, 2019. Accessed 20/02/2020.

[Rocktäschel and Riedel, 2016] Tim Rocktäschel and Sebastian Riedel. Learning knowledge base inference with neural theorem provers. In AKBC@NAACL-HLT, 2016.

[Santoro et al., 2017] Adam Santoro, David Raposo, David Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Tim Lillicrap. A simple neural network module for relational reasoning. In NIPS, pages 4967–4976, 2017.

[Sato, 2020] Ryoma Sato. A survey on the expressive power of graph neural networks. CoRR, abs/2003.04078, 2020.

[Scarselli et al., 2008] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2008.

[Schlichtkrull et al., 2018] Michael Schlichtkrull, Thomas Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In ESWC, pages 593–607, 2018.

[Selsam et al., 2019] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. In ICLR, pages 1–11, 2019.

[Serafini and d'Avila Garcez, 2016] Luciano Serafini and Artur d'Avila Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. CoRR, abs/1606.04422, 2016.

[Stokes et al., 2020] Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, et al. A deep learning approach to antibiotic discovery. Cell, 180, 2020.

[Sutskever and Hinton, 2008] Ilya Sutskever and Geoffrey Hinton. Using matrices to model symbolic relationships. In NIPS, pages 1593–1600, 2008.

[Toenshoff et al., 2019] Jan Toenshoff, Martin Ritzert, Hinrikus Wolf, and Martin Grohe. RUN-CSP: Unsupervised learning of message passing networks for binary constraint satisfaction problems. CoRR, abs/1909.08387, 2019.

[Townsend et al., 2019] Joe Townsend, Thomas Chaton, and João Monteiro. Extracting relational explanations from deep neural networks: A survey from a neural-symbolic perspective. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2019.

[van Harmelen and ten Teije, 2019] Frank van Harmelen and Annette ten Teije. A boxology of design patterns for hybrid learning and reasoning systems. Journal of Web Engineering, 18(1):97–124, 2019.

[van Steenkiste et al., 2018] Sjoerd van Steenkiste, Michael Chang, Klaus Greff, and Jürgen Schmidhuber. Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. In ICLR, 2018.

[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.

[Vinyals et al., 2015] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In NIPS, pages 2692–2700, 2015.

[Wei, 2019] Guo-Wei Wei. Protein structure prediction beyond AlphaFold. Nature Machine Intelligence, 1:336–337, 2019.

[Wu et al., 2019] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. A comprehensive survey on graph neural networks. CoRR, abs/1901.00596, 2019.

[Yao et al., 2019] Liang Yao, Chengsheng Mao, and Yuan Luo. Graph convolutional networks for text classification. In AAAI, pages 7370–7377, 2019.

[Yin et al., 2019] Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt. Learning to represent edits. In ICLR, 2019.

[You et al., 2019] Jiaxuan You, Haoze Wu, Clark Barrett, Raghuram Ramanujan, and Jure Leskovec. G2SAT: Learning to generate SAT formulas. In NeurIPS, pages 10552–10563, 2019.

[Zhang et al., 2018] Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. CoRR, abs/1812.04202, 2018.