Neural-Symbolic Predicate Invention - Learning Relational Concepts From Visual Scenes
Abstract
The predicates used for Inductive Logic Programming (ILP) systems are usually elusive and need to be
hand-crafted in advance, which limits the generalization of the system when learning new rules without
sufficient background knowledge. Predicate Invention (PI) for ILP is the problem of discovering new
concepts that describe hidden relationships in the domain. PI can mitigate the generalization problem for
ILP by inferring new concepts, giving the system a better vocabulary to compose logic rules. Although
there are several PI approaches for symbolic ILP systems, PI for NeSy ILP systems that can handle visual
input to learn logical rules using differentiable reasoning is relatively unaddressed. To this end, we
propose a neural-symbolic approach, NeSy-𝜋, to invent predicates from visual scenes for NeSy ILP
systems based on clustering and extension of relational concepts. (𝜋 denotes the abbreviation of Predicate
Invention). NeSy-𝜋 processes visual scenes as input using deep neural networks for the visual perception
and invents new concepts that support the task of classifying complex visual scenes. The invented
concepts can be used by any NeSy ILP system instead of hand-crafted background knowledge. Our
experiments show that the PI model is capable of inventing high-level concepts and solving complex
visual logic patterns more efficiently and accurately in the absence of explicit background knowledge.
Moreover, the invented concepts are explainable and interpretable, while also providing competitive
results with state-of-the-art NeSy ILP systems based on given knowledge.
Keywords
Predicate Invention, Inductive Logic Programming, Neural Symbolic Artificial Intelligence
1. Introduction
Inductive Logic Programming (ILP) learns generalized logic programs from given data [1, 2, 3].
Unlike Deep Neural Networks (DNNs), ILP has vital advantages, such as learning explanatory
rules from small data. However, predicates for ILP systems are typically elusive and need to be
hand-crafted, requiring much prior knowledge to compose solutions. Predicate invention (PI)
systems invent new predicates that capture new concepts composed from well-designed primitive
predicates given by experts, which extends the expressiveness of the ILP language and consequently
reduces the dependence on human experts [4]. A simple example
NeSy 2023, 17th International Workshop on Neural-Symbolic Learning and Reasoning, Certosa di Pontignano, Siena,
Italy
*
Corresponding author.
[email protected] (J. Sha); [email protected] (H. Shindo);
[email protected] (K. Kersting); [email protected] (D. S. Dhami)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
            NeSy-ILP                                    NeSy-π
Const.      Object: O1/O2; Shape: Sphere/Cube;
            Color: Blue/Pink/Green;
            Direction: Left/Right/Front/Behind
Basic Pred. color/2/obj,color; shape/2/obj,shape;
            dir/2/obj,obj
B.K.        b_sp(O1):-color(O1,B),shape(O1,Sp).         -
            g_sp(O1):-color(O1,G),shape(O1,Sp).
            g_cu(O1):-color(O1,G),shape(O1,Cu).
Pred.       left_side/2/obj,obj;                        -
            right_side/2/obj,obj
Figure 1: A logical pattern in 3D scenes can be learned by a neural-symbolic ILP system. Left: Positive
images in the first row, negative images in the second row. The pattern is: a blue sphere is either on the
left side of a green sphere or on the right side of a green cube. Right: Comparison of language needs
between NeSy-ILP and NeSy-𝜋. Meaning of abbreviations in the table: Const-Constant. BK-Background
Knowledge. Ne-Pred-Neural Predicate. B-Blue, G-Green, Sp-Sphere, Cu-Cube.
is the concept of the blue sphere. In classical NeSy-ILP systems, this can be explained by
a clause blue_sphere(X):-color(X, blue), shape(X, sphere) given as background knowl-
edge. With PI systems, such concepts are learned from the basic predicates color(X, blue) and
shape(X, sphere) by concatenation.
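As a toy illustration of such concatenation, the invented concept is simply the conjunction of the two basic predicates; the dictionary-based object representation below is our assumption for illustration only, not the paper's implementation.

```python
# Toy stand-in for the blue-sphere example: the invented concept is the
# conjunction of two basic predicates. The dict-of-attributes object
# representation is an illustrative assumption.

def color(obj, c):
    return obj["color"] == c

def shape(obj, s):
    return obj["shape"] == s

def blue_sphere(obj):
    # invented concept = color(X, blue) AND shape(X, sphere)
    return color(obj, "blue") and shape(obj, "sphere")

print(blue_sphere({"color": "blue", "shape": "sphere"}))   # True
print(blue_sphere({"color": "green", "shape": "sphere"}))  # False
```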
Recently, several neural-symbolic ILP frameworks have been proposed [5, 6] that incorporate
DNNs for visual perception and learn explanation rules from raw inputs. NeSy-ILP systems
outperform pure neural-based baselines on visual reasoning, where the answers are inferred by
reasoning about objects’ attributes and their relations [7]. A major flaw of existing NeSy-ILP
systems is that they require all predicates beforehand, e.g., as pretrained neural predicates or
as hand-crafted background knowledge. Moreover, collecting background knowledge is very
costly, as it must be provided by human experts or requires pre-training of the neural modules
with additional supervision. This severely limits the applicability of these NeSy-ILP systems
to different domains. In contrast, DNNs require minimal priors and achieve high performance
by learning from data [8]. So the question arises: how can we implement a NeSy-ILP system
that learns from little or no background knowledge?
To this end, we propose Neural-Symbolic Predicate Invention (NeSy-𝜋), a predicate invention
pipeline that can invent relational concepts given visual scenes by using primitive predicates,
reducing dependence on prior knowledge (see Fig. 1 for an example). NeSy-𝜋 discovers predicates
describing attributes of objects and their spatial relations in visual scenes, which is equivalent
to summarizing the required knowledge from scenes. The NeSy-𝜋 system (shown in Fig. 2)
consists of two iterative steps: 1) evaluating clauses consisting of existing predicates and 2)
inventing new predicates. To evaluate the existing predicates, we propose two novel indicators,
necessity and sufficiency, which are computed by mapping the examples to a 2D Euclidean
space. Only the predicates satisfying these properties are considered in the invention of
Figure 2: NeSy-𝜋 architecture. NeSy-𝜋 invents relational concepts for visual scenes by iterating
evaluation and invention from primitive predicates and visual scenes (left). The invented predicates are
fed into a NeSy-ILP solver to learn classification rules. The NeSy-ILP solver generates rule candidates
using given predicates, performs differentiable reasoning using weighted clauses, and optimizes the
clause weights by gradient descent. In this way, the classification rules can be efficiently assembled
using the invented predicates (right). (Best viewed in color)
new predicates. NeSy-𝜋 can be integrated with existing NeSy-ILP systems providing them with
a rich vocabulary to generate efficient solutions.
Overall, we make the following important contributions: (1) We propose NeSy-𝜋, a NeSy
predicate invention framework compatible with NeSy ILP systems. NeSy-𝜋 extends NeSy ILP
systems by providing the capability to enrich their vocabularies by learning from data. (2) We
propose two criteria for evaluating clauses, which can then be used for predicate invention
tasks. (3) We develop 3D Kandinsky Patterns, which extends Kandinsky Patterns to the 3D
world and achieves faster scene generation than other environments, e.g., CLEVR [9].
Figure 3: Scoring strategy in NeSy-𝜋. NeSy-𝜋 takes a pair of positive and negative examples (x_i^+, x_i^-)
as input and applies the reasoning function r_𝒞 using clauses 𝒞 to compute a scalar value for each
example. r_𝒞 computes the probability of the classification label by differentiable forward reasoning using
clauses 𝒞 in a visual scene (left). The computed pair of scalar values is mapped to a 2D Euclidean
scoring space. f_suff quantifies the sufficiency of the clauses 𝒞 for the dataset 𝒟, i.e., it counts the number
of training pairs in 𝒟 whose negative example is not entailed by 𝒞, and f_ness quantifies the necessity of
the clauses 𝒞 for the dataset 𝒟, i.e., it counts the number of training pairs in 𝒟 whose positive example
is entailed by 𝒞 (right).
perform classification as follows: (i) conversion of a positive or negative visual scene 𝑥𝑖 into
a set of probabilistic atoms, (ii) differentiable forward reasoning using weighted clauses and
probabilistic atoms, and (iii) the reasoning result is used to predict the label of the visual scene.
Given a set of clauses 𝒞, we assume that each subset of 𝒞 defines a new predicate, i.e., we
consider 𝒞_p ∈ ℘(𝒞) defining predicate p, where ℘(𝒞) is the powerset of 𝒞. To efficiently evaluate
clauses, NeSy-𝜋 maps each training pair (x_i^+, x_i^-) ∈ 𝒟 to a point (r_𝒞(x_i^+), r_𝒞(x_i^-)) ∈ [0, 1]^2
in a 2D Euclidean scoring space, where r_𝒞 is the reasoning function for clauses 𝒞, as shown in
Fig. 3. r_𝒞(x_i^+) represents the score of the positive example x_i^+, i.e., the probability that the
positive example is entailed by 𝒞.
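These two counts can be sketched in a few lines, assuming the reasoning scores r_𝒞 are already available as numbers in [0, 1] and entailment is decided by a simple threshold; both the threshold th and the pure-Python setup are our assumptions, not the authors' implementation.

```python
# Sketch of the two clause-evaluation metrics read off the 2D scoring space.
# score_pairs holds (r_C(x_i^+), r_C(x_i^-)) for each training pair; the
# entailment threshold th is an assumption for illustration.

def ness_suff_scores(score_pairs, th=0.5):
    """f_ness: fraction of pairs whose positive example is entailed by C.
    f_suff: fraction of pairs whose negative example is NOT entailed by C."""
    n = len(score_pairs)
    f_ness = sum(1 for pos, _ in score_pairs if pos > th) / n
    f_suff = sum(1 for _, neg in score_pairs if neg <= th) / n
    return f_ness, f_suff

# Perfectly separating clauses: every positive entailed, no negative entailed.
pairs = [(0.9, 0.1), (0.8, 0.2), (0.95, 0.05)]
print(ness_suff_scores(pairs))  # (1.0, 1.0)
```

A clause set that entails some negatives (low f_suff) or misses some positives (low f_ness) is pruned before the invention step.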
Predicate Invention. NeSy-𝜋 invents new predicates by composing clauses that score highly
under the two metrics. Given ℘(𝒞) and their evaluation
(a) Three Same (b) Two Pairs (c) Shape in Shape (d) Check Mark
Figure 4: 3D Kandinsky Patterns. Three Same: this pattern always contains 3 objects that all have
the same color and the same shape. Two Pairs: this pattern always contains 4 objects; two of them
have the same color and shape, and the other two also have the same color and shape. Shape in
Shape: this pattern always contains 4 objects of the same shape; if they are cubes, their positions
form a square, and if they are spheres, their positions form a triangle. Check Mark: this pattern
always contains 5 objects whose positions form the outline of a check mark. (For their varieties
and false patterns, see Appendix A.)
scores, 𝒮_ness and 𝒮_suff, the top-k sets of clauses for each score are selected from ℘(𝒞), i.e.,
𝒞*_ness = top_k(℘(𝒞), 𝒮_ness) and 𝒞*_suff = top_k(℘(𝒞), 𝒮_suff). Let 𝒞* = 𝒞*_ness ∪ 𝒞*_suff, i.e.,
𝒞* is the set of clause sets that score highly on either metric. New predicates are composed by
taking the disjunction of high-scoring clauses. Let {X_1, ..., X_n} ∈ 𝒞*; we compose the
following clauses to define a new predicate: X_new ← body(X_1), . . . , X_new ← body(X_n),
where X_new is an atom using the new predicate and body(X) denotes the body atoms of clause X.
For example, assume that clauses defining inv_pred1 and inv_pred2 are selected; NeSy-𝜋 then
invents a new predicate by taking the disjunction of their bodies,
where inv_pred3 is an invented 1-ary predicate that can be interpreted as: “There is a pair of
objects with the same color (blue or red) in the image.”. By iterating the steps evaluate and invent,
NeSy-𝜋 discovers new high-level relational concepts useful for classifying given visual scenes.
The invented predicates are then fed into a NeSy ILP system to solve visual ILP problems.
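The composition step above can be sketched as follows; the clause representation (a head name paired with a tuple of body atoms) and all helper names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of invention-by-disjunction: each selected high-scoring clause X_i
# donates its body to one defining clause X_new <- body(X_i) of the fresh
# predicate. Clause representation is an assumption for illustration.
from itertools import count

_fresh_ids = count(1)

def invent_predicate(selected_clauses):
    """selected_clauses: list of (head, body) pairs, body a tuple of atoms.
    Returns the definition of a fresh predicate: one clause per disjunct."""
    name = f"inv_pred{next(_fresh_ids)}"
    return [(name, body) for _head, body in selected_clauses]

same_blue = ("p1", ("color(O1,blue)", "color(O2,blue)", "in(O1,X)", "in(O2,X)"))
same_red = ("p2", ("color(O1,red)", "color(O2,red)", "in(O1,X)", "in(O2,X)"))
for head, body in invent_predicate([same_blue, same_red]):
    print(f"{head}(O1,O2):-{','.join(body)}.")
```

Each printed clause is one disjunct of the fresh predicate, mirroring how inv_pred1/inv_pred2 above are defined by several alternative bodies.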
4. Experimental Evaluation
We aim to answer the following questions: Q1: Does NeSy-𝜋 invent predicates from complex
visual scenes for NeSy ILP systems, reducing required background knowledge? Q2: Is NeSy-𝜋
computationally efficient?
(A1. Predicate Invention): We designed four visual patterns each for the 2D and 3D
Kandinsky datasets (see Fig. 4). Each pattern has at least two varieties. For example, the
pattern three same can be three same blue spheres, three same pink cubes, or similar. None of
the patterns we designed can be described by one or a few of the given basic predicates; they
require several long clauses. The results are compared with 𝛼ILP [6], a model without predicate
invention.
To solve such patterns, classical NeSy-ILP systems require background knowledge and
predicates to help the system reason about the final rules. For example, to solve the pattern
three same, a NeSy-ILP system can be given the concept two_same(A, B) as background knowledge
and chain it twice, i.e., three_same(A, B, C):-two_same(A, B), two_same(B, C)., where
two_same(A, B) is described by five BK predicates and five BK clauses as follows:
two_same(A,B):-same_color_pair(A,B),same_shape_pair(A,B),in(A,X),in(B,X).
same_shape_pair(A,B):-shape(A,sq),shape(B,sq),in(A,X),in(B,X).
same_shape_pair(A,B):-shape(A,cir),shape(B,cir),in(A,X),in(B,X).
same_color_pair(A,B):-color(A,red),color(B,red),in(A,X),in(B,X).
same_color_pair(A,B):-color(A,blue),color(B,blue),in(A,X),in(B,X).
In NeSy-𝜋, such background knowledge is not provided but learned by the system; the
concepts are invented as new predicates. We give only the following basic predicates as
background knowledge: in/2 (O, X): evaluates whether the object O exists in the image X;
shape/2 (O, S): evaluates whether the object O has shape S; color/2 (O, C): evaluates whether
the object O has color C; phi/3 (O1, O2, 𝜑): evaluates whether the object O2 is located in the
direction 𝜑 of O1, calculated in a polar coordinate system; rho/3 (O1, O2, 𝜌): evaluates whether
the distance between the objects O2 and O1 is around 𝜌 (for more details, see Appendix A). A
possible solution to the pattern three same learned by NeSy-𝜋 is:
target(A,B):-in(O1,X),in(O2,X),inv_pred2(O1,O2).
inv_pred1(O1,O2):-color(O1,blue),color(O2,blue),in(O1,X),in(O2,X).
inv_pred1(O1,O2):-color(O1,red),color(O2,red),in(O1,X),in(O2,X).
inv_pred2(O1,O2):-in(O1,X),in(O2,X),inv_pred1(O1,O2),shape(O1,cir),shape(O2,cir).
inv_pred2(O1,O2):-in(O1,X),in(O2,X),inv_pred1(O1,O2),shape(O1,sq),shape(O2,sq).
In this case, the predicate target(A, B) corresponds to the concept two_same(A, B), and the
five clauses explaining it are learned instead of given. The predicates inv_pred1(A, B) and
inv_pred2(A, B) are invented. The other three predicates (shape(A, Y), color(A, Z), in(A, X))
are given beforehand. For more examples, see Appendix A.
The results of the evaluation on each pattern are shown in Tab. 1. For each pattern, we give
the system only basic predicates and observe that it successfully invents new predicates for
complex visual patterns, resulting in improved accuracy. For most patterns, NeSy-𝜋 finds the
target rules by inventing new predicates and using them to describe the target patterns, whereas
𝛼ILP can only use existing predicates and fails to find the correct target clauses. However, since
the variety of each pattern is kept small (a limited choice of colors, shapes and positions), it is
possible to find target clauses that describe the pattern comprehensively, which also explains
the accuracies of 1.0. The pattern full house has many more varieties (3024 varieties for positive
patterns). Therefore
Dataset  Pattern         # obj  Iter. (α/π)  Time, min (α/π)  Accuracy (α/π)  # Preds (α/π)
2D KP    Red Triangle    2      5 / 3        9.18 / 5.58      0.83 / 1.00     16 / 16 (+2)
2D KP    Two Pairs       4      8 / 4        11.27 / 8.91     0.64 / 1.00     4 / 4 (+5)
2D KP    Full House      4      8 / 8        14.79 / 4.12     0.83 / 0.90     4 / 4 (+8)
2D KP    Check Mark      5      5 / 3        22.35 / 10.32    0.73 / 1.00     22 / 22 (+4)
3D KP    Three Same      3      5 / 3        0.91 / 0.72      0.55 / 1.00     4 / 4 (+3)
3D KP    Two Pairs       4      5 / 3        4.55 / 1.53      0.64 / 1.00     4 / 4 (+5)
3D KP    Shape in Shape  4      4 / 2        14.69 / 3.17     0.79 / 1.00     11 / 11 (+2)
3D KP    Check Mark      5      4 / 2        19.36 / 10.64    0.87 / 1.00     12 / 12 (+1)
Table 1
Experimental results on 2D Kandinsky Patterns and 3D Kandinsky Patterns. The dataset of each
experiment contains 64 positive/negative (PN) pairs. 𝛼 and 𝜋 denote the 𝛼ILP system (without
background knowledge) and NeSy-𝜋, respectively. The numbers in brackets in the last column give
the number of invented predicates used in the target clauses.
it is impossible to consider every single case, which would be very time-consuming, and no
predicate was generated that describes it comprehensively. Nevertheless, the PI module improves
the accuracy to 0.9, showing that the invented predicates improve accuracy even with an
incomplete description.
(A2. Computational Efficiency) : The runtime of NeSy-𝜋 depends mainly on the number
of iterations needed to find the target clause. In each iteration, a predicate is added to each
clause to obtain more complex rules. The higher the number of iterations, the more complex
the rule becomes. Tab. 1 shows the time comparison of NeSy-𝜋 with 𝛼ILP (without background
knowledge). Since the invented predicates entail the previously learned clauses, they are far
more informative than the clauses learned by classical NeSy systems like 𝛼ILP. Therefore,
NeSy-𝜋 is able to converge to the target clause faster, making it more efficient.
5. Related Work
Inductive Logic Programming [1, 10, 2, 3] has emerged at the intersection of machine learning
and logic programming. Many ILP frameworks have been developed, e.g., FOIL [11], Pro-
gol [10], ILASP [12], Metagol [13, 14], and Popper [15]. Recently, NeSy ILP (Differentiable
ILP) frameworks have been developed [5, 16, 6] to deal with visual inputs in ILP. Predicate
invention (PI) has been a long-standing problem for ILP and many methods have been devel-
oped [17, 18, 14, 19, 20, 21, 22, 23]. Although these methods can invent predicates for symbolic
inputs, predicate invention for NeSy ILP systems has not been addressed. NeSy-𝜋 invents
predicates from visual scenes using DNNs as a perception model; thereby, NeSy-ILP systems
achieve better performance with less background knowledge and without pre-training of neural
modules. Meta rules, which define templates for the rules to be generated, have been used to
handle new predicates in (NeSy) ILP systems [5, 24]. NeSy-𝜋 achieves memory-efficient
predicate invention by scoring and pruning candidates using the given data, which is crucial for
handling complex visual scenes in NeSy ILP systems since they are memory-intensive [5].
6. Conclusion
We proposed an approach for Neural-Symbolic Predicate Invention (NeSy-𝜋). NeSy-𝜋 is able
to discover new knowledge and summarize it as new predicates, and thus requires less background
knowledge for reasoning. We show that our NeSy-𝜋 model uses only basic neural predicates
for finding target clauses, i.e., no further background knowledge is required. Compared to
a NeSy-ILP system without a PI module, our system provides significantly more accurate
classifications. Extending NeSy-𝜋 to more complex problems, e.g., with more objects in the
scene, is an interesting avenue for future work. Developing an efficient pruning strategy for
finding longer clauses is another interesting direction.
References
[1] S. H. Muggleton, Inductive logic programming, New Gener. Comput. 8 (1991) 295–318.
[2] S.-H. Nienhuys-Cheng, R. d. Wolf, J. Siekmann, J. G. Carbonell, Foundations of Inductive
Logic Programming, 1997.
[3] A. Cropper, S. Dumancic, R. Evans, S. H. Muggleton, Inductive logic programming at 30,
Mach. Learn. 111 (2022) 147–172.
[4] S. Muggleton, W. L. Buntine, Machine invention of first order predicates by inverting
resolution, in: Proceedings of the Fifth International Conference on Machine Learning,
ML’88, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988, p. 339–352.
[5] R. Evans, E. Grefenstette, Learning explanatory rules from noisy data, J. Artif. Intell. Res.
61 (2018) 1–64.
[6] H. Shindo, V. Pfanschilling, D. S. Dhami, K. Kersting, 𝛼ilp: thinking visual scenes as
differentiable logic programs, Mach. Learn. (2023).
[7] J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, J. Wu, The neuro-symbolic concept learner:
Interpreting scenes, words, and sentences from natural supervision, in: International
Conference on Learning Representations (ICLR), 2019.
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polo-
sukhin, Attention is all you need, Advances in neural information processing systems
(NeurIPS) 30 (2017).
[9] J. Johnson, B. Hariharan, L. Van Der Maaten, L. Fei-Fei, C. Lawrence Zitnick, R. Girshick,
Clevr: A diagnostic dataset for compositional language and elementary visual reasoning,
in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR),
2017, pp. 2901–2910.
[10] S. Muggleton, Inverse Entailment and Progol, New Generation Computing, Special issue
on Inductive Logic Programming 13 (1995) 245–286.
[11] J. R. Quinlan, Learning logical definitions from relations, Mach. Learn. 5 (1990) 239–266.
[12] M. Law, A. Russo, K. Broda, Inductive learning of answer set programs, in: E. Fermé,
J. Leite (Eds.), Logics in Artificial Intelligence - 14th European Conference (JELIA), volume
8761 of Lecture Notes in Computer Science, 2014, pp. 311–325.
[13] A. Cropper, S. H. Muggleton, Metagol system, https://github.com/metagol/metagol, 2016.
[14] A. Cropper, R. Morel, S. Muggleton, Learning higher-order logic programs, Mach. Learn.
109 (2019) 1289 – 1322.
[15] A. Cropper, R. Morel, Learning programs by learning from failures, Mach. Learn. 110
(2021) 801–856.
[16] H. Shindo, M. Nishino, A. Yamamoto, Differentiable inductive logic programming for
structured examples, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence
(AAAI), 2021, pp. 5034–5041.
[17] I. Stahl, Predicate invention in ilp — an overview, in: P. B. Brazdil (Ed.), Machine Learning:
ECML-93, Springer Berlin Heidelberg, Berlin, Heidelberg, 1993, pp. 311–322.
[18] D. Athakravi, K. Broda, A. Russo, Predicate Invention in Inductive Logic Programming,
in: 2012 Imperial College Computing Student Workshop, volume 28 of OpenAccess Series
in Informatics (OASIcs), Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl,
Germany, 2012, pp. 15–21.
[19] C. Hocquette, S. H. Muggleton, Complete bottom-up predicate invention in meta-
interpretive learning, in: Proceedings of the Twenty-Ninth International Joint Conference
on Artificial Intelligence, (IJCAI), International Joint Conferences on Artificial Intelligence
Organization, 2020, pp. 2312–2318.
[20] S. Kok, P. M. Domingos, Statistical predicate invention, in: International Conference on
Machine Learning (ICML), 2007.
[21] S. Kramer, Predicate invention: A comprehensive view, 2007.
[22] A. Cropper, R. Morel, S. H. Muggleton, Learning higher-order programs through predicate
invention, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (2020)
13655–13658.
[23] A. Cropper, R. Morel, Predicate invention by learning from failures, arXiv
Preprint:2104.14426 (2021).
[24] T. Kaminski, T. Eiter, K. Inoue, Exploiting answer set programming with external
sources for meta-interpretive learning, Theory and Practice of Logic Programming 18
(2018) 571–588. doi:10.1017/S1471068418000261.
Figure 5: Spatial neural predicates explained by examples. Left: evaluation of the direction of the red
circle with respect to the green circle using the predicates 𝜑1, ..., 𝜑7; the red circle is in the direction
of 𝜑1. Right: evaluation of the distance of the red circle to the green circle using the predicates
𝜌1, ..., 𝜌5; the red circle is located at the distance of 𝜌2. For 3D KP, we assume all objects are placed
on a flat surface at the same height, so their positions can be mapped to 2D space by ignoring the
height axis. The position of each object can be estimated from the depth map of the 3D scene.
A. Experiment Setting.
A.1. Spatial Neural Predicates
To represent the spatial relationship between two objects, we developed two types of spatial
neural predicates, i.e., rho(O1, O2, 𝜌) and phi(O1, O2, 𝜑), which are given as basic knowledge.
They are used for inventing new predicates in patterns associated with spatial concepts.
rho(O1, O2, 𝜌): The neural predicate rho(O1, O2, 𝜌) describes the distance between objects 𝑂1
and 𝑂2, where 𝜌 is the measure. Let 𝑊 denote the length of the image diagonal; 𝑊 is the
maximum possible distance between 𝑂1 and 𝑂2. To symbolize the distance between 𝑂1 and
𝑂2, we divide 𝑊 into 𝑁𝜌 parts and denote each part by 𝜌𝑖, 1 ≤ 𝑖 ≤ 𝑁𝜌 (as shown in Fig. 5,
right). For example, if 𝑁𝜌 = 4, then 𝜌1 = 0.25𝑊, 𝜌2 = 0.5𝑊, 𝜌3 = 0.75𝑊, 𝜌4 = 1.0𝑊. The
actual distance between two objects is then symbolized by one of 𝜌1, 𝜌2, 𝜌3, 𝜌4.
phi(O1, O2, 𝜑): The predicate phi(O1, O2, 𝜑) describes the direction of 𝑂2 with respect to
𝑂1. Based on a polar coordinate system, we take the first argument 𝑂1 as the origin; the
direction of 𝑂2 then lies in the range [0, 360). We symbolize all directions with 𝜑1, ..., 𝜑𝑁,
where 𝑁 is a hyper-parameter. The actual direction is classified into one of these symbols
(as shown in Fig. 5).
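Both discretizations can be sketched as follows; the ceiling-based bucketing and the default bin counts are our reading of the description above, not the authors' neural implementation (which evaluates these predicates probabilistically).

```python
# Sketch of symbolizing continuous distance/direction into rho_i / phi_i
# buckets. Bin semantics (rho_i = i/N_rho * W as the bin's upper bound) and
# defaults are assumptions based on the description in the text.
import math

def rho_symbol(o1, o2, diag, n_rho=4):
    """Symbolize the distance between two object positions: rho_i covers
    distances up to (i / n_rho) * diag, with diag = image-diagonal length W."""
    d = math.dist(o1, o2)
    i = min(max(math.ceil(d / diag * n_rho), 1), n_rho)  # clamp into 1..n_rho
    return f"rho{i}"

def phi_symbol(o1, o2, n_phi=8):
    """Symbolize the direction of o2 seen from o1 (polar angle in [0, 360))
    into one of phi_1..phi_{n_phi}; n_phi plays the role of N."""
    ang = math.degrees(math.atan2(o2[1] - o1[1], o2[0] - o1[0])) % 360
    return f"phi{int(ang // (360 / n_phi)) + 1}"

print(rho_symbol((0, 0), (3, 4), diag=10))  # distance 5 = 0.5 * W -> rho2
print(phi_symbol((0, 0), (0, 1)))           # 90 degrees -> phi3 (n_phi=8)
```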
A.2. Visual Patterns
Fig. 6 illustrates the four training patterns in the 2D Kandinsky dataset. Fig. 7 illustrates the
four training patterns in the 3D Kandinsky dataset.
Figure 6: Kandinsky Patterns. Two Pairs: there are always four objects in this pattern; two of them
have the same color and the same shape, and the other two have the same shape but a different color.
Red Triangle: this pattern always contains a red triangle and a nearby object with a different color and
shape; the remaining four objects are random. Check Mark: this pattern always consists of five
objects whose positions form the outline of a check mark. It includes 3 symmetrical patterns; we consider
the two with convex shapes as true and the other two with concave shapes as false. Full House: in
this pattern, there are always four different objects, i.e., no two objects are exactly the same; they
differ in color or in shape. In the false patterns, at least two objects are exactly the same, i.e., they
have the same color and the same shape. Full house is effectively the opposite of one pair, and the
system did not fully solve this pattern (0.90 accuracy). The reason is that there are many different
variants in the true patterns and each variant has low probability, so they are difficult to capture. To
invent a predicate that covers all variants, we would need to select more clauses in each iteration and
use them to invent a new predicate, but considering more clauses is very memory-intensive. One
solution may be to start from the false patterns, which consist of one pair, have less variety, and are
therefore easier to describe; the true pattern is then the negation of the false pattern.
(a) True (b) True (c) False (d) False
Figure 7: Training examples in 3D Kandinsky pattern dataset. From top to bottom, Three Same: Three
objects with the same color and the same shape. Two Pairs: Four objects, two of which have the same
color and shape, the other two also have the same color and shape. Shape in Shape: Four objects with
the same shape. If all objects are cubes, their positions form a square shape. If all objects are spheres,
their positions form a triangle shape. Check Mark: Five objects form a check mark shape, the mirrored
or symmetrical shapes are also considered as positive.
(three same)
kp(X):-in(O1,X),in(O2,X),in(O3,X),inv_pred100(O1,O2,O3),inv_pred451(O1,O2,O3).
inv_pred451(O1,O2,O3):-in(O1,X),in(O2,X),in(O3,X),inv_pred2(O1,O2),
inv_pred2(O1,O3),inv_pred2(O2,O3).
inv_pred100(O1,O2,O3):-in(O1,X),in(O2,X),in(O3,X),shape(O1,cube),
shape(O2,cube),shape(O3,cube).
inv_pred100(O1,O2,O3):-in(O1,X),in(O2,X),in(O3,X),shape(O1,sphere),
shape(O2,sphere),shape(O3,sphere).
inv_pred2(O1,O2):-color(O1,blue),color(O2,blue),in(O1,X),in(O2,X).
inv_pred2(O1,O2):-color(O1,green),color(O2,green),in(O1,X),in(O2,X).
inv_pred2(O1,O2):-color(O1,pink),color(O2,pink),in(O1,X),in(O2,X).
(two pairs)
kp(X):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),inv_pred38(O1,O2,O3,O4),
inv_pred9(O1,O2),inv_pred9(O3,O4).
inv_pred38(O1,O2,O3,O4):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),inv_pred2(O3,O4),
inv_pred3(O1,O2),inv_pred4(O3,O4).
inv_pred9(O1,O2):-in(O1,X),in(O2,X),shape(O1,cube),shape(O2,cube).
inv_pred9(O1,O2):-in(O1,X),in(O2,X),shape(O1,sphere),shape(O2,sphere).
inv_pred2(O1,O2):-color(O1,blue),in(O1,X),in(O2,X),shape(O1,sphere).
inv_pred2(O1,O2):-color(O1,green),color(O2,green),in(O1,X),in(O2,X).
inv_pred2(O1,O2):-color(O1,pink),color(O2,pink),in(O1,X),in(O2,X).
inv_pred3(O1,O2):-color(O1,blue),color(O2,blue),in(O1,X),in(O2,X).
inv_pred3(O1,O2):-color(O1,green),color(O2,green),in(O1,X),in(O2,X).
inv_pred3(O1,O2):-color(O1,pink),color(O2,pink),in(O1,X),in(O2,X).
inv_pred4(O1,O2):-color(O1,blue),color(O2,blue),in(O1,X),in(O2,X).
inv_pred4(O1,O2):-color(O1,blue),in(O1,X),in(O2,X),shape(O1,cube).
inv_pred4(O1,O2):-color(O1,green),color(O2,green),in(O1,X),in(O2,X).
inv_pred4(O1,O2):-color(O1,pink),color(O2,pink),in(O1,X),in(O2,X).
(shape in shape)
kp(X):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),inv_pred7(O1,O2,O3,O4),
phi(O2,O4,phi2),rho(O1,O4,rho1).
inv_pred7(O1,O2,O3,O4):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),inv_pred1(O1,O2,O3),
phi(O1,O4,phi3).
inv_pred7(O1,O2,O3,O4):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),inv_pred1(O1,O2,O3),
shape(O1,sphere).
inv_pred1(O1,O2,O3):-in(O1,X),in(O2,X),in(O3,X),rho(O2,O3,rho0).
inv_pred1(O1,O2,O3):-in(O1,X),in(O2,X),in(O3,X),shape(O1,cube).
(check mark)
kp(X):-in(O1,X),in(O2,X),in(O3,X),in(O4,X),in(O5,X),inv_pred3(O2,O3,O4,O5),
rho(O2,O5,rho0),rho(O3,O4,rho1).
inv_pred3(O1,O2,O3,O5):-in(O1,X),in(O2,X),in(O3,X),in(O5,X),phi(O1,O2,phi2),
phi(O2,O5,phi1).
inv_pred3(O1,O2,O3,O5):-in(O1,X),in(O2,X),in(O3,X),in(O5,X),phi(O2,O5,phi1),
phi(O3,O5,phi4).