Inductive Logic Programming via Differentiable Neural Logic Networks
Abstract
1 Introduction
Despite the tremendous success of deep neural networks, they still suffer from several limitations.
These systems, in general, do not construct any explicit, symbolic representation of the algorithm
they learn. Instead, the learned algorithm is stored implicitly in thousands or even millions of
weights, which is typically impossible for human agents to decipher or verify. Further, MLP networks
are suitable when large numbers of training examples are available; otherwise, they usually do not
generalize well. One machine learning approach that addresses these shortcomings is Inductive Logic
Programming (ILP). In ILP, explicit rules and symbolic logical representations can be learned using
only a few training examples, and the resulting solutions usually generalize well.
The idea of using neural networks for ILP has attracted considerable research in recent years
(Hölldobler et al. [1999]; França et al. [2014]; Serafini and Garcez [2016]; Evans and Grefenstette
[2018]). Most neural ILP solvers work by propositionalization of the relational data and use
neural networks for the inference tasks. As such, they are usually superior to classical ILP solvers in
handling missing or uncertain data. However, in many of the proposed neural solvers the learning
is not explicit (e.g., the connectionist network of Bader et al. [2008]). Further, these methods usually
do not support features such as inventing new predicates and learning recursive rules for predicates.
Additionally, in almost all past ILP solvers, the space of possible symbolic rules for each
predicate is significantly restricted and reduced by introducing some type of rule template before
searching through this space for possible candidates (e.g., mode declarations in Progol (Muggleton
[1995]) and meta-rules in Metagol). In fact, as stated in Evans and Grefenstette [2018], the need for
program templates to generate a limited set of viable candidate clauses in forming the predicates
is the key weakness of all existing (past) ILP systems (neural or non-neural), severely limiting the
solution space of a problem. The contribution of this paper is as follows: we introduce a new neural
(a) F_c(x_i, m_i)        (b) F_d(x_i, m_i)
x_i  m_i  F_c            x_i  m_i  F_d
 0    0    1              0    0    0
 0    1    0              0    1    0
 1    0    1              1    0    0
 1    1    1              1    1    1
Figure 1: Truth tables of the F_c(·, ·) and F_d(·, ·) functions
as Metagol and dILP (Evans and Grefenstette [2018]), the possible clauses are generated via a
template and the generated clauses are tested against positive and negative examples. Since the space
of possible clauses is vast, most of these systems employ very restrictive template rules
to reduce the size of the search space. For example, dILP allows for clauses of at most two atoms
and only two rules per predicate. In the above example, since |I²_lt| = 18, this corresponds
to considering only (18 choose 2) items out of all the possible clauses (i.e., the power set of I²_lt). Metagol
employs a more flexible approach by allowing the programmer to define the rule templates via some
meta-rules. However, in practice, this approach does not resolve the issue completely. Even though it
allows for more flexibility, defining those templates is itself a complicated task which requires expert
knowledge and possibly several trials, and it can still lead to an exponentially large space of possible solutions.
Later we will consider examples where these kinds of approaches are practically impossible.
Alternatively, we propose a novel approach which allows for learning any arbitrary Boolean function
involving several atoms from the set I^i_p. This is made possible via a set of differentiable neural
functions which can explicitly learn and represent Boolean functions.
Any Boolean function can be learned (at least in theory) via a typical MLP network. However, since
the corresponding logic is stored implicitly in the weights of the MLP network, it is very difficult (if
not impossible) to decipher the actual learned function. Therefore, an MLP is not a good candidate for
use in our ILP solver. Our intermediate goal is to design new neuronal functions which are capable
of learning and representing Boolean functions in an explicit manner. Since any Boolean function
can be expressed in Disjunctive Normal Form (DNF) or alternatively in Conjunctive Normal
Form (CNF), we first introduce novel conjunctive and disjunctive neurons. We can then combine
these elementary functions to form more expressive constructs such as DNF and CNF functions.
We extend Boolean values to real values in the range [0, 1] and use 1 (True) and 0
(False) as the representations of the two states of a binary variable. We also define the fuzzy unary and
dual Boolean functions of two Boolean variables x and y as:
x̄ = 1 − x ,   x ∧ y = x y ,   x ∨ y = 1 − (1 − x)(1 − y)   (6)
This algebraic representation of Boolean logic allows us to manipulate logical expressions
via algebra. Let x^n ∈ {0, 1}^n be the input vector to our logical neuron. In order to implement
the conjunction function, we need to select a subset of x^n and apply the fuzzy conjunction (i.e.,
multiplication) to the selected elements. To this end, we associate a trainable Boolean membership
weight m_i with each input element x_i of the vector x^n. Further, we define a Boolean function F_c(x_i, m_i)
with the truth table in Fig. 1a, which is able to include (or exclude) each element in (or from) the
conjunction function. This design ensures that an element x_i is incorporated in the conjunction
function only when the corresponding membership weight is 1. Consequently, the neural conjunction
function f_conj can be defined as:
f_conj(x^n) = ∏_{i=1}^{n} F_c(x_i, m_i) ,   where   F_c(x_i, m_i) = 1 − m_i (1 − x_i)   (7)
To ensure that the membership weights remain in the range [0, 1] we apply a sigmoid function to the
corresponding trainable weights w_i in the neural network, i.e., m_i = sigmoid(c w_i), where c ≥ 1 is a
constant. Similar to perceptron layers, we can stack N conjunction neurons to create a conjunction
layer of size N. This layer has the same complexity as a typical perceptron layer without a bias term.
More importantly, this implementation of the conjunction function makes it possible to interpret the
learned Boolean function directly from the values of the membership weights m_i. The disjunctive
neuron can be defined similarly, but using the function F_d with the truth table depicted in Fig. 1b, i.e.:
f_disj(x^n) = ⋁_{i=1}^{n} F_d(x_i, m_i) = 1 − ∏_{i=1}^{n} (1 − F_d(x_i, m_i)) ,   where   F_d(x_i, m_i) = x_i m_i   (8)
We call a complex network made by combining these elementary conjunctive and disjunctive neurons
a dNL (differentiable Neural Logic) network. For example, by cascading a conjunction layer with one
disjunctive neuron we can form a dNL-DNF construct. Similarly, a dNL-CNF can be constructed.
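For illustration, the following is a minimal NumPy sketch of these neurons and of a dNL-DNF forward pass, directly following Eqs. (6)-(8). The class and variable names are ours and training is omitted, so this should be read as an illustrative sketch rather than the accompanying TensorFlow implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConjunctionLayer:
    """A layer of N conjunction neurons over an n-dimensional fuzzy Boolean input."""
    def __init__(self, n_in, n_out, c=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_out, n_in))   # trainable weights
        self.c = c                                # sharpening constant, c >= 1

    def __call__(self, x):
        m = sigmoid(self.c * self.w)              # membership weights in (0, 1)
        # Eq. (7): F_c(x_i, m_i) = 1 - m_i*(1 - x_i); product over i = fuzzy conjunction
        return np.prod(1.0 - m * (1.0 - x), axis=1)

class DisjunctionNeuron:
    """A single disjunctive neuron: fuzzy OR over the selected inputs."""
    def __init__(self, n_in, c=2.0, seed=1):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=n_in)
        self.c = c

    def __call__(self, x):
        m = sigmoid(self.c * self.w)
        # Eq. (8): F_d(x_i, m_i) = x_i*m_i; fuzzy OR = 1 - prod(1 - F_d)
        return 1.0 - np.prod(1.0 - m * x)

# A dNL-DNF construct: a conjunction layer cascaded with one disjunctive neuron.
conj = ConjunctionLayer(n_in=4, n_out=3)    # 3 candidate conjunctive rules over 4 atoms
disj = DisjunctionNeuron(n_in=3)
x = np.array([1.0, 0.0, 1.0, 1.0])          # fuzzy Boolean input vector
y = disj(conj(x))                           # output in [0, 1]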
We associate a dNL (conjunction) function F^i_p with the i-th rule of every intensional predicate p in our
logic program. Intensional predicates can use other predicates and variables, in contrast to
extensional predicates, which are entirely defined by the ground atoms. We view the membership
weights m in the conjunction neuron as Boolean flags that indicate whether each atom in a rule is
off or on. In this view, the problem of ILP can be seen as finding an assignment to these membership
Boolean flags such that the resulting rules, applied to the background facts, entail all positive examples
and reject all negative examples. However, by allowing these membership weights to be learnable
weights, we are formulating a continuous relaxation of the satisfiability problem. This approach is
in some ways similar to the approach in dILP (Evans and Grefenstette [2018]), but differs in how we
define the Boolean flags. In dILP, a Boolean flag is assigned to each of the possible combinations of
two atoms from the set I^i_p. They then use a softmax network to learn the set of winning clauses and
interpret the weights in the softmax network as the Boolean flags that select one clause out of the
possible clauses. In our approach, by contrast, the membership weights of the conjunction (or of any
other logical function from dNL) can be directly interpreted as the flags in the satisfiability interpretation.
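As a sketch of this interpretation, the learned clause can be read off a trained conjunction neuron by thresholding its membership weights. The atoms, weight values and the 0.5 threshold below are illustrative assumptions, not values produced by the paper's experiments.

import numpy as np

def extract_clause(membership, atom_names, threshold=0.5):
    """Read the body of a learned conjunctive rule off its trained membership weights."""
    kept = [name for name, m in zip(atom_names, membership) if m > threshold]
    return ", ".join(kept) if kept else "true"

# hypothetical trained membership weights for one rule of lt/2
atoms = ["lt(A, C)", "inc(C, B)", "eq(A, B)"]
m = np.array([0.97, 0.98, 0.03])
print(extract_clause(m, atoms))   # -> "lt(A, C), inc(C, B)"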
We are now able to formulate the ILP problem as an end-to-end differentiable neural network.
We associate a (fuzzy) value vector X_p^(t) with each predicate p at time-stamp t, which holds
the (fuzzy) Boolean values of all the ground atoms involving that predicate. For the example
in consideration (i.e., lessThan), the vector X_inc^(t) includes the Boolean values for the atoms in
{inc(0, 0), inc(0, 1), . . . , inc(4, 4)}. For extensional predicates, these values remain constant during
the forward chain of reasoning, but for intensional predicates such as lt, the values of X_p^(t)
change during the application of the predicate rules F^i_p at each time-stamp. Let G be the set
of all ground atoms and G_p be the subset of G associated with predicate p. For every ground atom
e ∈ G_p and for every rule F^i_p, let Θ^i_p(e) be the set of all substitutions of the constants into the
variables V^i_p which would result in the atom e. In the lessThan program (see page 2), for example,
for the ground atom lt(0, 2), the set of all substitutions corresponding to the second rule (i.e., i = 2)
is given by Θ²_lt( lt(0, 2) ) = {{A ↦ 0, B ↦ 2, C ↦ 0}, . . . , {A ↦ 0, B ↦ 2, C ↦ 4}}. We can
now define the one-step forward inference formula as:
now define the one step forward inference formula as:
∀e ∈ Gp , Xp(t+1) [e] = Fam (Xp(t) [e], F(e)) , where (9a)
_ _
F(e) = Fpi ( Iip |θ ) (9b)
i θ∈Θip (e)
For most practical purposes we can assume that the amalgamate function F_am is simply the
fuzzy disjunction function, but we consider other options in Appendix B. Here, for brevity,
we did not introduce the indexing notation in (9). By X_p[e] we actually mean X_p[index(X_p, e)],
where index(X_p, e) returns the index of the corresponding element of the vector X_p. Further, each F^i_p
is the corresponding predicate rule function implemented as a differentiable dNL network (e.g., a
conjunctive neuron). For each substitution θ, this function is applied to the input vector I^i_p |_θ
evaluated under that substitution. As an example, Figure 2 depicts one step of this forward chaining for the predicate lt.
Figure 2: The diagram of one step of forward chaining for the predicate lt, where F_lt is implemented using a
dNL-DNF network.
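A minimal sketch of one such forward-chaining step, taking F_am to be the fuzzy disjunction, is shown below. The function and argument names are ours, and the substitution inputs are assumed to be pre-gathered (as discussed later in the implementation notes).

import numpy as np

def forward_step(X_p, X_sub, rule_fn):
    """One application of Eq. (9) for an intensional predicate p (F_am = fuzzy OR).

    X_p:     current valuation vector of p, shape (num_ground_atoms,)
    X_sub:   gathered body inputs I_p|theta, shape
             (num_ground_atoms, num_substitutions, num_body_atoms)
    rule_fn: the dNL rule function F_p mapping a body vector to a value in [0, 1]
    """
    # evaluate the rule for every substitution of every ground atom
    vals = np.apply_along_axis(rule_fn, 2, X_sub)      # (atoms, substitutions)
    # Eq. (9b): fuzzy disjunction over the substitutions
    F_e = 1.0 - np.prod(1.0 - vals, axis=1)            # (atoms,)
    # Eq. (9a): amalgamate with the previous valuation via fuzzy disjunction
    return 1.0 - (1.0 - X_p) * (1.0 - F_e)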
2.4 Training
We obtain the initial values of the valuation vectors from the background atoms, i.e.,

∀p, ∀e ∈ G_p :   if e ∈ B, X_p^(0)[e] = 1, else X_p^(0)[e] = 0   (10)
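A small sketch of this initialization follows; the ground-atom ordering and background set shown are made up for illustration.

import numpy as np

# hypothetical ground atoms of inc/2 and the background facts B
ground_atoms = ["inc(0, 1)", "inc(1, 2)", "inc(2, 3)", "inc(0, 2)"]
background = {"inc(0, 1)", "inc(1, 2)", "inc(2, 3)"}

# Eq. (10): X_p^(0)[e] = 1 if e is a background fact, else 0
X0 = np.array([1.0 if e in background else 0.0 for e in ground_atoms])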
We interpret the final value X_p^(t_max)[e] (after t_max steps of forward chaining) as the conditional
probability of the value of the atom given the model parameters, and we define the loss as the average
cross-entropy between the ground truth provided by the positive and negative examples for the
corresponding predicate p and X_p^(t_max), which is the algorithm's output after t_max forward chaining
steps. We train the model using the ADAM optimizer (Kingma and Ba [2014]) to minimize the aggregate
loss over all intensional predicates with a learning rate of 0.001 (in some cases we may increase the
rate for faster convergence). After the training is completed, a zero cross-entropy loss indicates that
the model has been able to satisfy all the examples in the positive and negative sets. However, there
might exist a few atoms with membership weights of '1' in the corresponding dNL network of a
predicate which are not necessary for the satisfiability of the solution. Since there is no
gradient at this point, those terms cannot be directly removed during gradient descent
unless we include some penalty terms. In practice, we use a simpler approach: in the final stage of the
algorithm, we remove an atom if switching its membership variable from '1' to '0' does not change
the loss function.
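The following is a minimal sketch of this training loss and of the pruning pass. The binary flag representation and the generic loss_fn callback are our assumptions about the interface, not the paper's actual implementation.

import numpy as np

def cross_entropy_loss(X_final, labels, eps=1e-7):
    """Average cross-entropy between X_p^(t_max) and the positive/negative examples."""
    p = np.clip(X_final, eps, 1.0 - eps)
    return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def prune_memberships(flags, loss_fn, tol=1e-9):
    """Drop atoms whose membership flag can flip from 1 to 0 without changing the loss."""
    base = loss_fn(flags)
    for i in range(len(flags)):
        if flags[i] == 1:
            flags[i] = 0
            if loss_fn(flags) > base + tol:   # removing this atom hurts: keep it
                flags[i] = 1
    return flags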
In the majority of ILP systems, the body of a rule is defined as a conjunction of atoms.
However, in general, the predicate rules can be defined as any arbitrary Boolean function of
the elements of the set I_p. One of the main reasons for restricting the form of these rules in most ILP
implementations is the vast space of possible Boolean functions that would need to be considered. For
example, by restricting the form of a rule's body to a pure Horn clause, we reduce the space of possible
functions from 2^(2^L) to only 2^L, where L = |I^i_p|. Most ILP systems apply even further restrictions;
for example, dILP limits the possible combinations to the L² possible combinations of terms made
of two atoms. In contrast, in our proposed framework via dNL networks, we are able to learn arbitrary
functions with any number of atoms in the formula. Though some of the possible 2^(2^L)
functions require an exponentially large number of terms when expressed in DNF form, for example, in
most typical scenarios a dNL-DNF function with a reasonable number of disjunction terms is
capable of learning the required logic. Further, even though our approach allows for multiple rules
per predicate, in most scenarios we can learn all the rules for a predicate as one DNF formula instead
of learning separate rules. Finally, we can easily include the negation of each atom in
the formula by concatenating the vector I_p|_θ and its fuzzy negation, i.e., (1.0 − I_p|_θ), as the input to
the F^i_p function. This only doubles the number of parameters of the model. In contrast, in
most other implementations of ILP, this would increase the number of parameters and the problem
complexity at a much higher rate.
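As a two-line illustration of this concatenation (with made-up values; I_theta stands for the fuzzy values of the body atoms under a substitution θ):

import numpy as np

I_theta = np.array([0.0, 1.0, 0.3])               # fuzzy values of the atoms in I_p|theta
x_in = np.concatenate([I_theta, 1.0 - I_theta])   # atoms followed by their fuzzy negations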
We have implemented¹ the dNL-ILP solver model using Tensorflow (Abadi et al. [2016]). In the
previous sections, we have outlined the process in a sequential manner. However, in the actual
implementation we first create index matrices using all the background facts before starting the
optimization task. Further, all the substitution operations for each predicate (at each time-stamp)
are carried out using a single gather operation. Finally, at each time-stamp and for each intensional
predicate, all instances of applying (executing) the neural function F^i_p are carried out in one batch
operation, in parallel. The proposed algorithm allows for very efficient learning of arbitrarily complex
formulas and significantly reduces the complexity that arises in typical ILP systems when the number
of possible atoms in each rule increases. Indeed, in our approach there is usually no
need for any tuning or parameter specification other than the size of the DNF network (the total number
of rules for a predicate) and the number of existentially quantified variables for each rule.
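A sketch of this gather-based batching is shown below, using NumPy fancy indexing in place of the TensorFlow gather; the index matrix is a made-up toy, not taken from the implementation.

import numpy as np

# valuation of all ground atoms of the relevant predicates, flattened into one vector
X_all = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])

# hypothetical precomputed index matrix: for each (ground atom, substitution), the
# positions in X_all of the body atoms; built once from the background facts
idx = np.array([[[0, 2], [0, 4]],
                [[1, 2], [1, 4]]])   # shape (ground_atoms, substitutions, body_atoms)

# a single gather collects every input vector I_p|theta for the whole predicate at once,
# so the rule function F_p can then be applied to the whole batch in parallel
X_sub = X_all[idx]                   # shape (2, 2, 2)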
On the other hand, since we use a propositionalization step (typical to almost all neural ILP solvers),
special care is required when the number of constants in the program is very large. While for the
extensional and target predicates we can usually define the vectors X_p corresponding only to the
provided atoms in the sets B, P and N , for the auxiliary predicates we may need to consider many
intermediate ground atoms not included in the program. In such cases, when the space of possible
atoms is very large, we may need to restrict the set of possible ground atoms.
3 Past Works
Addressing all the important past contributions in ILP is a tall order, and given the limited space, we
will only focus on a few recent approaches that are in some ways relevant to our work. Among the ILP
solvers that are capable of learning recursive predicates (in an explicit and symbolic manner), the most
notable examples are Metagol (Cropper and Muggleton [2016]) and dILP (Evans and Grefenstette
[2018]). Metagol is a powerful method that is capable of learning very complex tasks using
user-provided meta-rules. The main issue with Metagol is that, while it allows for some flexibility
in providing the meta-rules, it is not always clear how to define those meta formulas. In
practice, unless the expert already has some knowledge regarding the form of the possible solution,
it is very difficult to use this method. dILP, on the other hand, is a neural ILP solver that,
like our method, uses propositionalization of the data and formulates a differentiable neural ILP
solver. Our proposed algorithm is in many regards similar to dILP. However, because of the way it
defines templates, dILP is limited to learning simple predicates with arity of at most two and with at
most two atoms in each rule. CILP++ (França et al. [2014]) is another notable neural ILP
solver which also uses propositionalization, similar to our method and dILP. CILP++ is a very efficient
algorithm and is capable of learning large-scale relational datasets. However, since this algorithm
uses bottom clause propositionalization, it is not able to learn recursive predicates. In dealing with
uncertain data, and especially in tasks involving classification of relational datasets, the most
notable frameworks are probabilistic ILP (PILP) (De Raedt and Kersting [2008]) and its variants, and
Markov Logic Networks (MLN) (Richardson and Domingos [2006]). These types of algorithms
extend the framework of ILP to handle uncertain data by introducing a probabilistic framework. Our
proposed approach is related to PILP in that we also associate a real number with each atom and each
rule in the formula. We compare the performance of our method to this category of statistical
relational learners in our experiments. The methods in this category are generally not capable of
learning recursive predicates.
¹ The Python implementation of dNL-ILP is available at https://github.com/apayani/ILP
4 Experiments
The ability to learn recursive predicates is fundamental in learning a variety of algorithmic tasks
(Tamaddoni-Nezhad et al. [2015]; Cropper and Muggleton [2015]). In practice, Metagol is the only
notable ILP solver which can efficiently learn recursive predicates (via meta-rule templates). Our
evaluations² show that the proposed dNL-ILP solver can learn a variety of discrete algorithmic tasks
involving recursion very efficiently and without the need for predefined meta-rules. Here, we briefly
explore two synthetic learning tasks before considering large-scale tasks involving relational datasets.
We use the dNL-ILP solver to learn the predicate mul/3 for decimal multiplication using only
positive and negative examples. We use C = {0, 1, 2, 3, 4, 5, 6} as constants, and our background
knowledge consists of the extensional predicates {zero/1, inc/2, add/3}, where inc/2 defines
the increment by one and add/3 defines addition. The target predicate is mul(A, B, C) and we allow
for using 5 variables in each rule (i.e., num_var^i(mul) = 5). We use a dNL-DNF network with 4
disjunction terms (4 conjunctive rules) for learning F_mul. It is worth noting that since we do not
know in advance how many rules will be needed, we pick an arbitrary number and increase it
in case the ILP program cannot explain all the examples. Further, we set t_max = 8. One of the
solutions that our model finds is:
mul(A, B, C) ← zero(B), zero(C)
mul(A, B, C) ← mul(A, D, E), inc(D, B), add(E, A, C)
4.2 Sorting
The sorting task is more complex than the previous task since it requires not only list semantics,
but also many more constants than the arithmetic problem. We implement the list semantics by
allowing the use of functions in defining predicates. For data of type list, we define two functions,
H and t, which decompose a list into its head and tail elements, i.e., A = [A_H | A_t]. We use the
elements of {a, b, c, d} and all the ordered lists made from permutations of up to three elements as
constants in the program (i.e., |C| = 40). We use extensional predicates such as gt (greater than), eq
(equals) and lte (less than or equal) to define the ordering between the elements of lists as part of the
background knowledge. We allow for using 4 variables (and their functions) in defining the predicate
sort (i.e., num_var^i(sort) = 4). One of the solutions that our model finds is:
sort(A, B) ← sort(A_H, C), lte(C_t, A_t), eq(B_H, C), eq(A_t, B_t)
sort(A, B) ← sort(A_H, C), sort(D, B_H), gt(C_t, A_t), eq(B_t, C_t), eq(D_H, C_H), eq(A_t, D_t)
Even though the above examples involve learning tasks that may not seem very difficult on the
surface, and deal with a relatively small number of constants, they are far from trivial. To the best of
our knowledge, learning a recursive predicate for a complex algorithmic task such as sort, which
involves multiple recursive rules with 6 atoms and includes 12 variables (counting the two functions
head and tail per variable), is beyond the power of any existing ILP solver. Here, for example, the total
number of possible atoms to choose from is |I²_sort| = 176, and for the case of choosing 6 elements
from this list we would need to consider (176 choose 6) > 3 × 10^10 possible combinations (assuming we knew in
advance that 6 atoms are needed). While we can somewhat reduce this large space by removing
some of the improbable clauses, no practical ILP solver is capable of learning these kinds of relations
directly from examples.
We evaluate the performance of our proposed ILP solver on some benchmark ILP tasks. We use the
relational datasets Mutagenesis (Debnath et al. [1991]) and UW-CSE (Richardson and Domingos [2006])
as well as the IMDB and Cora datasets³. Table 1 summarizes the features of these datasets. As baselines
² Many of the symbolic tasks used in Evans and Grefenstette [2018], as well as some others, are provided in the accompanying source code.
³ Publicly available at https://relational.fit.cvut.cz/
Table 1: Dataset Features

Dataset        Constants   Predicates   Examples   Target Predicate
Mutagenesis    7045        20           188        active(A)
UW-CSE         7045        15           16714      advisedBy(A, B)
Cora           3079        10           70367      sameBib(A, B)
IMDB           316         10           14505      workingUnder(A, B)
we compare our method with state-of-the-art algorithms based on Markov Logic Networks,
such as GSLP (Dinh et al. [2011]), LSM (Kok and Domingos [2009]), MLN-B (boosted MLN) and B-RLR
(Ramanan et al. [2018]), as well as probabilistic ILP based algorithms such as SLIPCOVER (Bellodi
and Riguzzi [2015]). Further, since in most of these datasets the number of negative examples
is significantly greater than the number of positive examples, we report the area under the precision-recall
curve (AUPR) as a more reliable measure of classification performance. We use 5-fold cross-validation,
except for the Mutagenesis dataset for which we use 10-fold, and we report the average
AUPR over all folds. Table 2 summarizes the classification performance for the 4 relational
datasets. As the results show, our proposed method outperforms the previous algorithms in three
tasks: Mutagenesis, Cora and IMDB. In the case of the IMDB dataset, it reaches perfect classification
(AUROC = 1.0, AUPR = 1.0). This impressive performance is only made possible by the ability
to learn recursive predicates. Indeed, when we disallow recursion in this model, the AUPR
performance drops to 0.76. The end-to-end design of our differentiable ILP solver makes it possible
to combine other forms of learnable functions with the dNL networks. For example, while
handling continuous data is usually difficult in most ILP solvers, we can directly learn threshold
values to create binary predicates from the continuous data (see Appendix D)⁴. We have used this
method in the Mutagenesis task to handle the continuous data in this dataset. For the case of UW-CSE,
however, our method did not perform as well. One of the reasons is arguably the fact that the number
of negative examples is significantly larger than the number of positive ones for this dataset. Indeed, in some
of the published reports (e.g., França et al. [2014]), the number of negative examples is limited
using the closed world assumption, as in Davis et al. [2005]. Because of differences in hardware, it is
difficult to directly compare the speed of the algorithms. In our case, we have evaluated the models using
a 3.70 GHz CPU, 16 GB of RAM and a GeForce GTX 1080 Ti graphics card. Using this setup, problems
such as IMDB and Mutagenesis are learned in just a few seconds. For Cora, the model creation takes
about one minute and the whole simulation for any fold takes less than 3 minutes.
5 Conclusion
We have introduced dNL-ILP as a new framework for inductive logic programming.
Through various experiments, we showed that dNL-ILP outperforms past algorithms in learning
algorithmic and recursive predicates. Further, we demonstrated that dNL-ILP is capable of learning
from uncertain and relational data and outperforms state-of-the-art ILP solvers in classification
tasks on the Mutagenesis, Cora and IMDB datasets.
⁴ Alternatively, we can assign learnable probabilistic functions to those variables (see Appendix C).
A Notations
distance from the mean as positive and negative examples for the predicate off_i, respectively. We interpret
the values of the membership weights in the trained dNL-CNF networks which are used in F_Inf-off_i
as the degree of connection between two genes. Table 4 compares the performance of dNL-ILP to
two state-of-the-art algorithms, NARROMI (Zhang et al. [2012]) and MICRAT (Yang et al. [2018]),
on the 10-gene classification tasks of the DREAM4 dataset.
References
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu
Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for
large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and
Implementation (OSDI 16), pages 265–283, 2016.
Sebastian Bader, Pascal Hitzler, and Steffen Hölldobler. Connectionist model generation: A first-order
approach. Neurocomputing, 71(13-15):2420–2432, 2008.
Elena Bellodi and Fabrizio Riguzzi. Structure learning of probabilistic logic programs by searching
the clause space. Theory and Practice of Logic Programming, 15(2):169–212, 2015.
Andrew Cropper and Stephen H Muggleton. Learning efficient logical robot strategies involving
composable objects. In Twenty-Fourth International Joint Conference on Artificial Intelligence,
2015.
Andrew Cropper and Stephen H. Muggleton. Metagol system. https://github.com/metagol/metagol,
2016.
Jesse Davis, Elizabeth Burnside, Inês de Castro Dutra, David Page, and Vítor Santos Costa. An
integrated approach to learning bayesian networks of rules. In European Conference on Machine
Learning, pages 84–95. Springer, 2005.
Luc De Raedt and Kristian Kersting. Probabilistic inductive logic programming. In Probabilistic
Inductive Logic Programming, pages 1–27. Springer, 2008.
A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structure-
activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with
molecular orbital energies and hydrophobicity. Journal of medicinal chemistry, 34(2):786–797,
1991.
Quang-Thang Dinh, Matthieu Exbrayat, and Christel Vrain. Generative structure learning for markov
logic networks based on graph of predicates. In Twenty-Second International Joint Conference on
Artificial Intelligence, 2011.
Dheeru Dua and Efi Karra Taniskidou. UCI machine learning repository, 2017.
Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. Journal of
Artificial Intelligence Research, 61:1–64, 2018.
Manoel VM França, Gerson Zaverucha, and Artur S d’Avila Garcez. Fast relational learning using
bottom clause propositionalization with artificial neural networks. Machine learning, 94(1):81–104,
2014.
Steffen Hölldobler, Yvonne Kalinke, and Hans-Peter Störr. Approximating the semantics of logic
programs by recurrent neural networks. Applied Intelligence, 11(1):45–58, 1999.
Katsumi Inoue, Tony Ribeiro, and Chiaki Sakama. Learning from interpretation transition. Machine
Learning, 94(1):51–79, 2014.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR,
abs/1412.6980, 2014.
Stanley Kok and Pedro Domingos. Learning markov logic network structure via hypergraph lifting.
In Proceedings of the 26th annual international conference on machine learning, pages 505–512.
ACM, 2009.
Daniel Marbach, Thomas Schaffter, Dario Floreano, Robert J Prill, and Gustavo Stolovitzky. The
dream4 in-silico network challenge. Draft, version 0.3, 2009.
Stephen Muggleton. Inverse entailment and progol. New generation computing, 13(3-4):245–286,
1995.
Nandini Ramanan, Gautam Kunapuli, Tushar Khot, Bahare Fatemi, Seyed Mehran Kazemi, David
Poole, Kristian Kersting, and Sriraam Natarajan. Structure learning for relational logistic regres-
sion: An ensemble approach. In Sixteenth International Conference on Principles of Knowledge
Representation and Reasoning, 2018.
Tony Ribeiro, Sophie Tourret, Maxime Folschette, Morgan Magnin, Domenico Borzacchiello, Fran-
cisco Chinesta, Olivier Roux, and Katsumi Inoue. Inductive learning from state transitions over
continuous domains. In International Conference on Inductive Logic Programming, pages 124–139.
Springer, 2017.
Matthew Richardson and Pedro Domingos. Markov logic networks. Machine learning, 62(1-2):107–
136, 2006.
Luciano Serafini and Artur d’Avila Garcez. Logic tensor networks: Deep learning and logical
reasoning from data and knowledge. arXiv preprint arXiv:1606.04422, 2016.
Farhad Shakerin and Gopal Gupta. Induction of non-monotonic logic programs to explain boosted
tree models using lime. arXiv preprint arXiv:1808.00629, 2018.
Ashwin Srinivasan. The aleph manual, 2001.
Alireza Tamaddoni-Nezhad, David Bohan, Alan Raybould, and Stephen Muggleton. Towards
machine learning of predictive models from ecological data. In Inductive Logic Programming,
pages 154–167. Springer, 2015.
Bei Yang, Yaohui Xu, Andrew Maxwell, Wonryull Koh, Ping Gong, and Chaoyang Zhang. Micrat:
a novel algorithm for inferring gene regulatory networks using time series gene expression data.
BMC systems biology, 12(7):115, 2018.
Xiujun Zhang, Keqin Liu, Zhi-Ping Liu, Béatrice Duval, Jean-Michel Richer, Xing-Ming Zhao,
Jin-Kao Hao, and Luonan Chen. Narromi: a noise and redundancy reduction technique improves
accuracy of gene regulatory network inference. Bioinformatics, 29(1):106–113, 2012.