Shakarian AAAI Tutorial
• Anticipated release in late Spring/early Summer 2023
• Published by Springer-Nature
Why Neuro Symbolic?
• Neural methods provide great results, but often have difficulties in reasoning, explainability, and modularity.
P. Shakarian, A. Koyyalamudi, N. Ngu, L. Mareedu, An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP), AAAI Spring Symposium (Mar. 2023). Accepted.
More Additions → More Failures for ChatGPT
Pac-Man
Can we tell Pac-Man to not take dumb moves?
[Plot: Q-learning with logical shielding over training episodes]
Thanks to Tanmay Khandait, Devendra Rajendra Parkar, and Kirby Kuznia for allowing us to share the results of their student project from CSE 591.
Introduction and overview of neuro symbolic frameworks for reasoning and learning
Paulo Shakarian
Associate Professor
Arizona State University
[email protected]
AAAI 2023, Tutorial Section
Overview
• Generalizing differentiable logic with annotated logic
• Logical Neural Networks
• Differentiable Inductive Logic Programming
• Deep Symbolic Policy Learning
• Learning with STL Constraints: STL Net
• Learning Constraints to Combinatorial Problems: SAT Net
Generalizing Differentiable Logic with Annotated Logic
Papers:
M. Kifer, V.S. Subrahmanian, Theory of Generalized Annotated Logic Programs and its Applications. Journal of Logic Programming, Elsevier, 1992.
P. Shakarian, G. Simari, Extensions to Generalized Annotated Logic and an Equivalent Neural Architecture, IEEE TransAI, 2022.
D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, PyReason: Software for Open World Temporal Logic, AAAI Spring Symposium (Mar. 2023).
Annotated Logic
Logical atoms are "annotated" with values from a lattice structure, or with functions ("annotation functions") over such a structure; an example of a rule in general annotated logic is shown below.
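As an illustration (the predicate names and the product t-norm annotation function are our own choices, not taken from the slide), a rule annotates the atoms in both its head and body:

p(X) : μ1 · μ2 ← q(X) : μ1 ∧ r(X) : μ2

read as: if q(X) holds with annotation at least μ1 and r(X) with annotation at least μ2, then p(X) holds with annotation at least μ1 · μ2.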
If we specify our lattice as scalars in [0,1] and use T-norms, T-conorms, and other fuzzy logic operators as annotation functions, then we can capture various other real-valued logics (e.g., the logics used in LTNs).
Kifer and Subrahmanian (1992) show results in the general case (annotations from arbitrary lattice structures), specifically showing that a fixpoint operator provides exact deductive inference.
M. Kifer, V.S. Subrahmanian, Theory of Generalized Annotated Logic Programs and its Applications. Journal of Logic Programming, Elsevier, 1992.
Annotated Logic Enables Open-World Reasoning
• Several papers have associated logical atoms with subsets of the [0,1] interval
• MANCaLog (Shakarian et al., 2013) proposed this approach with annotated logic for knowledge graph reasoning
• LNNs also use intervals associated with logical atoms
[Diagram: truth represented as subintervals of [0,1]: False = [0,0], True = [1,1], Uncertain = [0,1]]
P. Shakarian, G. Simari, D. Callahan. Reasoning about Complex Networks: A Logic Programming Approach. ICLP-13.
P. Shakarian, G. Simari, R. Schroeder. MANCaLog: A Logic for Multi-Attribute Network Cascades. AAMAS-13.
Neural Equivalent Architecture
• Building on ideas from differentiable ILP, we proposed an architecture to leverage annotated logic for rule learning
• Key idea: a recurrent neural unit aligns with a deductive fixpoint operator (see the sketch below)
P. Shakarian, G. Simari, Extensions to Generalized Annotated Logic and an Equivalent Neural Architecture, IEEE TransAI, 2022.
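A minimal toy sketch of that key idea (our own encoding, not the architecture from the paper): repeatedly applying a deductive operator to a vector of atom annotations until nothing changes has exactly the shape of a recurrent unit.

import numpy as np

# Toy setting: annotations are scalars in [0, 1]; each rule has a head atom,
# a list of body atoms, and uses the product t-norm as its annotation function.
rules = [
    {"head": 2, "body": [0, 1]},   # atom2 : a0 * a1 <- atom0 : a0, atom1 : a1
    {"head": 3, "body": [2]},      # atom3 : a2      <- atom2 : a2
]

def apply_rules_once(a, rules):
    """One application of the deductive operator: raise each head's
    annotation to the max of its current value and the rule body's t-norm."""
    new_a = a.copy()
    for r in rules:
        body_val = np.prod(a[r["body"]])
        new_a[r["head"]] = max(new_a[r["head"]], body_val)
    return new_a

# Iterate the operator like a recurrent unit until the annotations stop changing.
a = np.array([0.9, 0.8, 0.0, 0.0])   # initial annotations for atoms 0..3
for _ in range(10):                   # bound on the number of inference steps
    nxt = apply_rules_once(a, rules)
    if np.allclose(nxt, a):
        break                         # fixpoint reached
    a = nxt
print(a)  # atom2 and atom3 both converge to about 0.72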
PyReason: Python-based temporal first-order logic explainable AI system supporting uncertainty, open-world novelty, and graph-based reasoning.
D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, PyReason: Software for Open World Temporal Logic, AAAI Spring Symposium (Mar. 2023). Accepted.
PyReason
Logical Neural Networks
Papers:
Riegel, R., Gray, A., Luus, F., Khan, N., Makondo, N., Akhalwaya, I.Y., Qian, H., Fagin, R., Barahona, F., Sharma, U., Ikbal, S., Karanam, H., Neelam, S., Likhyani, A., Srivastava, S.: Logical Neural Networks (2020).
Sen, P., Carvalho, B.W.S.R.d., Abdelaziz, I., Kapanipathi, P., Roukos, S., Gray, A.: Logical Neural Networks for Knowledge Base Completion with Embeddings and Rules, EMNLP (2022).
LNNs
Key ideas:
• Each input to an operator is associated with a parameter ("importance weighting")
• Forward pass is a deduction algorithm (equivalent to the fixpoint operator of general annotated programs)
• Hyperparameter α sets a threshold for truth and falsehood
• Logic program known a priori (like in LTNs)
• Learning process finds parameters such that operators retain classical functionality
LNN: Inference
• Input to inference:
  • Set of formulas
  • Initial truth bounds for each atom and formula
  • Bias and weight for each formula (connective)
• Output:
  • Final truth bounds for each formula and atom
• The authors propose an upward-downward pass through the logic (this is different from the forward-backward pass used in gradient descent)
• The algorithm propagates truth values from atoms to the formulas (upward) and from the formulas to the atoms (downward) until convergence
LNN: Activation Functions
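The activations in the LNN paper come from weighted Łukasiewicz logic; below is a minimal sketch of the real-valued conjunction and disjunction (variable names are ours, and the α-thresholding and parameter constraints are omitted).

import numpy as np

def lnn_and(x, w, beta):
    """Weighted Lukasiewicz conjunction: clamp(beta - sum_i w_i * (1 - x_i)) to [0, 1]."""
    return np.clip(beta - np.sum(w * (1.0 - x)), 0.0, 1.0)

def lnn_or(x, w, beta):
    """Weighted Lukasiewicz disjunction: clamp(1 - beta + sum_i w_i * x_i) to [0, 1]."""
    return np.clip(1.0 - beta + np.sum(w * x), 0.0, 1.0)

x = np.array([0.9, 0.8])      # input truth values
w = np.array([1.0, 1.0])      # importance weights (one per input)
beta = 1.0                    # bias term
print(lnn_and(x, w, beta))    # about 0.7
print(lnn_or(x, w, beta))     # 1.0 (clamped)

With unit weights and β = 1, both operators reduce to the classical Łukasiewicz connectives, which is why the learned weights can retain classical functionality.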
LNN: Learning
• Neural architecture is derived directly from the a priori known formulas
• Historical inputs and outputs are used as samples during the training process
• Loss function depends not only on standard ML metrics (e.g., MSE) but also on the number of inconsistencies (neurons associated with a truth bound where L > U); see the sketch after this list
• The functions used to combine formulas are differentiable with respect to the weights
• The forward pass of the learning process can be done with the inference algorithm (which is essentially a fixpoint operator)
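A hedged illustration of that loss structure (not the paper's exact objective): a task loss plus a penalty on contradictory truth bounds.

import numpy as np

def lnn_style_loss(pred, target, lower, upper, lam=1.0):
    """Task error (MSE here) plus a penalty for each truth bound with L > U."""
    mse = np.mean((pred - target) ** 2)
    contradiction = np.sum(np.maximum(0.0, lower - upper))  # 0 when bounds are consistent
    return mse + lam * contradiction

# Example: one prediction plus two neurons, the first of which is contradictory.
print(lnn_style_loss(np.array([0.7]), np.array([1.0]),
                     lower=np.array([0.9, 0.2]), upper=np.array([0.6, 0.8])))  # about 0.39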
LNN: Syntax Tree / Neural Structure
Objective Function with Constraints
• To ensure reasonable settings of the parameters, the authors show how the parameter-learning problem can be framed as an optimization problem with constraints
Key advantages:
• The logic is maintained regardless of the weights
• The authors claim that the output is independent of the bias β, so it can be fixed to a convenient constant
Differentiable Inductive Logic Programming
Paper:
Evans, R., Grefenstette, E.: Learning explanatory rules from noisy data. J. Artif. Int. Res. 61(1), 1–64 (2018)
The Logic of dILP
First-Order Logic (predicates and constants)
• Given predicates p, q, constant c, and variable x:
  • p(c) is a ground atom
  • p(x) is non-ground
  • q(x) ← p(x) is a non-ground rule ("if p then q")
• Each non-ground rule has multiple grounded instances
• Assume facts (ground atoms) and (non-ground) rules
• Limit on the number of inference steps (i.e., applications of a fixpoint operator), denoted T; a small example follows below
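As a small illustrative instance (our own example, not from the slide): given facts edge(a,b) and edge(b,c), and the non-ground rules

connected(X,Y) ← edge(X,Y)
connected(X,Y) ← edge(X,Z) ∧ connected(Z,Y)

T = 2 applications of the fixpoint operator derive connected(a,b) and connected(b,c) in the first step, and connected(a,c) in the second; longer chains would require a larger T.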
ILP by SAT Solving
Intuition:
1. A "template" specifies the format of a rule (e.g., number of body predicates, free existentially quantified variables, number of invented predicates, etc.)
2. For a given predicate, "candidate" rules are generated based on a template
3. Each candidate rule is associated with a proposition indicating whether it is "turned on"
4. Given positive and negative facts, find a subset of the propositions that "turn on" candidate rules such that positive facts are entailed and negative facts are not
Learning Problem
• The overall goal is to compute a conditional probability: the probability of a {0,1} value for a given atom, where the training data consist of (atom, value) pairs from the ground truth
Overall Architecture
Inductive Inference by Gradient Descent
[Architecture diagram: dILP inference as a recurrence]
• Pre-processing: conversion of background knowledge to a vector representation (initial atom values) and clause generation from the program template
• Vector a_t: truth values of the ground atoms at inference step t
• Vectors c_t: for each intensional predicate and pair of clause templates, the maximum truth value given to atoms formed with that predicate
• Vectors b_t: for each intensional predicate, a weighted average of the truth values of each ground atom from the corresponding c_t vectors, using a softmax over the associated weights
• Vector a_{t+1}: amalgamation of vectors a_t and b_t using the probabilistic sum
• Weights: one weight per intensional predicate and pair of templates associated with that predicate
• Recurrence: this structure is repeated for each inference step, as specified by the program-template hyperparameters
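A minimal numeric sketch of one such inference step (shapes, names, and toy values are ours; the clause evaluation that produces the c_t vectors is assumed given):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dilp_step(a_t, c_t, weights):
    """One inference step: c_t[j] holds the atom values produced by candidate
    clause pair j; b_t is their softmax-weighted average; a_{t+1} amalgamates
    a_t and b_t with the probabilistic sum."""
    b_t = softmax(weights) @ c_t          # weighted average over clause pairs
    return a_t + b_t - a_t * b_t          # probabilistic sum (soft union)

a_t = np.array([1.0, 0.0, 0.0])           # current truth values of ground atoms
c_t = np.array([[0.0, 0.9, 0.0],          # values produced by clause pair 0
                [0.0, 0.0, 0.7]])         # values produced by clause pair 1
weights = np.array([2.0, 0.5])            # learnable clause-pair weights
print(dilp_step(a_t, c_t, weights))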
Bounding the Number of Templates
• A major source of complexity is the number of templates, which can be bounded by various quantities.
An expression tree is a syntax tree for mathematical expressions.
Generate a distribution of expressions with an RNN.
$\frac{1}{\sigma}\cdot\frac{1}{n}\sum_{i}\left(y_i - f(X_i)\right)^2$
Reward for Individual Expressions
(Time steps in episode i)
The CVaR policy gradient considers only the lowest-reward candidates in a given sample (Tamar et al., 2014).
This work instead uses a risk-seeking policy gradient that looks at the highest-reward expressions.
Gradient
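As a hedged reconstruction of the standard risk-seeking policy-gradient estimator (our notation, not necessarily the slide's): only sampled expressions whose reward exceeds the empirical $(1-\varepsilon)$ quantile $R_\varepsilon$ contribute,

$\nabla_\theta J \approx \frac{1}{\varepsilon N}\sum_{i=1}^{N}\left(R(\tau^{(i)}) - R_\varepsilon\right)\,\mathbf{1}\!\left[R(\tau^{(i)}) \ge R_\varepsilon\right]\nabla_\theta \log p(\tau^{(i)}\mid\theta)$

so the policy is pushed toward its best-case samples rather than its average case.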
Adding Multiple Dimensions
• The authors note that multiple action dimensions lead to a combinatorial explosion
• They overcome the problem using a non-symbolic "anchor model"
• The intuition is that each action dimension is learned sequentially
• When action dimension i is learned, the algorithm uses the previously learned symbolic actions 1,…,i−1 and the anchor (non-symbolic, NN-learned) actions for i+1,…,n (see the sketch below)
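A minimal sketch of that sequential scheme (the environment and policy interfaces here are hypothetical placeholders, not the authors' code):

def evaluate_candidate(candidate_expr, i, symbolic_actions, anchor_policy, env, episodes=10):
    """Score a candidate expression for action dimension i: dimensions before i use
    the already-learned symbolic expressions, dimension i uses the candidate, and
    the remaining dimensions fall back to the neural anchor policy."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            anchor_action = anchor_policy(obs)            # actions for all n dimensions
            action = list(anchor_action)
            for j, expr in enumerate(symbolic_actions):   # dimensions 0 .. i-1
                action[j] = expr(obs)
            action[i] = candidate_expr(obs)               # dimension being learned now
            obs, reward, done = env.step(action)
            total += reward
    return total / episodes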
Deep Symbolic Policies
STL Net
Paper:
Ma, M., Gao, J., Feng, L., Stankovic, J.A.: STLnet: Signal Temporal Logic Enforced Multivariate Recurrent Neural Networks. 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
Signal Temporal Logic (STL)
• A temporal logic (like LTL or CTL)
• Semantic structure assumes multiple worlds or states over time
• Allows for reasoning over time intervals (like MTL)
• Predicates represent analog signals, meaning that they are equivalent to the value of a function exceeding a certain value (usually zero); an example specification follows below
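For instance (an illustrative specification of our own, not from the slide):

G_[0,60] ( x(t) > 0 → F_[0,5] ( y(t) > 0.5 ) )

i.e., throughout the first 60 time units, whenever signal x exceeds 0, signal y must exceed 0.5 within the next 5 time units.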
STL Net: Setup
• m signals over time
• Subsequence of a signal
• Goal: Find model parameters that both minimize loss and adhere to STL specifications
Student-Teacher Framework
STL Loss Function
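A minimal sketch (our own toy code, not STLnet's implementation) of the quantitative semantics such a loss can build on: the robustness of "always x > c on [a, b]" over a discrete trace is its worst-case margin, and its negative part can serve as a penalty added to a training loss.

import numpy as np

def robustness_always_gt(trace, c, a, b):
    """Quantitative robustness of G_[a,b](x > c) over a discrete-time trace:
    the worst-case margin x[t] - c for t in [a, b]."""
    return np.min(trace[a:b + 1] - c)

trace = np.array([0.2, 0.4, 0.1, 0.6, 0.3])
rho = robustness_always_gt(trace, c=0.0, a=0, b=4)   # positive, so the property holds
stl_penalty = max(0.0, -rho)                         # nonzero only when the spec is violated
print(rho, stl_penalty)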
Miltersen, P.B., Radhakrishnan, J., Wegener, I.: On converting cnf to dnf. Theoretical Computer Science 347(1), 325–335 (2005).
SATNet
Papers:
Wang, P., Donti, P.L., Wilder, B., Kolter, J.Z.: SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, Proceedings of Machine Learning Research, vol. 97, pp. 6545–6554.
Chang, O., Flokas, L., Lipson, H., Spranger, M.: Assessing SATNet's ability to solve the symbol grounding problem. In: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 1428–1439. Curran Associates, Inc. (2020).
Topan, S., Rolnick, D., Si, X.: Techniques for symbol grounding with SATNet. Advances in Neural Information Processing Systems 34, 20733–20744 (2021).
The MAX SAT problem
INPUT:
Boolean variables x_1, …, x_m and clauses C_1, …, C_n, where each clause is a disjunction of literals (atoms or negations); i.e., the input is in conjunctive normal form.
OUTPUT:
An assignment of the Boolean variables such that the number of satisfied clauses is maximized.
Notes:
• Known to be NP-hard (even when each clause has just two literals)
• The clauses can be numerically represented by a matrix M of dimensions n × m (in the original paper, S is used instead of M); a sketch of this encoding follows below
• Often framed as an optimization problem
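A minimal sketch of the clause-matrix view (our own +1/−1/0 encoding, for illustration only; SATNet's S matrix uses a different, relaxed representation):

import numpy as np

# M[j, i] = +1 if clause j contains x_i, -1 if it contains NOT x_i, 0 otherwise.
M = np.array([[ 1, -1,  0],     # (x1 OR NOT x2)
              [ 0,  1,  1],     # (x2 OR x3)
              [-1,  0, -1]])    # (NOT x1 OR NOT x3)

def num_satisfied(M, x):
    """Count clauses satisfied by a Boolean assignment x (entries in {0, 1})."""
    signed = np.where(x == 1, 1, -1)            # map the assignment to +/-1
    return int(np.sum((M * signed == 1).any(axis=1)))

print(num_satisfied(M, np.array([1, 0, 1])))    # 2 of the 3 clauses satisfied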
Partial Knowledge MAX SAT variant
$a^{out}_{k+1,\ldots,m} = S\left(a^{in}_{1,\ldots,k}, M\right)$
where S is an oracle and M is the matrix representing the clauses.
A visual variant of the problem is one in which the input is presented as images instead of text.
SAT Net Framework
• Later, Chang et al. (2020) showed that SATNet leveraged label leakage: SATNet fails catastrophically when labels are masked.
Despite its shortcomings, SATNet has significance.