
Advances in Neuro Symbolic

Reasoning (and Learning)

AAAI 2023 Tutorial


Overview
• 8:30-8:45 Tutorial Overview (Shakarian)
• 8:45-9:30 Introduction and overview of neuro symbolic frameworks for
reasoning and learning (Shakarian)
• LNN, Annotated Logic, dILP, DSP, SATNet
• 9:30-9:40 Break
• 9:40-10:40 Neuro symbolic based approaches for deduction (Simari)
• LTN and deep ontological networks
• 10:40-10:50 Break
• 10:50-11:50 Combining perceptual neural networks with logic and
applications (Baral)
• NeurASP, and NLP/VQA applications
• 11:50-12:00 Break
• 12:00-12:30 Neuro Symbolic Reasoning for the DoD (DARPA)
(Velasquez)
Resources: Tutorial Web Page
• https://labs.engineering.asu.edu/labv2/2023-aaai-tutorial-advances-in-neuro-symbolic-reasoning/
Resources: YouTube Channel
• https://www.youtube.com/@neurosymbolic
Resources: Book

• Anticipated release in late Spring/early Summer 2023

• Published by Springer-Nature
Why Neuro Symbolic?
• Neural methods provide great results, but often have difficulties in reasoning, explainability, and modularity.

• Symbolic methods excel in the above areas, but have difficulty coping with noise and deriving robust models from real-world data.

• Can we have the best of both worlds?


ChatGPT Failures on Math Word Problems

P. Shakarian, A. Koyyalamudi, N. Ngu, L. Mareedu, An Independent Evaluation of ChatGPT on Mathematical Word Problems
(MWP), AAAI Spring Symposium (Mar. 2023). Accepted.
More Additions → More Failures for ChatGPT

R² = 0.82, 95% confidence intervals

P. Shakarian, A. Koyyalamudi, N. Ngu, L. Mareedu, An Independent Evaluation of ChatGPT on Mathematical Word Problems
(MWP), AAAI Spring Symposium (Mar. 2023). Accepted.
Pac-Man
Can we tell Pac-Man to not take dumb moves?

Idea: express such restrictions in logic.

→ This is called "logical shielding"

Will it make the Pac-Man-playing AI work better?
Pac Man Performance on
“Medium” Difficulty Board
[Plot: Reward vs. Training Episode, comparing approximate Q-learning and Q-learning, each with and without logical shielding.]

Thanks to Tanmay Khandait, Devendra Rajendra Parkar, and Kirby Kuznia for
allowing us to share the results of their student project from CSE 591.
Introduction and overview of neuro symbolic
frameworks for reasoning and learning
Paulo Shakarian
Associate Professor
Arizona State University
[email protected]
AAAI 2023, Tutorial Section
Overview
• Generalizing differentiable logic with
annotated logic
• Logical Neural Networks
• Differentiable Inductive Logic Programming
• Deep Symbolic Policy Learning
• Learning with STL Constraints: STL Net
• Learning Constraints to Combinatorial
Problems: SAT Net
Generalizing Differentiable Logic with Annotated Logic

Papers:
M. Kifer, V.S. Subrahmanian, Theory of Generalized Annotated Logic Programs
and its Applications. Journal of Logic Programming, Elsevier, 1992.
P. Shakarian, G. Simari, Extensions to Generalized Annotated Logic and an
Equivalent Neural Architecture, IEEE TransAI, 2022.
D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian,
PyReason: Software for Open World Temporal Logic, AAAI Spring Symposium
(Mar. 2023).
Annotated Logic
Logical atoms are "annotated" with values from a lattice structure, or with functions
("annotation functions") over such a structure; the general shape of a rule in general
annotated logic is sketched below.
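A sketch of the general form of an annotated rule, in the notation of Kifer & Subrahmanian (1992); the symbols here are illustrative:

$$A : f(\mu_1, \ldots, \mu_n) \;\leftarrow\; B_1 : \mu_1 \;\wedge\; \cdots \;\wedge\; B_n : \mu_n$$

Each body atom B_i carries an annotation μ_i (a lattice element or annotation variable), and the head's annotation is computed by an annotation function f.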

If we specify our lattice as scalars in [0,1] and use t-norms, t-conorms, and other
fuzzy-logic operators as annotation functions, then we can capture various other
real-valued logics (e.g., the logics used in LTN's).

Moreover, provided that we keep the annotation functions differentiable, the
framework offers even more flexibility.

Kifer and Subrahmanian 1992 show results in the general case (annotations
from arbitrary lattice structures), specifically showing that a fixpoint operator
provides exact deductive inference.
M. Kifer, V.S. Subrahmanian, Theory of Generalized Annotated Logic Programs and its Applications. Journal of Logic Programming, Elsevier, 1992.
Annotated Logic Enables Open-World
Reasoning
• Several papers have associated logical atoms with subsets of the [0,1] interval; in this scheme [0,0] denotes False, [1,1] denotes True, and [0,1] denotes fully Uncertain
• MANCaLog (Shakarian et al., 2013) proposed this with annotated logic for knowledge graph reasoning
• LNN's also use intervals associated with logical atoms
• A key advantage over scalars is that this approach permits open-world novelty
P. Shakarian, G. Simari, D. Callahan. Reasoning about Complex Networks: A Logic Programming Approach. ICLP-13.
P. Shakarian, G. Simari, R. Schroeder. MANCaLog: A Logic for Multi-Attribute Network Cascades. AAMAS-13.
Neural Equivalent Architecture

• Building on ideas from differentiable ILP, we proposed an architecture to leverage annotated logic for rule learning
• Key idea: a recurrent neural unit aligns with a deductive fixpoint operator

P. Shakarian, G. Simari, Extensions to Generalized Annotated Logic and an Equivalent Neural Architecture, IEEE TransAI, 2022.
PyReason: Python-based temporal first-order logic explainable AI system
supporting uncertainty, open-world novelty, and graph-based reasoning.

• Supports generalized annotated logic with temporal, graphical, and uncertainty
extensions, capturing a wide variety of fuzzy, real-valued, interval, and temporal logics
• Modern Python-based system supporting reasoning on graph-based data structures
(e.g., exported from Neo4j, GraphML, etc.)
• Rule-based reasoning in a manner that supports uncertainty, open-world novelty,
non-ground rules, quantification, etc., agnostic to the selection of t-norm, etc.
• Fast, highly optimized, correct fixpoint-based deduction allows for explainable AI
reasoning; scales to graphs with over 30 million edges

GitHub: https://github.com/lab-v2/pyreason
Python: pip install pyreason

D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, PyReason: Software for Open World Temporal
Logic, AAAI Spring Symposium (Mar. 2023). Accepted.
PyReason

Supports Generalized Annotated Logic

Finite-time temporal logic (e.g., as used in STLNet)

Supports inference with fuzzy operators (e.g., LTN)

Supports open-world novelty and parameterized operators (e.g., LNN)

Supports graph-based reasoning (e.g., MANCaLog)

D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, PyReason: Software for Open World Temporal
Logic, AAAI Spring Symposium (Mar. 2023). Accepted.
Logical Neural Networks

Papers:
Riegel, R., Gray, A., Luus, F., Khan, N., Makondo, N., Akhalwaya, I.Y.,
Qian, H., Fagin, R., Barahona, F., Sharma, U., Ikbal, S., Karanam, H.,
Neelam, S., Likhyani, A., Srivastava, S.: Logical neural networks (2020).
Sen, P., Carvalho, B.W.S.R.d., Abdelaziz, I., Kapanipathi, P., Roukos, S.,
Gray, A.: Logical Neural Networks for Knowledge Base Completion with
Embeddings and Rules, EMNLP (2022).
LNN’s

Key ideas:
• Each input to an operator is associated with a parameter ("importance weighting")
• Forward pass is a deduction algorithm (equivalent to the fixpoint operator of general annotated programs)
• Hyperparameter α sets a threshold for truth and falsehood
• Logic program known a priori (like in LTN's)
• Learning process finds parameters such that operators retain classical functionality
LNN: Inference
• Input to inference:
  • Set of formulas
  • Initial truth bounds for each atom and formula
  • Bias and weight for each formula (connector)
• Output:
  • Final truth bounds for each formula and atom
• The authors propose an upward-downward pass through the logic (this is different from the forward-backward pass used in gradient descent)
• The algorithm propagates truth values from atoms to the formulas (upward) and from the formulas to the atoms (downward) until convergence
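A minimal sketch of the upward-downward idea, assuming a plain unweighted min-based conjunction rather than LNN's weighted, parameterized operators; the class and function names are illustrative:

class Node:
    def __init__(self, children=None, lower=0.0, upper=1.0):
        self.children = children or []         # empty list => the node is an atom
        self.lower, self.upper = lower, upper  # current truth bounds [L, U]

def upward(node):
    """Recompute a conjunction's bounds from its children's bounds."""
    if node.children:
        for child in node.children:
            upward(child)
        node.lower = max(node.lower, min(c.lower for c in node.children))
        node.upper = min(node.upper, min(c.upper for c in node.children))

def downward(node):
    """Tighten each child's bounds using the conjunction's bounds."""
    for child in node.children:
        child.lower = max(child.lower, node.lower)  # if the AND is >= L, each conjunct is >= L
        downward(child)

def snapshot(node):
    vals = [node.lower, node.upper]
    for child in node.children:
        vals += snapshot(child)
    return vals

def infer(root, max_iters=100, tol=1e-6):
    """Alternate upward and downward passes until the bounds stop changing."""
    for _ in range(max_iters):
        before = snapshot(root)
        upward(root)
        downward(root)
        if max(abs(x - y) for x, y in zip(before, snapshot(root))) < tol:
            break
    return root

# Example: assert (a AND b) is true with a known true; b's lower bound is pulled to 1.
a, b = Node(lower=1.0, upper=1.0), Node()
conj = Node(children=[a, b], lower=1.0, upper=1.0)
infer(conj)
print(b.lower, b.upper)   # -> 1.0 1.0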
LNN: Activation Functions
LNN: Learning
• Neural architecture is derived directly from the a priori known formulas
• Historical inputs and outputs are used as samples during the training process
• Loss function depends not only on standard ML metrics (e.g., MSE) but also on the number of inconsistencies (neurons associated with a truth bound where L > U)
• The functions used to combine formulas are differentiable with respect to the weights
• The forward pass of the learning process can be done with the inference algorithm (which is essentially a fixpoint operator)
LNN: Syntax Tree / Neural Structure
Objective Function with Constraints
• To ensure reasonable settings of the parameters, the authors show how the parameter-learning problem can be framed as an optimization problem with constraints
• Here, E(B,W) is the traditional loss function and the summation is designed to reduce the number of inconsistencies (one plausible reading is sketched below)
• Some issues (fully listed in Section F.1):
  • Requirement of additional slack parameters
  • Parameter updates require constraint satisfaction
  • β must be learned for each neuron, which hinders interpretability and leads to overfitting
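One plausible reading of the objective, assuming the inconsistency penalty is a sum of bound violations over the neurons (the exact constraint set is given in Riegel et al. 2020, Sec. 6):

$$\min_{B,W}\; E(B,W) \;+\; \sum_{k \in \mathcal{N}} \max\!\big(0,\; L_k - U_k\big) \qquad \text{s.t. constraints forcing each operator to behave classically}$$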

Formalism is from Riegel et al., 2020 (Sec. 6)


Tailored Activation Function

Shown here is a tailored activation function for disjunction with β = 1.

Key advantages:
- The logic is maintained regardless of the weights
- The authors claim that the output is independent of β, so they define it with β fixed at 1 (a sketch of the underlying weighted disjunction follows)
Formalism is from Riegel et al., 2020 (Sec. 6)


Overall Strategy
• Use gradient descent to find weights, using
normal back-propagation and the
aforementioned inference process for the
forward pass
• Loss function combines normal metrics (e.g.,
MSE) and a count of inconsistent neurons
Key Issues to Consider
• Parameters w and β must be set in such a way that classical logic outcomes for the operators behave as expected
• Learning parameters that fit vs. interpretability
Inconsistency
• Consistency is not fundamentally guaranteed
• The authors do mention that a consistency check can be done during the training process
• It is noteworthy that consistency is based on neurons, not atoms, so for all formulas known a priori, you know which ones will be inconsistent
• However, if you are checking an entailment query against the logic, there are no guarantees that it is correct
Recent Work
• Sen et al. (2022) extend LNN's for rule learning in knowledge graphs and achieve state-of-the-art performance on KBC tasks.
• The approach is complementary to embedding-based approaches to the problem, and the combination provides further improvement.
Differentiable Inductive Logic Programming

Paper:
Evans, R., Grefenstette, E.: Learning explanatory rules
from noisy data. J. Artif. Int. Res. 61(1), 1–64 (2018)
The Logic of dILP
First-Order Logic (predicates and constants)
• Given predicates p, q, constant c, and variable x:
  • p(c) is a ground atom
  • p(x) is non-ground
  • q(x) ← p(x) is a non-ground rule ("if p then q")
• Each non-ground rule will have multiple grounded instances
• Assume facts (ground atoms) and (non-ground) rules
• Limits on the number of inference steps (i.e., applications of a fixpoint operator), denoted T
ILP by SAT Solving
Intuition:

1. A “template” specifies the format of a rule (e.g., number of body predicates, free
existentially quantified variables, number of invented predicates, etc.)
2. For a given predicate, “candidate” rules are generated based on a template

3. An additional atomic proposition is added to the end of each candidate

4. Given positive and negative facts, find a subset of the propositions that “turn on”
candidate rules such that positive facts are entailed and negative facts are not
Learning Problem
• The overall goal is to compute the conditional probability of the truth value λ of a ground atom α, given the learned weights W, the program template Π, the language L, and the background facts B:

  p(λ | α, W, Π, L, B)

• The model is trained such that the following loss is minimized: the cross-entropy between these predicted probabilities and the (atom, value) pairs in the ground truth, where each value is in {0, 1}.
Overall Architecture
Inductive Inference by Gradient Descent

Pre-processing:
• Clause generation (candidate clauses from the program templates)
• Conversion of background knowledge to a vector representation

Recurrence (unrolled for each inference step, as specified by the program-template hyperparameters):
• Vector aₜ: initial/current atom truth values
• Vectors cₜ: clause vectors (for each intensional predicate and pair of templates) giving the maximum truth value obtained for atoms formed with each intensional predicate
• Vectors bₜ: for each intensional predicate, a weighted average of the truth values of each ground atom from the corresponding cₜ vectors, using a softmax over the weights associated with each intensional predicate and pair of templates
• Vector aₜ₊₁: amalgamation of vectors aₜ and bₜ using the probabilistic sum

(a sketch of one such step is given below)
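To make the recurrence concrete, a minimal sketch of one inference step for a single intensional predicate, assuming numpy arrays and illustrative names (not the dILP implementation):

import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def inference_step(a_t, c_t, clause_weights):
    """One inference step for a single intensional predicate.

    a_t:            (num_ground_atoms,) current truth values in [0, 1]
    c_t:            (num_clause_pairs, num_ground_atoms) truth values produced
                    by applying each candidate pair of clauses
    clause_weights: (num_clause_pairs,) learned weights for this predicate
    """
    b_t = softmax(clause_weights) @ c_t      # softmax-weighted average of clause evaluations
    return a_t + b_t - a_t * b_t             # probabilistic sum of a_t and b_t

# Unrolling T such steps (T fixed by the program-template hyperparameters)
# keeps every step differentiable with respect to clause_weights.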
Bounding the Number of Templates
• A major source of complexity is the number of templates, which can be bounded in terms of various quantities; under the assumptions in the paper, this bound simplifies further.
Experimental Notes
• Training data:
  • Training data consists of multiple (ℬ, 𝒫, 𝒩) triples
  • At each step, one of the triples is sampled
  • From the sampled triple, a mini-batch of 𝒫 ∪ 𝒩 is selected
  • The authors state that this method helps escape local minima
  • Training occurs in 6,000 steps
• Other notes:
  • Cross-entropy loss (seen in other work as well)
  • RMSProp used as the optimizer with a learning rate of 0.5
  • Adam also gave reasonable results
  • RMSProp performed well with lower learning rates (e.g., 0.01)
  • Clause weights initialized from a normal distribution over [0,1] (mean zero, sd between 0 and 2)
Deep Symbolic Policy Learning
Papers:
Petersen, B.K., Larma, M.L., Mundhenk, T.N., Santiago, C.P., Kim, S.K., Kim,
J.T.: Deep symbolic regression: Recovering mathematical expressions from
data via risk-seeking policy gradients. arXiv preprint arXiv:1912.04871 (2019)
Landajuela, M., Petersen, B.K., Kim, S., Santiago, C.P., Glatt, R., Mundhenk,
N., Pettit, J.F., Faissol, D.: Discovering symbolic policies with deep
reinforcement learning. In: International Conference on Machine Learning, pp.
5979–5989. PMLR (2021)
DSP: Motivating Example

Key idea: understanding physical systems based on data, using compact mathematical expressions.
Expression Tree

An expression tree is a syntax tree for mathematical expressions.

Unlike LNN's, where the syntax tree is given and embedded into the NN, in DSO/DSR we learn the tree using an RNN in an RL framework.
Why Symbolic Policies for Control?
• Traditional control theory and mathematical-physics approaches to control result in simple but effective models

• Further, these models are mathematical equations that are simple (hence regularized), easily understood, and can be efficient to implement

• Prior work on RL for control results in black-box models that do not have these features
Deep Symbolic Regression

The DSR search loop:
1. Generate a distribution of expressions with an RNN.
2. Evaluate the reward associated with each expression based on NRMSE, (1/σ)·√((1/n)·Σᵢ (yᵢ − f(Xᵢ))²), to identify the top-ε expressions.
3. Compute the gradient ∇θ Jrisk(θ; ε) using the top-ε expressions via the "risk-seeking" reward function.
4. Update the RNN parameters and repeat.
Reward for Individual Expressions

[The reward for a candidate expression is computed from the per-time-step rewards rₜ, summed over the time steps in each episode i and averaged over the number of episodes.]
“Risk-Seeking” Policy Gradient
• The standard policy gradient is based on an overall expected value.
• The CVaR policy gradient considers only the lowest-risk candidates in a given sample (Tamar et al., 2014).
• This work uses a risk-seeking policy gradient that looks only at the highest-reward expressions (the reward function and gradient are sketched below).
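A sketch of the risk-seeking objective and its gradient estimate, following Petersen et al. (2019); the notation here is an assumption:

$$J_{risk}(\theta; \varepsilon) \;=\; \mathbb{E}_{\tau \sim p(\tau \mid \theta)}\big[\,R(\tau) \mid R(\tau) \ge R_\varepsilon(\theta)\,\big],$$

where R_ε(θ) is the (1 − ε) quantile of rewards under the current policy, and

$$\nabla_\theta J_{risk}(\theta; \varepsilon) \;\approx\; \frac{1}{\varepsilon N} \sum_{i=1}^{N} \big(R(\tau_i) - \tilde{R}_\varepsilon\big)\, \mathbf{1}\big[R(\tau_i) \ge \tilde{R}_\varepsilon\big]\, \nabla_\theta \log p(\tau_i \mid \theta),$$

with R̃_ε the empirical quantile of the sampled batch.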
Adding Multiple Dimensions
• The authors note that multiple action
dimensions leads to a combinatorial explosion
• They overcome the problem using a non-
symbolic “anchor model”
• The intuition is that each action dimension is
learned sequentially.
• When action dimension i is learned, the algorithm
uses previously learned symbolic actions 1,…,i-1
and the anchor (non-symbolic NN-learned) actions
for i+1,…,n.
Deep Symbolic Policies
STL Net
Paper:
Ma, M., Gao, J., Feng, L., Stankovic, J.A.: Stlnet: Signal temporal logic
enforced multivariate recurrent neural networks. 34th Conference on
Neural Information Processing Systems (NeurIPS 2020).
Signal Temporal Logic (STL)
• A temporal logic (like LTL or CTL)
• The semantic structure assumes multiple worlds or states over time
• Allows for reasoning over time intervals (like MTL)
• Predicates represent analog signals, meaning that they are equivalent to the value of a function exceeding a certain value (usually zero)
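An illustrative STL formula (an assumed example, not from the tutorial): $G_{[0,10]}\big(x(t) > 0 \rightarrow F_{[0,2]}\, y(t) > 1\big)$, read as "whenever signal x is positive during the first 10 time units, signal y exceeds 1 within 2 further time units."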
STL Net: Setup
• m signals over time

• Subsequence of a signal

• The model takes in "prefix" subsequences to determine the rest of the signal

• Goal: find model parameters that both minimize loss and adhere to the STL specifications
Student-Teacher Framework
STL Loss Function

• The specialized loss function is used to compare the result from the neural network not only with the training data (ground truth), but also with the result of the teacher network
• Hyperparameter β measures the trade-off between the two (one plausible form of the loss is sketched below)
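One plausible form of this loss, assuming a generic per-output distance ℓ (the exact formulation is in Ma et al. 2020):

$$\mathcal{L}(\theta) \;=\; \ell\big(f_\theta(x),\, y\big) \;+\; \beta\, \ell\big(f_\theta(x),\, o^{s}\big),$$

where y is the ground truth and o^s is the teacher network's (specification-satisfying) output.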
Adjusting Neural Network Results to Meet a
Specification

• We can think of the result of the neural model as a sequence of worlds over time (just note that the atoms in the worlds depend on analog values)
• If we can express a specification in a manner that allows us to simply compare worlds to the spec, we can update the trace in a straightforward manner
Key Idea: Converting Specification to
DNF Form

• Turning the specification into DNF form (a disjunction of conjunctions of literals) provides a few useful properties:
  1. Only one clause of the disjunction must be satisfied for the specification to be satisfied (so you can just iterate through clauses)
  2. The conjunctions of literals are very easy to compare with a world (essentially you are just checking every atom and negation against each world in the sequence)
  3. Changing a sequence to meet a specification becomes trivial, as you can simply modify the value associated with a specific predicate, and you can perform the modifications so the result is as near the original trace (in L1 distance) as possible (see the sketch below)
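A minimal sketch of this repair step, assuming a simple array representation of the trace and (time, signal, threshold, polarity) literals (not STLNet's data structures):

import numpy as np

def repair_trace(trace, dnf_clauses, margin=1e-3):
    """Return the clause-satisfying trace closest (in L1) to the original.

    trace:       (T, num_signals) array of predicted analog values
    dnf_clauses: list of clauses; each clause is a list of literals
                 (t, i, threshold, positive), meaning signal i at time t must
                 exceed threshold (or stay at/below it when positive is False)
    """
    best, best_cost = None, float("inf")
    for clause in dnf_clauses:                        # only one clause needs to hold
        candidate = trace.copy()
        for (t, i, threshold, positive) in clause:
            if positive and candidate[t, i] <= threshold:
                candidate[t, i] = threshold + margin  # raise just above the threshold
            elif not positive and candidate[t, i] > threshold:
                candidate[t, i] = threshold           # lower to satisfy the negated literal
        cost = np.abs(candidate - trace).sum()
        if cost < best_cost:                          # keep the smallest L1 change
            best, best_cost = candidate, cost
    return best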
Scalability
• STLNet relies on the ability to convert STL formulas from CNF to DNF.
• However, doing so could result in an exponentially large formula (Miltersen et al., 2005).
• Left alone, this could hinder the scalability of the approach.

Miltersen, P.B., Radhakrishnan, J., Wegener, I.: On converting cnf to dnf. Theoretical Computer Science 347(1), 325–335 (2005).
SATNet
Papers:
Wang, P., Donti, P.L., Wilder, B., Kolter, J.Z.: Satnet: Bridging deep learning and logical reasoning using
a differentiable satisfiability solver. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th
International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California,
USA, Proceedings of Machine Learning Research, vol. 97, pp. 6545–6554.
Chang, O., Flokas, L., Lipson, H., Spranger, M.: Assessing SATNet's ability to solve the symbol
grounding problem. In: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (eds.) Advances in
Neural Information Processing Systems, vol. 33, pp. 1428–1439. Curran Associates, Inc. (2020).
Topan, S., Rolnick, D., Si, X.: Techniques for symbol grounding with satnet. Advances in Neural
Information Processing Systems 34, 20733–20744 (2021)
The MAX SAT problem

INPUT:
Given Boolean variables 𝑥₁, …, 𝑥ₘ and clauses 𝐶₁, …, 𝐶ₙ, where each clause is a disjunction of literals (atoms or negations) — conjunctive normal form.

OUTPUT:
An assignment of the Boolean variables such that the number of satisfied clauses is maximized.

Notes:
• Known to be NP-hard (even when each clause has just two literals)
• The clauses can be numerically represented by a matrix 𝑀 of dimensions 𝑛 × 𝑚 (in the original paper, 𝑆 is used instead of 𝑀)
• Often framed as an optimization problem
Partial Knowledge MAX SAT variant

An extension to the problem Sudoku can be framed as an instance of MAX


is to partition the 𝑚 Boolean SAT with partially known Boolean variables
variables into two groups:
𝑖𝑛
input (𝑎1,…,𝑘 ) and output
𝑜𝑢𝑡
(𝑎𝑘+1,…,𝑚 )

Hence, we can think of MAX


SAT as the following problem:

𝑜𝑢𝑡 𝑖𝑛
𝑎𝑘+1,…,𝑚 = 𝑆(𝑎1,…,𝑘 , 𝑀)
A visual variant of the problem is one in which
the input is presented as images instead of text.
Where 𝑆 is an oracle and 𝑀 is
the matrix representing the
clauses.
SAT Net Framework

Pipeline:
• Input: discrete or probabilistic propositional atoms
• Input relaxation: relax each input into a random unit vector
• Coordinate descent: approximates the solution to MAXSAT using the relaxed constraint matrix
• Output: thresholding or randomized rounding to produce the final output

• A relaxation via semidefinite programming is used to solve the MAX SAT instance
• The gradient is propagated through the SATNet layers using coordinate descent
• A relaxed constraint matrix is learned (i.e., for the approximate solution of the problem), as opposed to interpretable constraints
• 98.3% accuracy for Sudoku; 63.2% reported accuracy for the visual case
• For the visual Sudoku problem, LeNet is used to classify the digits (pre-trained)
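A sketch of the semidefinite relaxation behind this layer, following Wang et al. (2019) and written with the slide's clause matrix M; the exact parameterization in the paper may differ. Each variable, plus a distinguished "truth direction" v⊤, is relaxed to a unit vector, and coordinate descent solves

$$\min_{V}\;\; \langle M^{\top} M,\; V^{\top} V \rangle \qquad \text{s.t. } \lVert v_i \rVert = 1 \;\text{ for all } i,$$

after which Boolean outputs are recovered by thresholding or randomized rounding of each vᵢ against v⊤.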
Visual Variant
• While a standard CNN approach fails in the visual case, SATNet approaches a theoretical limit (based on digit accuracy) for accuracy in visual Sudoku

• Later, Chang et al. (2020) showed that SATNet leveraged label leakage; SATNet fails catastrophically when labels are masked.
Despite its shortcomings, SAT Net has significance

• It successfully could learn constraints in a differentiable framework

• Combinatorial forward pass and ability to derive gradients for backpropagation

• Significantly outperformed standard DL architectures
  • (this was significant at the time of publication)
The relationship between symbols and
perception
• Transduction problem
  • If they exist, how then, are the perceptual states mapped into amodal symbols? (Barsalou, 1999)

• Symbol grounding problem
  • The reverse of the transduction problem
  • How are amodal symbols grounded in perception? (Barsalou, 1999)

• Chang et al. argue that SAT Net did not adequately solve the symbol grounding problem.
  • They probably really mean the transduction problem, as the issue was the transduction of perception into symbols
  • Note: symbol grounding does come up in ML verification, i.e., ensuring that a symbol maps back properly to perception
Unsupervised Learning to address
transduction/symbol grounding

• Topan et al. seek to directly address the shortcomings of SAT Net:
  • Use of unsupervised learning for digit recognition
  • Additional loss term to account for inaccurate digits
  • Addition of a proofreader layer improved performance (an extra boost, but not directly related to the problem of transduction)
More recent work
• Symbol grounding
  • Abduction (Dai & Muggleton, 2021), (Dai et al., 2019)
  • Apperception / binarized neural networks (Evans et al., 2021)
  • DeepLogic (Duan, 2022)

• Learning constraints to combinatorial problems via deep learning
  • CombOptNet (Paulus et al., 2021)
  • Solver-Free (Nandwani et al., 2022)
Questions
