

Introduction to Abductive Learning and Neuro-symbolic (RL)

Jiacheng Xu
2022.11.17
Outline

• Abductive Learning & Neuro-symbolic
  • Bridging machine learning and logical reasoning by abductive learning. NIPS 2019 [73]
  • VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming. NIPS 2022
• Symbolic RL
  • Towards Deep Symbolic Reinforcement Learning. 2016 [213]
  • Discovering symbolic policies with deep reinforcement learning. ICML 2021 [34]

Abductive learning

NIPS 2019

In this paper, we present abductive learning, in which machine learning and logical reasoning can be entangled and mutually beneficial.

Neuro-based success

Treating Reasoning As Perception

Hard to tell machines what we know;
hard to understand what machines have learned.

The Separated Perception And Reasoning

Motivation

Perception and reasoning are two abilities of intelligence that humans integrate seamlessly.
In AI, perception is usually handled by statistical/neural learning, while reasoning is often formalized by logic-based AI.

Background

If it's cloudy, then it's going to rain.

Deduction: Given the rule and the cause, deduce the effect.
Induction: Given a cause and an effect, induce a rule.
Abduction: Given a rule and an effect, abduce a cause.
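A worked instance with the rule above (R: cloudy → rain):

    Deduction:  R, cloudy ⊢ rain            (it is cloudy, so it will rain)
    Induction:  many ⟨cloudy, rain⟩ observations ⊢ R
    Abduction:  R, rain ⊢ cloudy            (it rained, so it was probably cloudy)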

Human Abductive Problem-Solving

Figure: rows 8-9 are the results of rows 1-7; column 1 contains conjectures, and column 2 is derived from the conjectures and background knowledge.

Handwritten Equation Decipherment Task

The equations are constructed from images of symbols ("0", "1", "+" and "=").
They are generated with unknown operation rules.
Each example is associated with a label that indicates whether the equation is correct, e.g., under binary addition "1 + 1 = 10" would be labelled correct and "1 + 1 = 1" incorrect.
This task demands the same joint use of perceptual and reasoning abilities that humans exhibit.

Given x, predict y.

ABL Structure

Problem Setting

Given D = {⟨x₁, y₁⟩, …, ⟨xₙ, yₙ⟩} and a domain knowledge base B, the target of abductive learning is to output a hypothesis model H = p ∪ ∆C.
• Perception model: p: 𝒳 → 𝒫 maps the feature space to primitive symbols.
• Knowledge model: ∆C is a set of first-order logical clauses that, together with B, define the target concept C.
The hypothesis model should be consistent with the data: B ∪ ∆C ∪ p(xᵢ) ⊨ yᵢ for every ⟨xᵢ, yᵢ⟩ ∈ D.

Instantiation interpretation

Domain knowledge base B:
Involves only the structure of the equations (all equations have the form X+Y=Z; digits are lists of 0s and 1s) and a recursive definition of bit-wise operations (calculated bit by bit, from the last bit to the first).

Knowledge model ∆C:
The specific rules for calculating the operations are undefined in B, i.e., the results of "0+0", "0+1" and "1+1" could be "0", "1", "00", "01" or even "10". These missing calculation rules form the knowledge model ∆C, which must be learned from the data.
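As a concrete illustration, here is a minimal Python sketch (the encoding and helper names are my own, not the paper's code) of what learning ∆C amounts to: enumerate candidate result strings for the three undefined one-bit additions, and keep only the rule sets consistent with the labelled equations.

    from itertools import product

    CANDIDATES = ["0", "1", "00", "01", "10", "11"]   # possible results of a one-bit addition
    PAIRS = [("0", "0"), ("0", "1"), ("1", "1")]      # the three undefined operations

    def add1(x, y, rules):
        # One-bit addition under a candidate rule table; operands are unordered.
        return rules[tuple(sorted((x, y)))]

    def consistent(rules, examples):
        # examples: ((x, y, z), label) with label True iff "x + y = z" is a correct equation.
        return all((add1(x, y, rules) == z) == label for (x, y, z), label in examples)

    # Hypothesis space for ∆C: one candidate result per undefined operation.
    hypotheses = [dict(zip(PAIRS, combo)) for combo in product(CANDIDATES, repeat=3)]

    examples = [(("1", "1", "10"), True), (("0", "1", "1"), True), (("1", "1", "1"), False)]
    print(sum(consistent(h, examples) for h in hypotheses), "consistent rule sets remain")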

ABL Structure

Instantiation interpretation

1. Machine Learning
   The CNN maps x to eq₀ = [1, 1, 1, 1, 1], and ALP cannot abduce a consistent hypothesis.
2. Consistency Optimization
   RACOS learns a δ that marks the "possibly incorrect" positions as blanks, eq₁ = [1, _, 1, _, 1].
3. Logical Abduction
   ALP abduces a consistent hypothesis and a list of revised pseudo-labels eq₁′ = [1, +, 1, =, 1] for re-training the CNN.
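A minimal sketch of this perceive-optimize-abduce loop (the helpers are hypothetical stand-ins passed in as arguments, not the paper's API):

    def abl_step(cnn_predict, racos_mask, alp_abduce, retrain, images, label, B):
        # 1. Machine learning: perceive pseudo-labels, e.g. eq0 = [1, 1, 1, 1, 1].
        pseudo = cnn_predict(images)
        hypothesis, revised = alp_abduce(B, pseudo, label)
        if hypothesis is None:
            # 2. Consistency optimization: learn a mask that blanks the
            #    "possibly incorrect" positions, e.g. eq1 = [1, _, 1, _, 1].
            masked = racos_mask(pseudo, label, B)
            # 3. Logical abduction: abduce a consistent hypothesis together with
            #    revised pseudo-labels, e.g. eq1' = [1, +, 1, =, 1].
            hypothesis, revised = alp_abduce(B, masked, label)
            retrain(images, revised)      # re-train the CNN on the revised labels
        return hypothesis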

Performance

Module Analysis

Volunteers have no difficulty distinguishing the different symbols, but machines are better at checking the consistency of logical theories, where people are prone to make mistakes.
Machine learning systems should make use of this advantage in logical reasoning.
ABL Structure

1. Transfer the CNN learned on the DBA addition task to XOR equations constructed from the same characters.
2. Transfer the learned knowledge model from the RBA to the DBA domain.

Discussion

Superiority
ABL utilises logical abduction and trial-and-error search to bridge machine learning with original first-order logic, without using gradients.
ABL inherits the full power of first-order logical reasoning (and of deep learning).

Future Direction
Abductive reasoning connects high-level reasoning and low-level perception;
the dividing line between high-level and low-level is unclear, and how to combine symbolic and sub-symbolic AI more efficiently is still an open question.

VAEL

NIPS 2022

This work is the first to propose a general-purpose end-to-end differentiable framework integrating probabilistic logic programming into a deep generative model.

Algorithm Overview

Probabilistic Logic Programming

1. An MLP maps z_sym to the probabilities of the facts in the program.
2. The program computes the label y and a possible world.

Downstream Application

• Label Classification: given x, predict the labels y.
• Image Generation: sample z = [z_sym, z_sub] and a possible world w ~ P(W; p).
• Conditional Image Generation: sample z = [z_sym, z_sub], compute the probability P(W | E; p), sample w ~ P(W | E; p), and generate an image consistent with the evidence E.
• Task Generalization: once VAEL is trained on a specific symbolic task (e.g. the addition of two digits), it can generalize to any novel task that involves reasoning with the same set of probabilistic facts by simply changing the ProbLog program accordingly (e.g. to the multiplication of two integers); see the sketch below.
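A minimal sketch of the probabilistic-logic layer for the two-digit addition task (my own illustration in plain Python, not the paper's code): the MLP output for z_sym gives per-digit fact probabilities, and the program marginalizes over possible worlds (pairs of digits) to obtain P(y). Task generalization then amounts to swapping the program, here the op argument.

    import itertools

    def label_distribution(p1, p2, op=lambda a, b: a + b, n_labels=19):
        # p1, p2: probabilities of digits 0..9 for the two images (from MLP(z_sym)).
        probs = [0.0] * n_labels
        for d1, d2 in itertools.product(range(10), repeat=2):
            probs[op(d1, d2)] += p1[d1] * p2[d2]    # weight of this possible world
        return probs

    uniform = [0.1] * 10
    p_add = label_distribution(uniform, uniform)                          # y in 0..18
    p_mul = label_distribution(uniform, uniform, lambda a, b: a * b, 82)  # y in 0..81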

Experiment

m_Gen: an independent classifier is used to discriminate number/position, i.e., whether the generated image preserves the intended semantics.
Image Generation

Task Generalization

Data Efficiency

Deep Symbolic RL

We propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end.

Motivation

Current DRL methods inherit some shortcomings from the current generation of deep learning techniques:
• Data inefficiency.
• Lack of the ability to reason on an abstract level, leading to poor performance in transfer learning, analogical reasoning, and hypothesis-based reasoning.
• Largely opaque to humans, rendering them unsuitable for domains in which verifiability is important.

We propose a reinforcement learning architecture comprising a neural back end and a symbolic front end, with the potential to overcome each of these shortcomings.

Algorithm Architecture

Task

A circle ('o') returns a negative reward; a cross ('x') returns a positive reward.

Symbolic Representations

Low-level Symbol Generation

Experiment

Transfer

We hypothesize that, while DQN might eventually learn to play the random game effectively when trained on that same game, it will never achieve competence at the random game when trained on the grid setup.

Discovering Symbolic Policies

Lawrence Livermore National Laboratory, Livermore, California, USA.

ICML 2021

We propose deep symbolic policy, a novel approach to directly search the space of symbolic policies.

Motivation

The complexity of neural network-based policies renders them problematic to understand, trust, and deploy.
We propose deep symbolic policy, a novel approach to directly search the space of symbolic policies.
Advantages of the deep symbolic policy:
• Interpretability
• Generalizability
• Deployability
• Transparency and verifiability
• Performance

Symbolic Policy

Algorithm Overview

(Top) The Policy Generator samples an example expression.
(Middle) The current policy is a construct of previously learned symbolic policies, the current sample, and an anchor model.
(Bottom) The Policy Evaluator applies the policy to the environment, and the reward is used to train the Policy Generator.

Policy Generator

Expressions can be represented as symbolic expression trees.
An expression tree can in turn be represented as a sequence of nodes by using its pre-order traversal.

Tokens are selected from a library ℒ, e.g. {+, ×, sin, s₁, s₂, 0.1, 5.0}, so the search is reduced to a search over discrete sequences.
An autoregressive recurrent neural network generates control policies represented by tractable mathematical expressions f: S → ℝ.
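A minimal sketch of turning a pre-order token sequence back into an executable expression (the library and token encoding are illustrative choices of mine):

    import math

    ARITY = {"+": 2, "*": 2, "sin": 1}     # terminals (state variables, constants) have arity 0

    def build(tokens):
        # Recursively consume a pre-order traversal; return the policy f(s) as a closure.
        tok = tokens.pop(0)
        if tok in ARITY:
            children = [build(tokens) for _ in range(ARITY[tok])]
            if tok == "+":
                return lambda s: children[0](s) + children[1](s)
            if tok == "*":
                return lambda s: children[0](s) * children[1](s)
            return lambda s: math.sin(children[0](s))
        if isinstance(tok, float):          # constant token, e.g. 0.1 or 5.0
            return lambda s: tok
        idx = int(tok[1:]) - 1              # state-variable token, e.g. "s1", "s2"
        return lambda s: s[idx]

    # Pre-order traversal of the tree for f(s) = s1 × sin(s2 + 0.1):
    f = build(["*", "s1", "sin", "+", "s2", 0.1])
    print(f([2.0, 0.5]))                    # 2.0 * sin(0.6)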

Policy Evaluator

Given a sequence τ from the Policy Generator, we instantiate the corresponding mathematical expression f.
We then employ f directly as the control policy in the environment: a = f(s).

The standard REINFORCE policy gradient is used to train the autoregressive model, together with several techniques: risk-seeking policy gradients, constant optimization, and exploration.
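A minimal sketch of the risk-seeking ingredient (my own illustration): only samples in the top ε fraction of rewards contribute to the gradient, so the generator optimizes best-case rather than average performance.

    import numpy as np

    def risk_seeking_weights(rewards, epsilon=0.05):
        # Reward threshold at the (1 - epsilon) quantile of the batch.
        r_eps = np.quantile(rewards, 1.0 - epsilon)
        # Zero weight below the threshold, (R - r_eps) above it. With autograd,
        # minimizing -(weights * log_probs).sum() yields the REINFORCE update.
        return (rewards > r_eps) * (rewards - r_eps)

    rewards = np.random.rand(64)            # returns of 64 sampled expressions
    weights = risk_seeking_weights(rewards)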


Scaling Up to Multiple Action Dimensions

When A ∈ ℝⁿ, a simple idea is to generate {f₁, …, fₙ} jointly, but then the sampling space is O(|ℒ|^(n·l)) for expressions of length l.

So an existing pre-trained policy (e.g. a neural-network policy trained with an existing DRL algorithm) is used as an anchor model to reduce the sampling space, and the fᵢ are trained one at a time.

So it is somewhat like fine-tuning the fᵢ one by one.
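A minimal sketch of the resulting mixed policy during round i (my own illustration): the dimension currently being learned is played by the sampled expression, dimensions finished in earlier rounds use their frozen symbolic expressions, and the remaining dimensions fall back to the anchor.

    def mixed_policy(state, learned, anchor_policy, n_actions):
        # learned: dict mapping action dimension -> symbolic expression f_i(state)
        # (frozen from earlier rounds, or the sample currently being evaluated).
        anchor_action = anchor_policy(state)
        return [learned[i](state) if i in learned else anchor_action[i]
                for i in range(n_actions)]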

Experiment

Conclusion

• Neural and symbolic components can be combined in a variety of ways.
• Neuro-symbolic methods are an effective means of combining perception and reasoning capabilities, and they show clear advantages over current end-to-end neural networks on some tasks.
• Neuro-symbolic methods naturally introduce and retain domain knowledge, improving sample efficiency, interpretability, and transferability.
• These advantages are exactly what current RL research focuses on.


Thanks!

