
SIG Paper Reading

Presented By:
Aayushee Gupta
Supervised By:
Prof. G. Srinivasaraghavan
AI Debate 2
• Discussed the current state of the art and the future of AI
• Whether or not an AI system truly understands what it does
• What AI lacks: adaptability, robustness, abstraction, generalizability, common sense, and causal reasoning
• Importance of building hybrid systems
• Neuro vs. Symbolic AI → Neuro-Symbolic AI
Thinking Fast and Slow
• Human decisions are guided by two main capabilities:
  - System 1: fast, intuitive and unconscious thinking
  - System 2: slow, rational and logical thinking
• Loose comparison between System 1 and machine learning, and between System 2 and symbolic logic reasoning

Thinking Fast and Slow in AI
• Several research questions to be answered:
  - Metrics for a hybrid system
  - Modeling introspection and governance
  - Handling complex environments with competing priorities
  - Abstraction
  - Epistemic reasoning and planning while model building
  - Type of architectural choices (multi-agent view)
Neuro-Symbolic AI: The 3rd Wave
• Summarizes 20 years of research in AI and its future from the perspective of neurosymbolic systems
• To build a rich AI system, that is, one that is semantically sound, explainable and ultimately trustworthy, a sound reasoning layer must be combined with deep learning
• Integrates neural network-based learning with symbolic knowledge representation and logical reasoning
Key Questions
• Identifying the necessary and sufficient building
blocks of AI
• Bottlenecks in developing ML systems that
make AI trustworthy, interpretable and
explainable
• Directions for best representation and
taxonomy for Neurosymbolic AI
Deep Learning Criticism
• Brittleness
• Lack of explainability
• Lack of parsimony
• Lack of robustness
• "Two most fundamental aspects of intelligent cognitive behaviour: the ability to learn from experience and the ability to reason from what has been learned" – Leslie Valiant
• "Develop a framework for building systems that can routinely acquire, represent, and manipulate abstract knowledge, with a focus on building systems that use that knowledge in the service of building, updating, and reasoning over complex, internal models of the external world" – Gary Marcus
Advantages of Neuro-Symbolic Integration
• Knowledge learned by a neural network can be represented symbolically
• Reasoning takes place either symbolically or within the network in distributed form
• Neural network-based learning and inference under uncertainty are expected to address the brittleness and computational complexity of symbolic systems
• Symbolism is expected to provide additional knowledge in the form of constraints for learning, to handle the generalization problem in neural networks
State of the Art Neuro-symbolic Systems
• Current DL systems cannot represent full first-order or higher-order logic
• Logic Tensor Networks: translate logical statements into the loss function rather than into the network architecture (see the sketch after this list)
• DeepProbLog: a neural network replaces a node in the probabilistic inference tree of a symbolic ML system
• Differentiable reasoning: perform differentiable unification and theorem proving inside the neural network
• Compositional embedding: transform symbolic representations into vector spaces where reasoning can take place through matrix computations over distance functions
• Deep Logic Networks: encode and extract logical rules from Deep Belief Networks
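To make the Logic Tensor Network idea of compiling logic into the loss concrete, here is a minimal sketch. The rule, the Reichenbach fuzzy implication, and the predicate networks (`smokes_net`, `cancer_net`) are illustrative assumptions of this write-up, not the original LTN implementation.

```python
import torch

# Minimal sketch (assumed example): compile the rule  forall x: smokes(x) -> cancer(x)
# into a differentiable penalty, LTN-style. `smokes_net` and `cancer_net` are
# hypothetical predicate networks mapping an entity embedding to a truth value in [0, 1].
smokes_net = torch.nn.Sequential(torch.nn.Linear(16, 1), torch.nn.Sigmoid())
cancer_net = torch.nn.Sequential(torch.nn.Linear(16, 1), torch.nn.Sigmoid())

def rule_satisfaction(entities: torch.Tensor) -> torch.Tensor:
    """Fuzzy truth of  forall x: smokes(x) -> cancer(x)  over a batch of entities."""
    s = smokes_net(entities)           # truth degree of smokes(x)
    c = cancer_net(entities)           # truth degree of cancer(x)
    implication = 1.0 - s + s * c      # Reichenbach implication: I(a, b) = 1 - a + a*b
    return implication.mean()          # mean aggregation of the universal quantifier

entities = torch.randn(32, 16)                  # dummy batch of entity embeddings
rule_loss = 1.0 - rule_satisfaction(entities)   # knowledge acts as a soft constraint
rule_loss.backward()                            # added to the task loss during training
```

In this style the symbolic knowledge never changes the architecture; it only contributes a regularization term that pushes the predicate networks toward satisfying the rule.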
DeepProbLog
• A probabilistic logic programming language that incorporates deep learning by means of neural predicates
• Neural predicates serve as an interface between the logic and the neural side, with both sides treating the other as a black box
Deep Logic Networks
• Encode symbolic knowledge into Deep Belief Networks in a hierarchical fashion, in the form of confidence rules
• A Deep Belief Network is a modular system consisting of a stack of restricted Boltzmann machines (RBMs)
• Extract propositional rules enriched with confidence values from the RBMs layer-wise
• Information is lost when experimenting on complex image data such as MNIST
Differentiable Reasoning
• End-to-end differentiable proving of queries to knowledge bases by operating on dense vector representations of symbols
• Recursively builds a neural network enumerating all possible proof paths for proving a query (or goal) on a given KB, and aggregates their proof scores via max pooling
• A unification module compares sub-symbolic representations of logic atoms (see the sketch below)
• Mutually recursive OR and AND modules jointly enumerate all possible proof paths
• A final aggregation module selects the highest-scoring proof path
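A minimal sketch of the soft unification step referenced above, assuming an NTP-style similarity exp(-||a - b||) between symbol embeddings; the predicate names, embedding size and single-path scoring are illustrative assumptions.

```python
import torch

def soft_unify(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Soft unification score in (0, 1]: close symbol embeddings unify with high score."""
    return torch.exp(-torch.norm(a - b, p=2))

# Toy example: score one proof path where the goal  grandpaOf(abe, bart)
# is matched against the fact  grandfatherOf(abe, bart).
emb = {name: torch.randn(8) for name in ["grandpaOf", "grandfatherOf", "abe", "bart"]}
goal = ["grandpaOf", "abe", "bart"]
fact = ["grandfatherOf", "abe", "bart"]

# Proof score of this single path: min over pairwise unification scores;
# a full prover would max-pool such scores over all candidate facts and rules.
path_score = torch.stack([soft_unify(emb[g], emb[f]) for g, f in zip(goal, fact)]).min()
print(float(path_score))
```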
Building blocks
• Key questions:
  - Attention layers or graph networks?
  - Whether to use probability theory, and at what level
  - Does theorem proving using neural networks bring a significant gain?
  - Localist vs. distributed representations
  - Deep networks or Bayesian networks?
• Ingredients:
  - Gradient-based optimization used by deep learning to handle large amounts of data
  - A language for describing encoded knowledge: first-order logic, nonmonotonic and modal logic, and logic programming
  - Modularity: refer to large parts of the network by the composition of symbols and relations among them
  - Reasoning, within or outside the network, exact or approximate
  - Constraint satisfaction as part of the interplay between learning and reasoning
Ways of hybrid integration
• Translate and encode symbolic knowledge into the set of weights of a network
• Translate and encode symbolic knowledge into the loss function of the network
• Use an intermediate representation with factor graphs between neural networks and logical representations
Taxonomy for Neurosymbolic AI
• Type 1: Standard deep learning with symbols as input and output
• Type 2: Hybrid system where a core neural network is loosely coupled with a symbolic problem solver
• Type 3: Hybrid system whereby a neural network focusing on one task (e.g. object detection) interacts via its input and output with a symbolic system specialising in a complementary task (e.g. query answering)
• Type 4: Symbolic knowledge is compiled into the training set of a neural network; should include the study of first-order, higher-order, many-valued and non-classical logic
• Type 5: Distributed neural-symbolic systems where a symbolic logic rule is mapped onto an embedding which acts as a soft constraint (a regularizer) on the network's loss function
• Type 6: Combinatorial and true symbolic reasoning inside a neural engine
Representation for Neurosymbolic AI
• Option 1: Symbols are translated into a neural network and one seeks to perform reasoning within the network
• Option 2: Hybrid approach in which the neural network interacts with a symbolic system for reasoning
• Option 3: Precise, sound reasoning when expert knowledge is available
Application areas of Neurosymbolic AI
• Commonsense reasoning
• Planning
• Knowledge-base completion and data-driven ontology learning
• Modeling cause and effect
• Intervention and counterfactual reasoning
A Neurosymbolic Cycle
• Learning is carried out from data by neural networks that use gradient-descent optimization
• Efficient forms of propositional reasoning can also be carried out by the network (cf. neural-symbolic cognitive reasoning)
• Rich first-order logic reasoning and extrapolation need to be done symbolically, from descriptions extracted from the trained network
• Once symbolic meaning has emerged from the trained network, symbols can be manipulated easily by conventional computers and can serve as constraints for further learning from data
Challenges for the third wave in AI
• First order logic and higher order knowledge extraction from very large
networks that is provably sound and efficient
• Goal-directed commonsense and efficient combinatorial reasoning
• Human-network communication for promoting communication and
argumentation
• How symbolic meaning emerges from large networks of neurons
• Proofs are needed of the capability of different neural architectures at
representing various logical languages.
• Setting up standard benchmarks and associated comprehensibility tests
for comparative evaluation in the next decade
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
• Given a text string, how would you search for related images and videos at large scale?
• How would humans do it?
  - First isolate a few promising candidates by giving a quick glance at all the images with a fast process
  - Then pay more attention to image details with a slow process
• Can we replicate this with machines?
  - Combination of a fast (dual encoder) and a slow (cross-attention via transformer) approach
• Fast approach: indexable, scalable and efficient, but with limited accuracy
• Slow approach: highly accurate, but slow and not scalable
Major Contributions
• Propose a fine-grained cross-attention architecture for transformer-based models that improves retrieval accuracy while maintaining scalability
• Propose a generic approach for combining a Fast dual-encoder model with a Slow but accurate transformer-based model via distillation and re-ranking
• Perform image retrieval on the COCO and Flickr30K datasets, reducing the inference time of powerful transformer-based models by 100x while obtaining state-of-the-art results
Model Architecture
• Fast model
Use dual encoders that independently
encode the input image and text to compute
a similarity score via a single dot product
• Slow model
Use transformer-based cross-attention
models that jointly process the input image
and text with cross-modal attention to
compute a similarity score
• Distillation
Fast model is improved by transfer of
knowledge from Slow model via distillation at
training time (offline)
• Reranking
Slow model is accelerated and improved with
the distilled Fast model using a re-ranking
strategy at query time
Fast Model: Dual Encoder
• The objective is to learn image embeddings f(x) and text embeddings g(y) so that semantically related images and text have high similarity and unrelated images and text have low similarity
• Training uses a Noise Contrastive Estimation (NCE) loss (a standard form is sketched below)
• The image encoder f can be the globally pooled output of a CNN, while the text encoder g is either a bag-of-words representation or a more sophisticated BERT encoder
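The NCE equation on the original slide did not survive as text; a standard contrastive form consistent with the description above, with a set of sampled negative pairs $\mathcal{N}_i$ (a notational assumption of this write-up), is:

$$\mathcal{L}_{\mathrm{NCE}} = -\sum_{i} \log \frac{\exp\big(f(x_i)^{\top} g(y_i)\big)}{\exp\big(f(x_i)^{\top} g(y_i)\big) + \sum_{(x', y') \in \mathcal{N}_i} \exp\big(f(x')^{\top} g(y')\big)}$$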
Slow Model: Cross-attention
• Given an image x and a text y, the model computes the similarity between the two by applying a visual encoder to the image and a cross-attention network that scores the text against the resulting visual features
• Fine-grained cross-attention: gradually upsample the last convolutional feature map, conditioned on earlier, higher-resolution feature maps
Slow Model: Details
• A stack of Transformer decoders takes the visual feature map of the image x as its encoding state
• Each decoder layer is composed of a masked text self-attention layer, followed by a cross-attention layer that enables the text to attend to the visual features, and finally a feed-forward layer
• Commonly used objective: cross-modal image-text matching loss
• Proposed objective: a captioning loss that performs retrieval by searching for the image x that is most likely to decode the caption y
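As a sketch of the captioning objective, assuming the decoder factorizes the caption autoregressively over the visual features $\phi(x)$ (notation introduced here, not on the slide), the retrieval score would be:

$$s(x, y) = \log p(y \mid x) = \sum_{t=1}^{|y|} \log p\big(y_t \mid y_{<t}, \phi(x)\big)$$

Retrieval then returns the image x that maximizes this log-likelihood for the query caption y.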
Distillation
• Helps transfer the knowledge of the superior cross-attention model to the dual-encoder model
• The standard formulation of distillation cannot be applied directly because the set of possible word sequences describing an image is infinite
• Given an image-text pair (xi, yi), a finite subset of candidate texts is sampled
• The likelihoods of the pairs under the Slow teacher model and the Fast student model define two probability distributions over this subset
• Distillation loss: measures the cross-entropy between the student and teacher distributions
• Overall objective function: combines the contrastive retrieval loss with the distillation loss
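A hedged reconstruction of the formulas missing from the slide, assuming a softmax over the sampled text subset $\mathcal{Y}_i \ni y_i$ and a weighting factor $\lambda$ (both notational assumptions of this write-up):

$$p^{T}_{i}(y) = \frac{\exp\big(s_{\mathrm{slow}}(x_i, y)\big)}{\sum_{y' \in \mathcal{Y}_i} \exp\big(s_{\mathrm{slow}}(x_i, y')\big)}, \qquad p^{S}_{i}(y) = \frac{\exp\big(f(x_i)^{\top} g(y)\big)}{\sum_{y' \in \mathcal{Y}_i} \exp\big(f(x_i)^{\top} g(y')\big)}$$

$$\mathcal{L}_{\mathrm{dist}} = -\sum_{i} \sum_{y \in \mathcal{Y}_i} p^{T}_{i}(y)\, \log p^{S}_{i}(y), \qquad \mathcal{L} = \mathcal{L}_{\mathrm{NCE}} + \lambda\, \mathcal{L}_{\mathrm{dist}}$$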


Reranking
• Rank the entire dataset with the distilled Fast model, then re-rank the top-K results with the Slow model
• Helps in recovering the performance of the Slow model at query time
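A minimal sketch of this two-stage query-time strategy; `fast_text_encode`, `slow_score` and the choice of K stand in for the distilled dual encoder, the cross-attention model and the shortlist size, and are assumptions rather than the paper's code.

```python
import numpy as np

def retrieve(query_text, image_embeddings, fast_text_encode, slow_score, k=50):
    """Two-stage retrieval: score every image with the Fast dual encoder
    (one dot product each), then re-rank only the top-K with the Slow model."""
    q = fast_text_encode(query_text)              # text embedding g(y)
    fast_scores = image_embeddings @ q            # dot products against all image embeddings f(x)
    shortlist = np.argsort(-fast_scores)[:k]      # cheap, indexable first stage
    slow_scores = np.array([slow_score(i, query_text) for i in shortlist])  # expensive, K calls only
    return shortlist[np.argsort(-slow_scores)]    # final ranking of the shortlist
```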
Evaluation
• Pre-training datasets: MS-COCO, CC (Conceptual Captions)
• Validation and test dataset: Flickr30K
• Visual model: ResNet-50 v2 CNN trained from scratch
• Fast model: bag-of-words on top of word2vec pretrained embeddings, and pretrained BERT-Base
• Slow model: stack of 3 Transformer decoders with hidden dimension 512 and 8 attention heads
• Comparison with state-of-the-art cross-attention models: VirTex, PixelBERT, UNITER, OSCAR, VATEX
Findings
• Cross-attention models are better than dual encoders
• BoW performs better than the BERT dual-encoder model
• The captioning loss leads to better results than an image-text matching loss coupled with an MLM loss
• Backward captioning further improves retrieval performance
• Gradual upsampling improves performance for both image and video retrieval
• The distilled DE model performs 10% better than the Fast DE model, although it takes longer to converge
• Recovers or outperforms the performance of the Slow model (R@1) while significantly decreasing query time (100x reduction in retrieval time)
• Performs better than PixelBERT, VirTex and VATEX, but not UNITER and OSCAR
Distillation and Reranking Results
Comparison with state-of-the-art models
References

• Serafini, Luciano, and Artur d'Avila Garcez. "Logic tensor networks: Deep learning and logical reasoning
from data and knowledge." arXiv preprint arXiv:1606.04422 (2016).
• Tran, Son N., and Artur S. d’Avila Garcez. "Deep logic networks: Inserting and extracting knowledge from
deep belief networks." IEEE transactions on neural networks and learning systems 29.2 (2016): 246-258.
• Rocktäschel, Tim, and Sebastian Riedel. "End-to-end differentiable proving." Advances in neural
information processing systems 30 (2017).
• Kahneman, Daniel. Thinking, Fast and Slow. (2017).
• Manhaeve, Robin, et al. "Deepproblog: Neural probabilistic logic programming." Advances in Neural
Information Processing Systems 31 (2018).
• Booch, Grady, et al. "Thinking fast and slow in AI." arXiv preprint arXiv:2010.06002 (2020).
• Garcez, Artur d'Avila, and Luis C. Lamb. "Neurosymbolic AI: the 3rd wave." arXiv preprint arXiv:2012.05876 (2020).
• Miech, Antoine, et al. "Thinking fast and slow: Efficient text-to-visual retrieval with transformers." Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
