Neurosymbolic AI - Why, What, and How
Abstract—Humans interact with the environment using a combination of perception - transforming sensory inputs from their environment into symbols, and cognition - mapping symbols [...]
[...] to support human intelligence and enable individuals to understand and interact with the world around them. Daniel [...]
[Figure 1 panels: methods based on structured-knowledge reasoning are rated, at the algorithm level, large-scale perception (L), abstraction (M), analogy (M), and planning (M), and, at the application level, user-explainability (L), domain constraints (L), scalability (L), and continual adaptation (L). Methods with intertwined integration between neural and symbolic components, namely (1) program abstraction induction methods (e.g., probabilistic programs) and (2) end-to-end differentiable methods (e.g., PK-iL), are rated high (H) on all algorithm-level features (large-scale perception, abstraction, analogy, planning) and all application-level features (user-explainability, domain constraints, scalability, continual adaptation).]
Fig. 1. The two primary types of neurosymbolic techniques—lowering and lifting—can be further divided into four sub-categories. Across the low (L),
medium (M), and high (H) scales, these methods can be used to provide a variety of functions at both algorithmic and application levels.
network, as depicted in Figure 2. Nonetheless, this process has limited constraint specification capabilities, because large neural networks have multiple processing layers and moving parts ((M) in Figure 1). It is challenging to determine whether modifications made to the network are retained throughout the various processing layers. Neural processing pipelines do offer a high degree of automation, making it easier for a system to scale across various use cases (such as plugging in use-case-specific knowledge graphs) and to support continual adaptation.
When symbolic structures are integrated with neural processing pipelines, system scores tend to be low across all application-level aspects of user-explainability, domain constraints, scalability, and continual adaptation, as denoted by the letter (L) in Figure 1. This is primarily due to the effect of a significant user-technology barrier: end-users must familiarize themselves with the rigor and details of formal logic semantics to communicate with the system (e.g., to provide domain constraint specifications).
Algorithm-Level Analysis of Methods in Category 2. For category 2(a), the proliferation of large language models and their corresponding plugins has spurred the development of federated pipeline methods. These methods utilize neural networks to identify symbolic functions based on task descriptions that are specified using appropriate modalities such as natural language and images. Once the symbolic function is identified, the method transfers the task to the appropriate symbolic reasoner, such as a math or fact-based search tool. Figure 3 illustrates a federated pipeline method that utilizes the Langchain library. These methods are proficient in supporting large-scale perception through the large language model ((H) in Figure 1). However, their ability to facilitate algorithm-level functions related to cognition, such as abstraction, analogy, reasoning, and planning, is restricted by the language model's
[Figure 2 example: a small knowledge graph over Apple, Grape, Watermelon, Antioxidants, and Fruit, rendered both as adjacency matrices and as paths: 1. Antioxidants has-1 Apple is_a Fruit; 2. Antioxidants has-1 Grape is_a Fruit; 3. Watermelon is_a Fruit.]
Fig. 2. The figure illustrates two methods for compressing knowledge graphs to integrate them with neural processing pipelines. One approach involves embedding knowledge graph paths into vector spaces, enabling integration with the neural network's hidden representations. The other method involves encoding knowledge graphs as masks to modify the neural network's inductive biases. An example of an inductive bias is the correlation information stored in the self-attention matrices of a transformer neural network [8], [9].
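The second compression method in the Fig. 2 caption, encoding the knowledge graph as a mask over self-attention, can be sketched as follows. This is a minimal numpy illustration using the figure's toy graph; the entity list, the function name, and the large negative bias are illustrative assumptions, not the implementation used in [8], [9].

```python
import numpy as np

# Toy vocabulary and triples mirroring the Figure 2 knowledge graph.
entities = ["Apple", "Grape", "Watermelon", "Antioxidants", "Fruit"]
idx = {e: i for i, e in enumerate(entities)}
triples = [
    ("Antioxidants", "has-1", "Apple"),
    ("Antioxidants", "has-1", "Grape"),
    ("Apple", "is_a", "Fruit"),
    ("Grape", "is_a", "Fruit"),
    ("Watermelon", "is_a", "Fruit"),
]

# Encode the graph as a symmetric adjacency mask over entity tokens.
mask = np.zeros((len(entities), len(entities)))
for head, _, tail in triples:
    mask[idx[head], idx[tail]] = mask[idx[tail], idx[head]] = 1.0

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with the knowledge graph as an
    inductive bias: pairs unconnected in the graph receive a large
    negative score and are suppressed by the softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask > 0, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy entity embeddings standing in for a transformer's hidden states.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(len(entities), 4))
out, weights = masked_attention(Q, K, V, mask)
```

After masking, "Watermelon" can only attend to "Fruit", which is exactly the correlation structure the graph prescribes.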
import os
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent

os.environ["SERPAPI_API_KEY"] = "<REDACTED API KEY>"
llm = OpenAI(temperature=0)  # the language model backing the agent
tools = load_tools(["serpapi", "llm-math"], llm=llm)
# see the list of agent types, such as "zero-shot-react-description", in the Langchain documentation
federated_agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
# enter a query
query = """Assuming it takes an hour to prepare for the drive,
how much time should be allotted for the total journey by car from NYC, USA to LA, USA?"""
# run the federated agent
federated_agent.run(query)
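Stripped of the library machinery, the routing behavior of the snippet above can be sketched as a toy dispatcher. All names below, including the fact table standing in for the search API, are hypothetical illustrations rather than Langchain internals.

```python
# Toy illustration of a federated pipeline: sub-tasks are handed to
# symbolic solvers, mirroring the trace in Figure 3.
DRIVE_HOURS = {("NYC", "LA"): 41}  # hypothetical fact store standing in for the search API

def search_tool(origin, dest):
    # fact-based search step ("How many hours does it take to drive?")
    return DRIVE_HOURS[(origin, dest)]

def math_tool(expression):
    # stand-in for the scientific-computing (math) solver
    return eval(expression, {"__builtins__": {}})

def federated_answer(prep_hours, origin, dest):
    hours = search_tool(origin, dest)            # retrieve the driving time
    return math_tool(f"{prep_hours} + {hours}")  # "What is 1 + that number?"

federated_answer(1, "NYC", "LA")  # 42
```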
[Figure 3 trace: the agent first asks the Google Serp-API "How many hours does it take to drive from NYC, USA to LA, USA?" (answer: 41 hours), then asks the Wolfram Alpha-API "What is 1 + the number of hours it takes to go from NYC, USA to LA, USA?" (1 + 41 = 42).]
Fig. 3. Illustration of a federated pipeline method using the Langchain library. The method employs a language model trained on chain-of-thought reasoning
to segment the input query into tasks. The language model then utilizes task-specific symbolic solvers to derive solutions. Specifically, the language model
recognizes that search and scientific computing (mathematics) symbolic solvers are necessary for the given query. The resulting solutions are subsequently
combined and transformed into natural language for presentation to the user.
comprehension of the input query ((M) in Figure 1). Category 2(b) methods use pipelines similar to those in category 2(a) federated pipelines. However, they possess the added ability to fully govern the learning of all pipeline components through end-to-end differentiable compositions of functions that correspond to each component. This level of control enables us to attain the necessary levels of cognition on aspects of abstraction, analogy, and planning that are appropriate for the given application ((H) in Figure 1) while still preserving the large-scale perception capabilities. Figure 4 shows an example of this method for mental health diagnostic assistance.
Application-Level Analysis of Methods in Category 2. For the systems belonging to category 2(a), tracing their chain-of-thought during processing immensely enhances the application-level aspects of user-explainability. However, the language model's ability to parse the input query and relate it to domain model concepts during response generation limits this ability ((M) in Figure 1). Furthermore, the specification of domain constraints in natural language using prompt templates also limits the constraint modeling capability, which depends on the language model's ability to comprehend application- or domain-specific concepts ((M) in Figure 1). Federated pipelines excel in scalability since language models and application plugins that facilitate their use for domain-specific use cases are becoming more widely available and accessible ((H) in Figure 1). Unfortunately, language models require an enormous amount of time and space resources to train, and hence continual domain adaptation using federated pipelines remains challenging ((L) in Figure 1). Nonetheless, advancements in language modeling architectures that support continual learning goals are fast gaining traction. Category 2(b) methods show significant promise as they score highly regarding all application-level aspects, including user-explainability, domain constraints, scalability across use cases, and support for continual adaptation to application-specific changes ((H) in Figure 1). This is due to the high modeling flexibility and closely intertwined coupling of system components. Thus, a change in any particular component leads to positive changes in all
[Figure 4 contents: user posts (e.g., describing obsessive, intrusive thoughts, or access to a firearm) are linked to domain concepts such as the Snomed concepts "Health Related Behavior Finding" and "Intrusive Thoughts" and the DSM-5 definition of obsessive-compulsive disorder via trainable map functions, Map_Function(X = query, Z = concept, Θ1). An expert-defined decision tree, Y = Expert_Defined_Domain_Model(X = query, Z = concept), assigns a suicidality decision variable by checking, in order: Wish to be Dead, Non-Specific Active Suicidal Thoughts, Active Suicidal Ideation with Any Methods, and Suicidal Behavior or Attempt. The decision variable constrains a knowledge-infused, continually learned generator, Constrained_Response_Generation_Model (PKiL) = Π(Y, Θ2), via response constraints such as: If (Y in {Suicidality: [Ideation, Behavior, Attempt]}), output a response directing the user to a mental health professional, local emergency services, or a suicide prevention hotline such as the National Suicide Prevention Lifeline at 1-800-273-8255. Recorded expert agreement with generated responses: 70% for PKiL versus 47% for LLMs. The design targets user-explainability for clinicians and patients, and domain constraints that verify adherence to a clinical diagnostic guideline the clinician understands.]
Fig. 4. A pipeline that is fully differentiable from end to end. It consists of a composition of functions corresponding to various pipeline components. This pipeline enables the development of application-tailored AI systems that can be easily trained end-to-end. To accomplish this, trainable map functions are applied to raw data, converting it to concepts in the domain model. The example given in the figure relates to mental health diagnosis and conversational assistance. The map functions link fragments of raw data to decision variables in the diagnosis model, which are then used to apply constraints to the patient's response generated by the text generation model. Results from an existing implementation demonstrate that expert satisfaction levels reached 70% using such a pipeline, compared to 47% with LLMs in federated pipelines, such as OpenAI's text-Davinci-003 [10].
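The constraint step described in the Fig. 4 caption can be sketched as follows. This is a minimal, non-neural sketch: the function names follow the figure's notation, while the decision logic and the response text are simplified stand-ins for the trainable PK-iL components.

```python
# Hedged sketch of Fig. 4's constrained response generation: a domain
# model maps detected concepts to a decision variable Y, and Y gates
# the text produced by the generation model.
SUICIDALITY = {"Ideation", "Behavior", "Attempt"}

def expert_defined_domain_model(concepts):
    """Simplified stand-in for the expert decision tree in Fig. 4:
    return the most severe suicidality label found among the concepts."""
    for label in ("Attempt", "Behavior", "Ideation"):
        if label in concepts:
            return label
    return "None"

def constrained_response(free_text_response, y):
    # Response constraint: if Y falls under suicidality, override the
    # freely generated text with a safety-directed response.
    if y in SUICIDALITY:
        return ("Please reach out to a mental health professional, such as a "
                "therapist or counselor. If you are in immediate danger, "
                "contact your local emergency services.")
    return free_text_response

y = expert_defined_domain_model({"Intrusive Thoughts", "Ideation"})
constrained_response("Here are some relaxation tips.", y)
```

Because the gate sits outside the generator, a clinician can audit the constraint directly, which is the explainability property the figure highlights.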
components within the system's pipeline. Notably, in an implemented system for the mental health diagnostic assistance use case, shown in Figure 4, we see drastic improvements in expert satisfaction with the system's responses, further demonstrating the immense potential of category 2(b) methods.
III. THE FUTURE OF NEUROSYMBOLIC AI
In this article, we compared different neurosymbolic architectures, considering their algorithm-level aspects, which encompass perception and cognition, and application-level aspects, such as user-explainability, domain constraint specification, scalability, and support for continual learning. The rapid improvement in language models suggests that they will achieve almost optimal performance levels for large-scale perception. Knowledge graphs are suitable symbolic structures for bridging the cognition and perception aspects because they support real-world dynamism. Unlike static and brittle symbolic logics, such as first-order logic, they are easy to update. In addition to their suitability for enterprise use cases and established standards for portability, knowledge graphs are part of a mature ecosystem of algorithms that enable highly efficient graph management and querying. This scalability allows for modeling large and complex datasets with millions or billions of nodes.
In summary, this article highlights the effectiveness of combining language models and knowledge graphs in current implementations. However, it also suggests that future knowledge graphs have the potential to model heterogeneous types of application- and domain-level knowledge beyond schemas. This includes workflows, constraint specifications, and process structures, further enhancing the power and usefulness of neurosymbolic architectures. Combining such enhanced knowledge graphs with high-capacity neural networks would provide the end-user with an extremely high degree of algorithmic and application-level utility. The concern for safety is behind the recent push to halt further rollout of generative AI systems such as GPT*, since current systems could significantly harm individuals and society without additional guardrails. We believe that guidelines, policy, and regulations can be encoded via extended forms of knowledge graphs such as the one shown in Figure 4 (and hence via symbolic means), which in turn can provide explainability, accountability, rigorous auditing capabilities, and safety. Encouragingly, swift progress is being made on all these fronts, and the future looks promising.
ACKNOWLEDGEMENTS
This work was supported in part by the National Science Foundation under Grant 2133842, "EAGER: Advancing Neuro-symbolic AI with Deep Knowledge-infused Learning."
AUTHORS
Amit Sheth is the founding director of the AI Institute
of South Carolina (AIISC), NCR Chair, and a professor of
Computer Science & Engineering at USC. He received the
2023 IEEE-CS Wallace McDowell award and is a fellow
of IEEE, AAAI, AAIA, AAAS, and ACM. Contact him at:
[email protected]
Kaushik Roy is a Ph.D. student with an active publica-
tion record in the area of this article. Contact him at:
[email protected]
Manas Gaur is an assistant professor at UMBC. His dis-
sertation was on Knowledge-infused Learning, with ongoing
research focused on interpretability, explainability, and safety
of the systems discussed in this article. He is a recipient of
the EPSRC-UKRI Fellowship, Data Science for Social Good
Fellowship, and AI for Social Good Fellowship, and was
recently recognized as 2023 AAAI New Faculty. Contact him
at: [email protected]
REFERENCES
[1] D. Kahneman, “Thinking, fast and slow,” Farrar, Straus and Giroux,
2011.
[2] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger,
K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko et al., “Highly
accurate protein structure prediction with AlphaFold,” Nature, vol. 596,
no. 7873, pp. 583–589, 2021.
[3] A. Fawzi, M. Balog, A. Huang, T. Hubert, B. Romera-Paredes,
M. Barekatain, A. Novikov, F. J. R Ruiz, J. Schrittwieser, G. Swirszcz
et al., “Discovering faster matrix multiplication algorithms with rein-
forcement learning,” Nature, vol. 610, no. 7930, pp. 47–53, 2022.
[4] D. Gentner, “Structure-mapping: A theoretical framework for analogy,”
Cognitive science, vol. 7, no. 2, pp. 155–170, 1983.
[5] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Ka-
mar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg et al., “Sparks of artificial
general intelligence: Early experiments with gpt-4,” arXiv preprint
arXiv:2303.12712, 2023.
[6] A. Sheth, M. Gaur, K. Roy, R. Venkataraman, and V. Khandelwal,
“Process knowledge-infused ai: Toward user-level explainability, inter-
pretability, and safety,” IEEE Internet Computing, vol. 26, no. 5, pp.
76–84, 2022.
[7] A. d. Garcez and L. C. Lamb, “Neurosymbolic ai: The 3rd wave,”
Artificial Intelligence Review, pp. 1–20, 2023.
[8] V. Rawte, M. Chakraborty, K. Roy, M. Gaur, K. Faldu, P. Kikani,
H. Akbari, and A. P. Sheth, “Tdlr: Top semantic-down syntactic lan-
guage representation,” in NeurIPS’22 Workshop on All Things Attention:
Bridging Different Perspectives on Attention.
[9] R. Wang, D. Tang, N. Duan, Z. Wei, X. Huang, G. Cao, D. Jiang,
M. Zhou et al., “K-adapter: Infusing knowledge into pre-trained models
with adapters,” arXiv preprint arXiv:2002.01808, 2020.
[10] K. Roy, M. Gaur, Q. Zhang, and A. Sheth, “Process knowledge-infused
learning for suicidality assessment on social media,” arXiv preprint
arXiv:2204.12560, 2022.