Grounding AI Cognitive Science

DEPARTMENT: NEUROSYMBOLIC AI

Robust communication transcends human–human communication settings to include human–machine, machine–machine, and multiagent human–machine teams. Grounding fosters a common understanding among agents performing a task—typically in the real world. With the growing number of human–AI interactions, grounding is a fundamentally important capability of AI systems, models, and agents.5,8,11,15 Grounding allows AI systems to bridge semantic gaps in the real world, team with other agents in such environments, process inputs from the environment, and learn from interactions. A successful synthetic teammate requires several cognitive capacities, including situation assessment, task behavior, language comprehension and generation,3 and knowledge gap resolution processes. Grounding enables agents with different capabilities to communicate.

Both cognitive scientists and computer scientists have focused on how to make internal mechanisms (or representations) of external entities intrinsic to the agent itself rather than being defined by an external designer or interpreted by an observer.15 Recent efforts in natural language processing (NLP), computer vision (CV), and human–computer interaction (HCI) improve the grounding of machine agents. However, grounding remains a multidimensional challenge, encompassing diverse contexts, abstractions, and modalities of understanding (see Figure 1). In the absence of a clear definition, we are unable to distinguish genuine advances from task-specific adjustments.

This article sheds light on different aspects of grounding through the lens of cognitive science and AI, discusses specific neurosymbolic solutions for grounding, and highlights future work.

COGNITIVE SCIENCE LENS
Identification of the symbol grounding problem in cognitive science dates to Harnad. Following the introduction of computation with symbols, credited to McCarthy, Searle's Chinese room problem revealed how amodal symbol manipulation lacks grounding. However, symbol manipulation is often the foundation of contemporary AI systems. Challenges to amodal symbol manipulation include embodied grounding12 and the recent use of language and simulation to establish grounding.13 Given this long-standing research problem, Ziemke15 groups grounding efforts into two categories: 1) cognitivist or 2) enactivist.

Cognitivism grounds atomic primitives in sensorimotor invariants.15 Concepts constructed from these inherit the grounding of their constituents.

1541-1672 © 2024 IEEE
Digital Object Identifier 10.1109/MIS.2024.3366669
Date of current version 10 April 2024.
IEEE Intelligent Systems, Published by the IEEE Computer Society, March/April 2024
FIGURE 1. Grounding has multiple dimensions and needs knowledge at different levels of abstraction. (This is similar to the need for linguistic, common sense, world model, and domain knowledge for language understanding; see https://bit.ly/KiLU, Fig. 3). Grounding may occur at different levels in the task execution process. For example, the drone agent must be able to comprehend the instructions in a natural language format. Next, the instruction may require parsing relevant symbols (in the case of neurosymbolic methods) for reasoning processes. The information extracted must then be grounded to the drone's capabilities (common sense grounding). Finally, the instruction must be grounded in the specific navigation task.
Nevertheless, different agents may reason with different abstractions, creating a divergence that requires repair. A more recent perspective, enactivism, values the role of action, embodiment, and environment. Robotic agents can potentially obtain grounding by physically linking to an environment through sensory input and motor output. Agent functions can be either engineered or learned. With meticulously engineered grounding, systems may demonstrate the "correct" behavior, but their internal mechanisms are not inherent to the system. Alternatively, an agent function can be acquired by adjusting connection weights instead of requiring programming. However, the definition of a correct agent function and the question of how to evaluate various agent functions remain open.

Given our interest in real-world human–AI interactions and multimodal systems, we emphasize enactivist grounding. However, cognitivist grounding remains essential, as we summarize from Barsalou13:

Mental imagery, cognitive grammar, mental spaces, and compositional reasoning support explanations for thought. Neuroimaging shows that higher cognition is realized in the brain's modal systems. One theory is that grounding mechanisms serve as an interface, peripheral to core cognitive operations.

Barsalou predicts that future cognitive research will integrate classic symbolic architecture, statistical/dynamic systems, and grounded cognition. Formal and computational accounts of grounding will shift from epiphenomenal to causal. Grounding mechanisms may potentially replace the amodal mechanisms in cognitive architectures. We believe that advancements in neurosymbolic AI will be instrumental here. Next, we describe the grounding problem from a machine learning (ML) and AI perspective and highlight what is missing from a cognitive science lens.

AI LENS
Grounding in AI has typically been referred to as connecting concepts to other knowledge bases (KBs) or world models. We first discuss the different grounding efforts in the AI11 community. We then draw parallels with the cognitive science community13,15 and identify some nuanced distinctions.

Chandu et al.11 note the static versus dynamic terminology used in the NLP/CV context, similar to the cognitivist versus enactivist distinction in
FIGURE 2. Types of grounding. (a) The static and dynamic definitions in neurosymbolic AI loosely mirror cognitivist and enactivist grounding in cognitive science, respectively, although there are subtle differences. Cognitive scientists tend to focus more on the specific monitoring and repair processes in the case of different perspectives between interacting agents, whether due to differences in static knowledge or engagement opportunities with an open world. (b) For an AI or cognitive agent to interact effectively with its environment, it must understand the language used by external agents, accurately assess the current situation and context, and identify and address gaps in its knowledge.4 This process involves recognizing when it lacks understanding or has difficulty with language comprehension and communicating these knowledge gaps to other agents or external sources. By utilizing these processes, agents can update their knowledge of the task and environment, avoiding catastrophic errors and allowing for the iterative and interactive aspects of dynamic grounding. Consider a machine agent, like a drone, working with a human operator to survey a specific area. The agent can use its background (stored) knowledge to understand the monitored environment and the mission's purpose (static/cognitivist grounding). Suppose the agent comes across something it has not seen before and cannot identify. In that case, it will interactively and iteratively ask the human operator for assistance in closing this gap (dynamic/enactivist grounding).
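The iterative gap-resolution loop described in the caption can be sketched as follows. This is a hedged illustration only; the agent class, observation labels, and simulated operator are all hypothetical, not an interface from the article.

```python
# Hypothetical sketch of dynamic (enactivist) grounding: the drone falls back
# to its human operator when an observation is missing from its stored
# knowledge, then records the answer so the next encounter is grounded
# statically. All names and data are illustrative.

class DroneAgent:
    def __init__(self, background_knowledge: dict[str, str]):
        # Static/cognitivist grounding: background (stored) knowledge.
        self.knowledge = dict(background_knowledge)
        self.clarifications_requested = 0

    def identify(self, observation: str, ask_operator) -> str:
        if observation in self.knowledge:
            # Known concept: resolved from stored knowledge (static grounding).
            return self.knowledge[observation]
        # Knowledge gap detected: interactively ask the operator
        # (dynamic grounding), then close the gap for future encounters.
        label = ask_operator(f"Unidentified object '{observation}'. What is it?")
        self.clarifications_requested += 1
        self.knowledge[observation] = label
        return label

# Example interaction with a simulated human operator.
operator_answers = {"Unidentified object 'obs_17'. What is it?": "power line"}
agent = DroneAgent({"obs_01": "tree line"})

print(agent.identify("obs_01", operator_answers.get))  # resolved statically
print(agent.identify("obs_17", operator_answers.get))  # resolved via clarification
print(agent.identify("obs_17", operator_answers.get))  # now resolved statically
```

A fuller loop would add the acknowledgment and repeated-clarification rounds the article mentions; the sketch shows only the single-turn gap detection, identification, and resolution cycle.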
cognitive science (see Figure 2). Static grounding—the most predominant form—relies on accessible evidence supporting the common ground to connect concepts within a given context to the real world.11 For static grounding, common ground is typically established through an agent (e.g., machine) interacting with a static KB to frame a response and deliver it back to the other agent (e.g., human). Both agents may share this common ground by assuming its universality, i.e., no external references. The success of grounding is measured based on the agent's ability to link the query to the available data. In contrast, dynamic grounding establishes common ground iteratively, where both agents can communicate to seek and provide clarifications, typically in a potentially changing physical environment. This allows corrections to misunderstandings and involves several rounds of clarification and acknowledgments.

Static Grounding
According to Harnad, manipulating symbolic representations without meaning cannot support reasoning. AI researchers responded with different representations for different uses. Typically, these frameworks have the following key components: the designator, denoting the name or symbol utilized to identify the category; the epistemological representation, employed to recognize instances of the category; and the inferential representation, comprising "encyclopedic" knowledge about the category and its members. The epistemological representations are termed concept descriptions. In CV, these representations are considered object models.5

This form of grounding is typically established using deductive learning. It is consistent with cognitivist grounding, with little to no active or online supervision from the environment or an external agent. ML efforts for static grounding include using entity slot filling, adversarial references to grounding visual referring expressions, visual semantic role labeling, and disambiguation of concepts and entities.11 Additionally, methods designed for manipulating representations include fusion and concatenation, representation alignment, and projection of representations into a shared space. Finally, the ML community has designed different learning objectives to address the grounding problem, including multitasking and joint training, the design of new loss functions, and adversarial learning methods.

Dynamic Grounding
Dynamic efforts in ML are typically designed for situations where an entity in an environment is matched with an epistemological representation that activates a larger knowledge structure containing the composite concept representation. Such systems learn to ground their own experience dynamically in the environment, creating more robust capabilities not dependent on preprogrammed representations.5 Grounding frameworks, such as learning from example and learning by conversation, are consistent with enactivist grounding in cognitive science. Efforts in the ML community for dynamic grounding include grounding embodied agents; natural interactions with human-in-the-loop feedback; and, more recently, grounding large language model-based agents (https://rb.gy/2pfq1g). Nevertheless, grounding in mainstream ML exploits deductive learning.5

Limitations
One notable omission in ML is grounding with sensors or environmental data.1 Furthermore, the study of latent pragmatics is also missing from the work of Chandu et al.11 Pragmatic analysis, pioneered by Austin, Grice, and Searle and extended by others, such as Sperber, focuses on understanding functional intentions and implications based on variations in linguistic content across different contexts.

The lack of a consistent definition of grounding creates considerable ambiguity regarding how to ground, what to ground, and where to ground. To bridge this gap, Chandu et al.11 outline the following grounding stages: stage 1, localization; stage 2, external knowledge; stage 3, common sense; and stage 4, personalized consensus. However, these stages are insufficient. Consider a challenging example of grounding a drone (with an AI-based decision-making model) while addressing safety concerns.

Localization requires that the drone accurately determine its position relative to its surroundings. Next, grounding with external knowledge introduces additional information, such as weather and airspace regulations, that informs subsequent action. However, which external knowledge sources should be used and prioritized in decision making? Stage 3, common sense, could include avoiding obstacles in the drone's flight path and taking proactive measures to avoid risks or mitigate potential harm. However, it is not easy to quantify and measure this grounding stage. Stage 4, personalized consensus, can help ground decisions in the drone's perception of the environment and prior experiences. However, it is unclear whether the drone must adhere to its prior experiences when they conflict with human instructions that reflect a broader understanding of the situation.

While a helpful initial attempt, these stages are not sufficiently specific to standardize how grounding occurs and at what abstraction levels. The lack of definition allows for creative interpretation of the problem and, therefore, new tasks, datasets, and methods. However, in the future, the community will benefit from a standardized, comprehensive definition.

NEUROSYMBOLIC GROUNDING
Neurosymbolic methods can benefit grounding by integrating traditional symbolic reasoning approaches with the generalization capabilities of neural networks.
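A minimal sketch of such an integration might pair a learned component that links free-form tokens to symbols with a symbolic module that reasons over a KB. Everything below is hypothetical: the toy KB, the stubbed "neural" scorer, and the module names are illustrative, not a method from the article.

```python
# Hypothetical neurosymbolic grounding sketch: a learned (here, stubbed)
# scorer links an instruction token to a symbol, and a symbolic functional
# module then reasons over a small knowledge base (KB). All facts, names,
# and scores are illustrative.

KB = {  # symbolic knowledge: concept -> encyclopedic facts
    "powerline": {"hazard": True, "min_clearance_m": 10},
    "tree": {"hazard": False, "min_clearance_m": 2},
}

def neural_scorer(token: str) -> dict[str, float]:
    """Stand-in for a neural model scoring how well a token matches each symbol."""
    return {sym: 1.0 if token.startswith(sym[:4]) else 0.1 for sym in KB}

def ground_symbol(token: str) -> str:
    """Neural step: pick the best-matching KB symbol for a token."""
    scores = neural_scorer(token)
    return max(scores, key=scores.get)

def check_clearance(symbol: str, distance_m: float) -> bool:
    """Symbolic step: a functional module reasoning over the KB."""
    return distance_m >= KB[symbol]["min_clearance_m"]

# "avoid the powerlines ahead" -> grounded symbol -> symbolic safety check
symbol = ground_symbol("powerlines")
print(symbol, check_clearance(symbol, distance_m=5.0))  # powerline False
```

The neural side tolerates surface variation in the input, while the symbolic side keeps the safety rule transparent and auditable, which is the complementarity this section develops.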
Neurosymbolic systems integrate the statistical learning capabilities of neural networks with the structured, symbolic representations used in classical AI. Neurosymbolic AI seeks to benefit from the synergy of symbolic and neural methods. Traditional symbolic reasoning methods use formal languages (e.g., Planning Domain Definition Language) for reasoning over knowledge stored in a structured format and represented by "symbols." These reasoning methods manipulate and infer from structured symbolic representations, such as logic-based rules, knowledge graphs, or ontologies. The symbolic representations provide a transparent and interpretable framework for knowledge representation and logical reasoning, but these systems are brittle and often cannot be generalized. However, combined with neural methods, traditional methods leverage advancements in deep learning to acquire knowledge representations and enhanced generalization capabilities effectively.10 Neural networks consist of interconnected layers of artificial neurons that use weighted connections, enabling them to learn complex mappings between inputs and outputs. These networks are instrumental in pattern recognition, classification, regression, and sequence prediction. Therefore, neurosymbolic methods leverage the strengths of each paradigm.

As noted earlier, many of the grounding efforts in ML rely on deductive learning and lack active or online supervision from the environment or an external agent. Neurosymbolic methods for grounding can offer several advantages, including compositional reasoning and situational awareness.

One particular asset of neurosymbolic methods is the use of functional modules. Natural language texts (i.e., instructions or queries) are mapped to functional modules that carry out atomic actions. These functional modules can be user defined or learned. This allows agent functions grounded in symbolic representations to complete specific actions or generate responses. Additionally, the compositional nature of these functional programs allows for generalization to new combinations of parsed instructions or queries. The functional modules can be used for both dynamic and static grounding, as the modules can operate over KBs (including knowledge graphs). This aspect of neurosymbolic methods can help establish common ground, enabling agents to interpret and execute instructions in a manner that aligns with symbolic human reasoning. For example, Mao et al.14 designed a new neurosymbolic concept learner that can learn embeddings for symbolic visual inputs. Learning these mappings allows for continual learning and adaptation to new environmental variables while preserving an agent's task behavior. In this manner, neurosymbolic approaches can allow for the grounding of new concepts in an environment and facilitate knowledge gap detection, identification, and resolution processes, leading to adaptive and robust models.

CONCLUDING REMARKS
We have focused on explicit notions of grounding. We conclude with remarks on implicit forms of grounding. Recently, the use of digital twins has emerged to augment the performance of ML systems in domains ranging from autonomous transportation6 to next-generation wireless communication.7 Digital twins can provide a high-fidelity representation of physical entities by accurately modeling the structure, behavior, and characteristics of a real-world system (world model) governed by physical laws. Such ideas can expand the training dataset (out of distribution), allowing models to generalize. This implicit grounding in physical laws can prevent nonfactual generation, reduce hallucinations, and anchor model responses to specific information, facilitating harmonization with the corresponding world model. Regarding the use of digital twins for grounding, we note a parallel with the contention of Pickering and Garrod8: that interlocutors implicitly comprehend each other by aligning their models of the discussed situation at various levels of cognitive and linguistic representation. Such implicit alignment processes between agents (akin to digital twins) inspire computational grounding processes.

Finally, we advocate for more knowledge-infused neurosymbolic learning and reasoning systems that naturally integrate linguistic, common sense, general (world model), and domain-specific processes and knowledge to facilitate static grounding. We expect fundamental progress on the synergistic use of neural networks and structured semantics to advance from content processing to content understanding and reasoning with the infusion of symbolic knowledge (https://rb.gy/67wgx3). These methods require dynamic knowledge-elicitation strategies integrating multimodal pragmatic context and interactions with domain experts in the loop to achieve dynamic grounding and alignment with user intent. A version of this article with additional references is available in the IEEE Computer Society Digital Library at https://arxiv.org/abs/2402.13290.

ACKNOWLEDGMENTS
The authors acknowledge National Science Foundation (NSF) Award 2335967 (to Amit Sheth and Valerie L. Shalin) and NSF Award CNS-2112471 (to Goonmeet Bajaj and Srinivasan Parthasarathy).