Coresense D1.3
0 HORIZON
2023/11/05 EUROPE
Public
D1.3
Theory of
Understanding
An initial formal definition
CORESENSE
Horizon Europe Project #101070254
HORIZON-CL4-2021-DIGITAL-EMERGING-01-11
The CORESENSE Project is a four-year research project focused on the development of a technology of machine
understanding for the construction of dependable autonomous systems. It is a project funded by the European
Union.
Programme: Horizon Europe
Project number: 101070254
Project name: CoreSense: A Hybrid Cognitive Architecture for Deep Understanding
Project acronym: CORESENSE
Topic: HORIZON-CL4-2021-DIGITAL-EMERGING-01-11
Type of action: HORIZON Research and Innovation Actions
Granting authority: European Commission-EU
Project start date: 1 October 2022
Project end date: 30 September 2026
Project duration: 48 months
Most CORESENSE results are open science, but IPR regulations may still hold. Requests for permission to
reproduce this document or to prepare derivative works of this document should be addressed to the
CORESENSE Consortium IPR management.
Copyright © 2022-2026 by The CORESENSE Consortium.
UNIVERSIDAD POLITECNICA DE MADRID (UPM)
TECHNISCHE UNIVERSITEIT DELFT (TUD)
FRAUNHOFER GESELLSCHAFT ZUR FORDERUNG DER ANGEWANDTEN FORSCHUNG (FHG)
UNIVERSIDAD REY JUAN CARLOS (URJC)
PAL ROBOTICS SL (PAL)
IRISH MANUFACTURING RESEARCH COMPANY LIMITED BY GUARANTEE (IMR)
CZECH TECHNICAL UNIVERSITY IN PRAGUE (CVUT)
For information about other CORESENSE products, reports or articles, please visit our Web site:
http://www.coresense.eu/
The CORESENSE project is funded by the EC Horizon Europe programme through grant HE
#101070254 inside the HORIZON-CL4-2021-DIGITAL-EMERGING-01-11 topic. Views and
opinions expressed in this document are however those of the author(s) only and do not
necessarily reflect those of the European Union or the Horizon Europe Programme. Neither
the European Union nor the granting authority can be held responsible for them.
Executive Summary
Deliverable Approval
2023/10/22 — WPL: M.Rodriguez (UPM)
2023/10/20 — PC: R.Sanz (UPM)
2023/11/03 — QA: N.Hammoudeh (FHG)
Executive summary
Document Versions
Contents
List of Figures
List of Tables
1 Introduction
1.1 Towards Understanding
1.2 Context and Motivation
1.3 Document Structure and Content
4 Analysis
4.1 Understanding and Models
4.2 Coverage
4.2.1 Coverage of perspectives
4.2.2 Coverage of the Testbeds’ Needs
4.3 Future development of the theory
4.3.1 Other directions of development
4.3.2 Reification as reusable assets
4.3.3 Towards a CT Definition of Understanding
Bibliography
List of Figures
1.1 Cognitive science sits at the convergence point of many disciplines, of both human and technical nature. Our theory of understanding targets the general domain with cross-cutting implications in subdomains.
2.1 Theory development proceeds along the project in a feedback relation with testbed development.
4.1 While ”understanding” is commonly tied to language processing, the deep understanding that we need for our robots is not just language, it is systemic: the cognitive agent creates an abstract system in its head that matches some specific aspect of the reality.
The question of ”understanding” is a central problem in cognitive science. However, with the
relevant exceptions of language and education, ”understanding” has mainly stayed as a mere
descriptive term or a topic for abstract philosophical debate. Only recently has it received at-
tention from the artificial intelligence community as a core issue for intelligent machines. A
similar analysis can be made concerning the question of ”awareness”, with the extra difficulties
associated with the analysis of phenomenological experience and its potential realisation in
machines.
The problems of understanding and awareness by machines are of special importance in the
case of autonomous robots because misunderstandings concerning what is going on in the
robot environment may lead to catastrophic failures in real-world deployments. Being aware of
the world and itself, understanding orders, understanding its environment, understanding
itself, and understanding others (especially humans) become critical capabilities for the autonomous
robot to properly complete a mission in an open, dynamic and uncertain environment.
So far, these problems have been tackled by directly engineering specific ad-hoc solutions for
concrete deployed systems. However, there is yet no solid, universally accepted ”theory of un-
derstanding and awareness” that can guide, in general, the construction of cognitive architec-
tures for autonomous robots. Only in the domain of artificial general intelligence is it possible
to find some initial attempts at such a theory.
Chapter 1. Introduction
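Figure 1.1: Cognitive science sits at the convergence point of many disciplines, of both human and technical nature. Our theory of understanding targets the general domain with cross-cutting implications in subdomains. (Diagram labels: Psychology, Linguistics, Philosophy, Education, Artificial Intelligence, Robotics, Automatic Control, Cybernetics, Cognitive Science, Understanding.)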
We will not include here all the relevant text from the state-of-the-art analysis [Sanz et al., 2023],
but some of its content will be repeated here to obtain a more or less self-contained document.
The concept of understanding is an elusive target. A common view of understanding is being able
to label things in scenes: ”Car, car, car, person, traffic light”, and we can hear the environment
understanding mechanisms of an autonomous car. Some tests implicitly assert that understanding
is the capability of responding adequately to given orders: ”Robot, bring me a glass
of water”, and the robot performs as ordered. Some authors argue that understanding means
being able to explain: ”Robot, why did you put this glass of water on the table?”, and the robot
tells us that it did so because it was following our orders. Others think that understanding is
getting useful mental representations, or reaching agreements with people and the world, or
answering questions about stories, or being aware of owned knowledge. There are many
understandings of understanding [Mason, 2003, Baumberger, 2014].
However, what guides us in the CORESENSE project, and provides a cornerstone for the con-
tent of this document —a theory of understanding— is that we believe that:
• There is some core commonality behind all those situations because all of them show
a capability of using some information meaningfully, hence implying that there exists a
fundamental aspect of understanding that powers all of them;
• that this core component is essential for achieving higher levels of capability, performance,
robustness and resilience; and
• that such core interpretation of understanding can be captured using systematic and rig-
This document will try to provide such a definition by distilling our collective understanding of
understanding, which was the content of [Sanz et al., 2023]. We will propose an operational
synthesis of the views collected there to address the needs of both robot engineers and cognitive
scientists at large.
and its mechanisation. The report focuses on a conceptual analysis and the introduction of a
new theoretical approach to understanding (the capability of envisioning future value) and
some architectural guidelines for AIs to implement relevant mechanisms for improved understanding
of situations that serve as a foundation for robot autonomy and even future machine
consciousness.
Chapter 1: This chapter sets the context for the rest of the document.
Chapter 3: This is the main content of the document; it provides the current snapshot of the
theory, based on set theory.
Chapter 4: Analyses the current version (1.0) of the theory against the stated needs and considers
its potential future development using category theory.
The main content of this deliverable (D1.3) is an initial presentation of the theory of understand-
ing as stated in the GA [CORESENSE Consortium and EC, 2022]. Building such a theory is going
to proceed in stages. This deliverable presents just a first release towards the final formal ex-
pression of such a theory. As stated in the GA [CORESENSE Consortium and EC, 2022], it will
be re-released in final form at the end of the project (in M48). We also expect to have non-released
versions of the theory during project execution. Interim releases will be made available
through the project website and the Open Science repositories.
The content of this chapter is a description of the overall strategy of the project towards the
development of the theory.
”the capability of understanding will be based on the agent having actionable, hybrid,
causal models, that can be populated, adapted, exploited, and shared with other agents”
In deliverable D1.1 [Sanz et al., 2023] we analyse some of the concepts of understanding that
have been proposed in the past in the broad domain of cognitive systems. In most of the cases
they were part of linguistic theories. In a small number of cases, they are related to Artificial
Intelligence (AI) and robotics. One possible strategy was to attempt a synthesis of the most
relevant theories from the perspective of the needs of the CoreSense architecture.
Chapter 2. Strategy for building the theory
For example, Zagzebski [Zagzebski, 2019] proposed that understanding is the grasp of structure.
The structure of an object shapes it, gives it organization and unity and this lets us see it
as a single object. When we grasp an object’s structure, we understand the object. This idea
of understanding suits ours, because our models are simplified representations of what the
agent grasps. The larger and more complex the object of understanding, the bigger the sim-
plification that may leave out of the model object aspects that may be important at different
times or for different purposes.
The synthesis is a good strategy to reach results of broader acceptance. However, in the project
we shall also follow a more pragmatic approach to understanding: it shall be useful in the development
of intelligent machines. In the end, we need a hybrid approach. The CORESENSE
theory of understanding is being developed using a strategy that:
• Tries to address the multiple perspectives of understanding identified in [Sanz et al., 2023]
(when relevant).
• Is driven by the needs of the domain —cognitive robotics— and the three testbeds; it is
applied in them and gets feedback from them.
In the remainder of the chapter we will address what the content of such a theory shall be and
what steps implement this strategy.
2.2 On Theories
The Theory of Understanding (ToU) shall be a theory that is both scientific and operationalis-
able. As a scientific theory, the ToU shall be a well-substantiated explanation of some aspect
of the natural world that is based on a body of evidence, observations, and experiments. This
body of evidence comes from the cognitive operation of humans and also from the cognitive
operation of robots, esp. in the domains of the testbeds. The ”aspect of the natural world”
that we are interested in, is the phenomenon of ”understanding”.
We want the ToU to be a solid theory, that is, a robust and well-established scientific explanation,
to serve as the framework upon which scientists can build their understanding of the phenomenon
of understanding and pile up solid research results, and which engineers can use, once
operationalised as applied science, in building better cognitive robots.
Testability: Scientific theories are testable and falsifiable [Popper, 1959]. This means that they
can be subjected to experimentation and observation, and there should be clear criteria
that, if not met, would disprove the theory. The ability to test and potentially disprove
a theory is a fundamental aspect of the scientific method. The KPIs of our testbeds will
provide the necessary evidence.
Predictive Power: A strong scientific theory can make predictions about future observations
or experiments. These predictions should be based on the theory’s principles and should
be verifiable through empirical testing. The theory’s ability to make accurate predictions
lends further credibility to it. This is an essential aspect for a theory that is used as a
design asset in an engineering endeavour. Engineers will use cognitive patterns based on
this theory to guarantee future effectiveness in the robot systems under construction.
Scope and Explanatory Power: A scientific theory should have a broad scope, meaning it can
explain a wide range of related phenomena. The more phenomena it can explain, the
more powerful and influential the theory is considered. This is at the core of our ambition.
We expect to develop a theory that not only addresses robotic understanding but general
cognitive systems understanding.
Simplicity: Simplicity, often referred to as the principle of Occam’s razor, suggests that if two
or more theories explain the same phenomena equally well, the simpler one is generally
preferred. Simplicity makes theories more elegant and easier to work with. So far, there
is no real competing theory of understanding (i.e. in the terms stated here), nevertheless,
we will try to make the ToU simple using compact and effective abstractions. The use of
category theory is in part motivated by this search for simplicity.
Reproducibility: The results and observations that support a scientific theory should be repro-
ducible by other scientists using the same methods and conditions. This is a cornerstone
of the scientific method and ensures the reliability of the theory. This has always been
a problem in cognitive robotics and in cognitive systems in general. Replication of sys-
tems and experimental conditions is not easy. To this end, benchmarking has been used
in robotics to enhance this reproducibility. The RoboCup@Home used in WP8 specifically
addresses this problem. Concerning the robot system implementation, the model-based
methods of WP4 will provide the necessary means to achieve reproducibility.
Peer Review: Scientific theories are typically subjected to peer review, where experts in the
field assess the theory’s validity and the quality of the evidence supporting it. Peer review
is an important process for maintaining the rigor and reliability of scientific theories. The
Open Science approach of CORESENSE will specifically address this need. Besides formal
deliverables and publications we may produce specific materials for this need.
Consensus: Over time, a scientific theory gains acceptance and support from a significant
portion of the scientific community. While scientific consensus can evolve and change,
a well-established theory is generally widely accepted within the scientific community.
In the domain of psychology this is not easy. Psychological constructs suffer from the
”toothbrush” problem: no self-respecting psychologist wants to use anyone else’s. But in
rigorous science, we are sometimes forced to do so by the force of the facts. If our theory
is solid, well documented and effective, consensus will emerge. This is in fact, one of the
major challenges for the ToU and one of the major motivations for seeking a rigorous
approach.
Understandability: As quantum mechanics demonstrates, understandability is not a necessary
characteristic of a scientific theory. However, we would like our theory to be understandable
by a broad community of scientists and engineers (see Figure 1.1). This may require
us to express the theory in different ways that can reach these people. This
is also important for the effectiveness of the work in WP5 and WP9.
It’s important to note that scientific theories are not absolute truths but are our best current
explanations based on the available evidence. They are subject to revision or even replacement
as new evidence and understanding emerge. The process of forming and refining scientific
theories is an ongoing and dynamic aspect of the scientific endeavor.
theory.
Figure 2.1: Theory development proceeds along the project in a feedback relation with testbed
development.
This chapter contains the first formal statement of the theory of understanding. The core seed
element of the ToU is a definition of what understanding is. This will be the central concept that
grounds the theory.
This chapter also describes the expected content for a full-fledged theory and defines a set of
auxiliary elements that will be used in the definition of understanding. The chapter ends with
some considerations concerning the core definition.
• A closed system. A closed system is a system that is completely isolated from its environ-
ment. In this case, a phenomenon would be a part of the system (subsystem). In order
to fully understand a system, it would be necessary to understand all the elements and
relations that constitute that system, i.e., all the phenomena that can appear in the system.
Although, in engineering, systems are open, it is sometimes convenient to consider
them as closed.
• An open system. An open system is a system that has flows of information, energy, and/or
matter between the system and its environment and which adapts to the exchange. In
this case, a phenomenon would be constituted by elements of the system and the en-
vironment and their relations. This is the more generic assumption, and it covers cases
where there are different systems interacting among themselves and with the environ-
ment.
Chapter 3. Towards a Theory of Understanding
Figure 3.1 shows a general open system (with interrelated elements) with several intercon-
nected subsystems and where every subsystem is also composed of interrelated elements.
Instance: particular event, situation, occurrence of the system (and the environment).
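The notions introduced above (elements, relations, closed and open systems) can be sketched as a small data structure. This is only an illustrative sketch under our own assumptions: the names `System`, `Relation`, `is_closed` and `boundary_flows`, as well as the example elements, are hypothetical and are not part of the formal statement of the theory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A directed relation between two named elements (hypothetical encoding)."""
    source: str
    target: str
    kind: str = "interacts"


@dataclass
class System:
    """A system as a set of elements plus the relations they take part in."""
    elements: set[str]
    relations: set[Relation] = field(default_factory=set)

    def is_closed(self) -> bool:
        # Closed: every relation starts and ends inside the system.
        return all(r.source in self.elements and r.target in self.elements
                   for r in self.relations)

    def boundary_flows(self) -> set[Relation]:
        # Relations with exactly one end outside the system: flows of an open system.
        return {r for r in self.relations
                if (r.source in self.elements) != (r.target in self.elements)}


room = System(
    elements={"room", "thermostat", "heater"},
    relations={
        Relation("thermostat", "heater", "switches"),
        Relation("heater", "room", "heats"),
        Relation("room", "outside", "loses heat to"),  # crosses the boundary
    },
)
```

In this reading, a phenomenon of an open system may involve relations returned by `boundary_flows`, whereas a closed system has none.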
certain volumes of water. In this case, the subject of understanding is the human user, and the
phenomenon to be understood is how the volume of water changes with respect to the water
inputs and outputs.
Example 2: Thermostat. The system includes the room where the temperature is measured,
the thermostat mechanism (with sensors, actuators and display for temperature adjustment by
the user) and the conditioning system for heating or cooling the room. In this case, the subject
of understanding can be the thermostat mechanism and the phenomenon to understand is the
change of the room temperature in relation to the on/off actions of the climate control system.
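As a rough illustration of Example 2, the sketch below gives the thermostat a very simple model of the phenomenon and uses it to decide its on/off action. The first-order dynamics, the parameter values (`outside_c`, `heat_rate`, `loss_coeff`) and the function names are all assumptions made for this example; they are not taken from the testbeds.

```python
def predict_temperature(temp_c, heater_on, steps, *, outside_c=5.0,
                        heat_rate=0.5, loss_coeff=0.02):
    """Predict the room temperature after `steps` one-minute steps.

    Assumed model: the heater adds `heat_rate` degrees per step when on,
    and the room loses heat in proportion to the indoor/outdoor difference.
    """
    for _ in range(steps):
        heating = heat_rate if heater_on else 0.0
        temp_c += heating - loss_coeff * (temp_c - outside_c)
    return temp_c


def thermostat_action(temp_c, setpoint_c, horizon=10):
    """Switch the heater on if, left off, the room is predicted to fall below the setpoint."""
    predicted_off = predict_temperature(temp_c, False, horizon)
    return predicted_off < setpoint_c
```

The point of the sketch is that the action is derived from a model of how temperature responds to on/off actions, rather than from the current reading alone.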
How understanding is enabled
According to the analysis of the state of the art, a subject is considered to understand if: 1) it can
interpret and structure the phenomena (relations between objects) [Boon, 2009], or a theory exists
[de Regt, 2009]; 2) it grasps the relationships and has cognitive control [Hills, 2016], or can control
the phenomenon [Ylikoski, 2009]; 3) it grasps structure [Zagzebski, 2008]; 4) it grasps how the
various elements in a body of information are related to each other [Kvanvig, 2003]; 5) it has
mental representations that encode the right kind of dependence relations manifest in the
system [Lombrozo and Wilkenfeld, 2019]; 6) it has a set of models [Thórisson et al., 2016].
All of them share the need to have a representation of elements and their relations (a system
structure). This fits with what is generally accepted as the definition of a model: a model encodes
and represents the elements and relations that are modelled. Thus, understanding is
enabled by the existence of a set of models of the phenomenon to be understood.
The use of understanding
It is widely accepted that understanding a phenomenon should enable several cognitive abilities,
among which are usually cited: reasoning about questions, making inferences (and counterfactual
inferences), making predictions, recognising consequences of a phenomenon, giving
explanations, answering what-if questions, anticipating behavior, or understanding similar
phenomena.
All of them could be summarised in a way that mainly consists of making inferences (including
predictions and retrodictions), giving explanations, sharing knowledge with other agents, and un-
derstanding similar phenomena. We share this common vision, but in the case of explanation,
although it is a desirable property, we concur with recent opinions arguing that understanding
• Inference: Drawing a conclusion about something by using information that you already
have about it.
• Prediction: A statement of what will happen in the future (or to utilise present informa-
tion to infer a future event or state of affairs).
• Retrodiction: To utilise present information to infer a past event or state of affairs.
These are just examples of model use. The features presented are essentially related to the
nature of the models. Models have a key property: they are representations of something that
enable several modes of understanding by means of a set of possible model exploitations. The
work in WP3 will specifically address the different classes of exploitations in the realisation of
cognitive functions.
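The difference between prediction and retrodiction as two exploitations of the same model can be illustrated with a minimal sketch. It assumes a water container with constant inflow and outflow; the class `TankModel`, its fields and the numbers are hypothetical, and physical limits (overflow, empty container) are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TankModel:
    """Assumed model of a water container with constant flows."""
    inflow: float   # litres per minute
    outflow: float  # litres per minute

    def predict(self, volume_now, minutes):
        """Use present information to infer a future volume."""
        return volume_now + (self.inflow - self.outflow) * minutes

    def retrodict(self, volume_now, minutes):
        """Use present information to infer a past volume."""
        return volume_now - (self.inflow - self.outflow) * minutes


tank = TankModel(inflow=3.0, outflow=1.0)
future = tank.predict(volume_now=10.0, minutes=5)    # 20.0
past = tank.retrodict(volume_now=10.0, minutes=5)    # 0.0
```

Both functions are inferences in the sense defined above: they draw conclusions from information the subject already has, and they differ only in the direction of time.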
Subject: the individual (human or machine) that performs the action of understanding.
Examples: aerial robot, social robot, physicist, economist, sociologist.
Sound inferences: those that follow the modeling relation, i.e., implications
in the model correspond to causality in the modelled system. The conclusions
inferred from premises using the model correspond to object states of the modelled
system.
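One possible reading of sound inference is that encoding a system state into the model, inferring in the model, and decoding the result gives the same state as letting the system evolve. The sketch below checks this on a finite set of sample states. The function names `encode`, `decode`, `infer` and `evolve`, and the toy system, are assumptions introduced here for illustration only.

```python
def is_sound(encode, decode, infer, evolve, states, tol=1e-6):
    """Check on sample states that model inference matches system evolution."""
    return all(abs(decode(infer(encode(s))) - evolve(s)) <= tol for s in states)


# Toy system: a position in metres that advances 2 m per step.
def evolve(position_m):
    return position_m + 2.0


# The model works in centimetres.
def encode(position_m):
    return position_m * 100.0


def decode(position_cm):
    return position_cm / 100.0


def good_infer(position_cm):
    return position_cm + 200.0   # matches the 2 m causal step


def bad_infer(position_cm):
    return position_cm + 150.0   # implies a 1.5 m step: not sound


samples = [0.0, 1.5, 10.0]
sound = is_sound(encode, decode, good_infer, evolve, samples)
unsound = is_sound(encode, decode, bad_infer, evolve, samples)
```

Such a check can only sample the modelling relation; it does not prove soundness for states that were not tested.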
However, this definition may seem too restrictive, specifically addressing teleological agents.
This variant may match better the needs of an agent as related with its value system. This will
be addressed in the future Theory of Awareness (ToA).
Figure 3.3: The left figure shows partial understanding of the phenomena, as there is no model of
the interaction between them; the right figure shows a model that also covers the interaction.
definition, complete understanding would imply having a complete and perfect model of the
phenomenon that is being understood. In general, a model cannot be complete and perfect,
as that would be the modeled object itself. A model is a representation of the modeled system
that captures (part of) its structure through some encoding mechanisms. Thus, understanding
is always limited in breadth and depth. The degree of understanding is defined by the quality of
the models, that is, by their completeness and accuracy.
Figure 3.3 shows two phenomena that occur in a system (red and blue). On the left (scenario
1), the subject has a model of both phenomena and can understand both of them, but there
are some consequences that are not understandable, as the subject does not have a model of
how these two phenomena interact (purple relation). On the right (scenario 2), the subject has
a model comprising both phenomena and their interactions, so higher-level understanding can
be achieved in this case.
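The two scenarios of Figure 3.3 can be contrasted with a crude completeness measure. The text does not define a metric, so the one below (fraction of relevant phenomena and interactions covered by the subject's models) is purely an assumption for illustration, and it says nothing about model accuracy.

```python
def completeness(modelled, relevant):
    """Fraction of the relevant items that the subject's models cover."""
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(relevant & set(modelled)) / len(relevant)


relevant = {"red", "blue", "red-blue interaction"}
scenario_1 = completeness({"red", "blue"}, relevant)
scenario_2 = completeness({"red", "blue", "red-blue interaction"}, relevant)
```

With these assumed sets, scenario 1 covers two of the three relevant items and scenario 2 covers all of them, which matches the qualitative ordering described for the figure.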
• The definition is general enough to cover particular types (or cases) of understanding.
It covers understanding when time is of importance (for example, understanding a sit-
uation) and when it is not (for example, understanding a structure, such as a layout or
a map). The definition covers mechanistic and functional understanding, as both can be
achieved having the right type of models. These particular types of understanding will be
developed in interim versions of this document in relation to the development of testbed
implementations.
• We take the following as a definition of “A subject understands a system”: A subject un-
derstands a system if the subject understands all the relevant phenomena that can occur
in the system for a specific scope under study (cf. the previous discussion on mission-
oriented understanding).
• The definition serves for real or imaginary systems as long as they are defined.
• Complete understanding is, in general, not possible for real-world systems.
This would imply having a complete and perfect model of all the phenomena that can
happen in a system. In principle, a model cannot be complete and perfect (as this would
be the modeled system); however, this may be the case for formal abstract systems that
can be isomorphic to their models (e.g. we can understand conics geometry by modeling
it with algebraic equations).
• Answering questions about a system includes making retrodictions and predictions of the
phenomenon.
• Notice that the set of models includes any type of model (physical, mathematical, and
computational —if considered different from mathematical— because this is the core
of the hybrid nature of the CoreSense architecture) as long as they provide the abilities
above-mentioned. For example, a map is a physical model that allows one to answer
questions about which route is the shortest (making predictions) or how someone can be
at some point starting from another (retrodictions).
• Other features or properties are provided through understanding, like, for example, goal
achievement or making inferences about analogous phenomena. Although understand-
ing is necessary for these additional features, it is not sufficient. In the case of goal
achievement, meaning (in the sense presented by [Thórisson et al., 2016]) is also needed
(this will be the basis for what in this project is called situational awareness).
• Understanding is enabled by a set of models. It happens that many times the concrete
model of the observed phenomenon alone enables only low-level understanding. In or-
der to achieve higher levels of understanding, this model has to be integrated with other
(existing) models, and, in this way, more consequences can be predicted (or more diagnoses
can be made).
• A bad model implies that the phenomenon is either not understood or misunderstood.
• It can happen that a subject has a model but the model has missing or uncertain data. In
that case it is possible for the subject to understand, but the inferences (predictions, etc.)
will be uncertain: only qualitative, vague, or probabilistic predictions (different future
scenarios) can be made.
• A model may not be executable in time (lack of time or lack of capabilities of the
subject); in that case, the derivation of timely actions cannot be based on deep understanding,
or it shall happen in a qualitative way. We may develop some form of scalable,
anytime understanding to address this issue.
• The degrees of understanding derived from the limitations of the models are not the only
limiting factors of understanding. The types of inferences applicable to the model also
determine different degrees of understanding. For example, an agent may be able to make
predictions but not retrodictions due to limitations of its cognitive engines.
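The map example given in the considerations above can be made concrete with a small sketch in which the map is a weighted graph and the question "which route is the shortest?" is answered by exploiting that model. The graph, its distances and the function name `shortest_route` are assumptions for illustration; Dijkstra's algorithm is used here simply as one standard way of exploiting such a model.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm; returns (distance, path) or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical map: nodes are places, weights are distances in km.
road_map = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 4.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {"B": 4.0, "C": 1.0},
}
distance, route = shortest_route(road_map, "A", "D")
```

The same model supports other questions (for example, which places are reachable at all), which illustrates that the set of possible exploitations, and not only the model, shapes the degree of understanding.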
To understand the world the way humans do, agents will need
the ability to interact with it like we do - and this is exactly why
research in causal cognition should be tightly linked with efforts
to design robots that can learn and utilize causal information
about their environments. [Stocking et al., 2022]
In this chapter we provide an initial analysis of the theory of understanding presented in
this document in terms of both the project objectives and the concrete systems to be built:
• We analyse to what extent this theory offers some novelty as a theoretical contribution
to cognitive science at large.
• We analyse to what extent this theory covers the different perspectives identified in
[Sanz et al., 2023] both from the theoretical point of view and the testbeds point of view.
• We analyse to what extent this theory is seen as promising to be useful for robots to
perform their duties and for their engineers to build such robots. We also analyse to what
extent it is useful as a general theory of understanding in the wider domain of cognitive
science.
• Finally, we analyse the potential future development of the theory using other formalisms
suitable to attain the project objectives.
The models shall not only be actionable, they shall also be integrable and multiply
actionable when integrated.
Chapter 4. Analysis
The affirmation that all cognition is based on models from the very bottom provides a
conceptual and operational structure for their effective integration even when they are
heterogeneous. For example, you may have a robot able to see (having vision models) and
able to hear (having audio models), but the proper handling of perceptual multimodality requires
their deep integration. This can be done in an algorithm-centric, ad-hoc manner, or it can be
based on model integration supported by a formal systemic background framework.
In essence, besides the common use of the term ”understanding” in human language, under-
standing is not just language, but deep systemics: 1) on the side of the object, 2) on the side
of the subject, and 3) on the whole, as a system-of-systems (see Figure 4.1).
4.2 Coverage
The theory we are seeking shall be operational in the construction of better machines, but at
the same time shall provide a glimpse into a theory of understanding in general cognition.
1. System: The main system in this scenario is the inspection area equipped with a high-
definition camera and advanced inspection algorithms.
The insights derived from the understanding process can be leveraged by human operators to
adjust manufacturing parameters effectively, reducing the likelihood of defects during reiteration
or when producing a new batch. They also make it more straightforward to decide the next action to
take depending on the piece status.
UC-MT-1.5: Human Robot Coordination and Cooperation
Following the understanding definition from Section 3.2, we shall identify the following as-
pects:
1. System: The main system is the cobot which interacts with a human operator and the trays
and parts for assembly.
• Predictions regarding how this distance is likely to change based on both human
behavior and the cobot’s actions.
• The predictions above can later be used for dynamic adjustment of operation zones
in response to real-time conditions and safety requirements.
The insights derived from the understanding process should provide a comprehensive determi-
nation of the operation zones responsible for the cobot’s speed regulations. The propositions
derived from understanding may contribute to enhancing the efficacy of the assembly process
while maintaining a high level of safety. This ensures that the cobot’s speed is synchronized with
the human’s position, speed, and intentions, thereby optimizing overall performance.
To attain such a comprehensive understanding, the system shall handle knowledge on aspects
such as the reliability of laser scanners and the functioning of 2D positioning algorithms,
which are essential for accurately estimating the distance between the human and the robot.
It also involves the development of a human intentional model, which considers factors such
as the human’s speed and the current assembly status to determine whether the human co-worker
intends to monitor the process or remove a completed piece. Additionally, knowing the principles
of operation zones is crucial, as it determines when and for how long adjustments should be
made to ensure optimal operation.
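The link between distance prediction and operation zones can be sketched as follows. This is a simplified illustration, not the testbed implementation: the worst-case closing-speed assumption, the zone thresholds in `ZONES`, and the function names are all hypothetical.

```python
def predict_distance(distance_m, human_speed_mps, cobot_speed_mps, horizon_s):
    """Worst-case prediction: human and cobot both move towards each other."""
    closing_speed = human_speed_mps + cobot_speed_mps
    return max(0.0, distance_m - closing_speed * horizon_s)


# Hypothetical operation zones: (minimum predicted distance in m, speed scale).
ZONES = [(2.0, 1.0), (1.0, 0.5), (0.5, 0.2)]


def speed_scale(predicted_distance_m):
    """Return the cobot speed scale for the zone containing the predicted distance."""
    for min_distance, scale in ZONES:
        if predicted_distance_m >= min_distance:
            return scale
    return 0.0  # closer than the innermost zone: stop


predicted = predict_distance(3.0, human_speed_mps=1.0, cobot_speed_mps=0.5,
                             horizon_s=1.0)
scale = speed_scale(predicted)
```

A human intentional model, as described above, would refine the prediction step (for instance by distinguishing an approach to remove a piece from a pass-by), while the zone lookup could stay unchanged.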
UC-MT-2.1: Driving between workcells
During the execution of this scenario, the following questions emerge:
Gaining a solid understanding of this process can greatly improve its performance. According
to the definition discussed in Section 3.2, we shall identify the following aspects for under-
standing:
1. System: The system in this scenario is the mobile robot with the navigation task. How-
ever, in this case, the complex environment in which the system is deployed is especially
relevant. It is an industrial plant where paths may be blocked or restricted, there may
be humans walking and moving forklifts carrying heavy pieces. Moreover, there may be
large parts added or removed from the surroundings.
2. Phenomenon to understand: The primary phenomenon in this scenario is the process of
localization in a complex environment with static objects that change position, dynamic
obstacles, and restricted areas.
3. Provided understanding: In this scenario, our primary goal is to improve the navigation
process while prioritizing safety. Specifically, our understanding aims to provide valuable
insights into the following aspects:
• Enhance scan-based localization by incorporating semantic context information.
• Improve robot control to prevent erratic movements.
• Identify how features of dynamic obstacles can affect the system and respond ac-
cordingly, taking into account their nature, expected speed, and trajectory.
To attain this degree of understanding, the system shall have knowledge, at least to some
extent, about the different types of obstacles and their expected behavior, particularly when
they involve humans walking or forklifts carrying heavy parts. Furthermore, it requires
information about the workspace and models that can effectively align with the robot’s
localization and planning algorithms. Moreover, the robot controller should be capable of
integrating all these factors to ensure smooth and predictable robot motion. Such capabilities
can significantly enhance navigation tasks, particularly within the complex environment of an
aircraft assembly facility.
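The following sketch illustrates one way in which knowledge about obstacle types and their expected behavior could feed into navigation. It assumes a constant-velocity prediction and a per-class table of maximum expected speed and required clearance; the table values and the names `OBSTACLE_KNOWLEDGE`, `Obstacle` and `waypoint_is_safe` are hypothetical and serve only to make the idea concrete.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-class knowledge: (maximum expected speed in m/s, clearance in m).
OBSTACLE_KNOWLEDGE = {
    "human": (1.5, 1.0),
    "forklift": (3.0, 2.5),
    "static_part": (0.0, 0.5),
}


@dataclass
class Obstacle:
    kind: str
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

    def predict(self, t: float) -> Tuple[float, float]:
        """Constant-velocity prediction, with the speed capped at the class maximum."""
        max_speed, _ = OBSTACLE_KNOWLEDGE[self.kind]
        speed = math.hypot(self.vx, self.vy)
        scale = 1.0 if speed <= max_speed else max_speed / speed
        return (self.x + self.vx * scale * t, self.y + self.vy * scale * t)


def waypoint_is_safe(wx: float, wy: float, t: float, obstacles: List[Obstacle]) -> bool:
    """True if a waypoint reached at time t keeps every class-specific clearance."""
    for obs in obstacles:
        ox, oy = obs.predict(t)
        _, clearance = OBSTACLE_KNOWLEDGE[obs.kind]
        if math.hypot(wx - ox, wy - oy) < clearance:
            return False
    return True
```

The same waypoint can therefore be acceptable next to a walking human and unacceptable next to a forklift, because the two classes carry different expectations about speed and required clearance.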
1. System: The system in this example is the physical body of the aerial robot, which includes
the different hardware components such as rotors, cameras, etc. In particular, the electric
power is provided by a battery which is equipped with sensors that periodically measure
its charge.
2. Subject of understanding. The subject of understanding is the aerial robot that performs
the inspection task.
Periodically checking assumptions about the battery power consumption allows the robot to
continue its flight more safely. If any of these assumptions fails, an alarm is raised that can
be used by the robot to change its flight plan. For example, if the battery has discharged faster
than expected, but there is still enough charge left to complete part of the mission, the flight
plan can be modified to cover that part and return to the starting point before the battery is
completely discharged.
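A minimal sketch of this check-and-replan idea is given below. It assumes a linear discharge model and a mission split into legs with known charge costs; the class `DischargeAssumption`, the function `legs_to_fly` and all numeric values are illustrative assumptions, not the mechanism implemented in the testbed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DischargeAssumption:
    """Assumed linear discharge: the charge drops rate_pct_per_min every minute."""
    rate_pct_per_min: float
    tolerance_pct: float = 2.0

    def expected_charge(self, initial_pct: float, elapsed_min: float) -> float:
        return initial_pct - self.rate_pct_per_min * elapsed_min

    def holds(self, initial_pct: float, elapsed_min: float, measured_pct: float) -> bool:
        """False when the battery has discharged faster than expected (raise an alarm)."""
        expected = self.expected_charge(initial_pct, elapsed_min)
        return measured_pct >= expected - self.tolerance_pct


def legs_to_fly(leg_costs_pct: Sequence[float], return_costs_pct: Sequence[float],
                charge_pct: float, reserve_pct: float = 10.0) -> int:
    """Largest number of remaining legs that still allows a return to the start.

    return_costs_pct[k] is the charge needed to fly home after completing k legs,
    so it must have one more entry than leg_costs_pct. A result of 0 means
    "return now".
    """
    if len(return_costs_pct) != len(leg_costs_pct) + 1:
        raise ValueError("return_costs_pct needs len(leg_costs_pct) + 1 entries")
    for k in range(len(leg_costs_pct), -1, -1):
        needed = sum(leg_costs_pct[:k]) + return_costs_pct[k] + reserve_pct
        if needed <= charge_pct:
            return k
    return 0
```

When `holds` returns false, the robot could call `legs_to_fly` with the measured charge to shorten the mission instead of landing immediately.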
The lack of such a battery charge understanding mechanism would mean that the robot would
have to make drastic decisions to ensure safe flight. For example, the robot could have a reac-
tive mechanism based on a safety threshold that lands the robot immediately when the charge
drops below the threshold. A limitation of this solution is that it may cause the robot to land far
away from the return point where the human operator is located and, in addition, an accident
may occur due to a forced landing in an unsafe place.
As an additional benefit of using the battery understanding mechanism, the user can obtain
explanations in order to better understand the reasons behind the decisions made. With such
information the user will be able to determine whether the vehicle’s battery simply needs to be
recharged or whether, on the contrary, it should be marked as faulty.
UC-IT-4.3: Spatial distribution of the components of the photovoltaic plant
1. System: The system in this example includes the components of the photovoltaic plant (e.g.,
panels arranged in sets of lines, etc.), the aerial robots that perform the inspection and
the humans involved in the inspection task (e.g., operators in charge of robots).
2. Subject of understanding. The subject of understanding is a specific aerial robot that per-
forms the inspection task.
4. Provided understanding. The availability of the understanding mechanism allows, for ex-
ample, mobile robots to act by planning motion actions to reach user-specified goals, en-
visioning different courses of action and choosing the best ones. As specific advantages,
the spatial understanding process presented in this example allows for greater location
accuracy due to the redundancy provided by multiple aerial vehicles (derived from col-
lective SLAM), and to communicate information to the user with more understandable
concepts (panels, panel lines, etc.), rather than simply through geographic coordinates
(derived from semantic SLAM).
1. System: The system in this example includes the components of the photovoltaic plant (e.g.,
panels arranged in sets of lines, etc.), the aerial robots that perform the inspection and
the humans involved in the inspection task (e.g., operators in charge of robots).
2. Subject of understanding. The subject of understanding is the aerial robot that performs
the inspection task.
3. Phenomenon to understand. In this example, the subject tries to understand camera image
defects.
4. Provided understanding. The lack of an automatic understanding mechanism for camera
image defects would require constant manual monitoring by the flight operator to ver-
ify the quality of the images. This alternative solution would significantly increase the
cognitive load of the human operator during flight monitoring and could also result in
the acquisition of faulty images through human error caused by attention lapses. As an
additional benefit of using the image quality understanding mechanism, the user can obtain
more precise explanations of what is happening (e.g. types of camera defects) in order to
understand the reasons behind the decisions made. With such information the user can
determine whether the camera is operating correctly or whether it should be marked as
faulty for future calibration or replacement.
1. System: The user, the Tiago robot and the office space where the interaction takes place.
2. Phenomenon to understand: the phenomenon to understand is the intention of the human.
The robot has a model of the user (e.g., of the user’s movement or face) and uses it to
predict that the user wants to interact.
3. Provided understanding: in this scenario, a subject with understanding (the robot):
• Integrates multiple sensory inputs (readings about the human location and about
self-localization) in its model (layout of the office and potentially relevant locations
for humans in it) to reliably assign interaction intention to the human (who then
becomes a user for the robot).
• Executes appropriate actions, both external (motions socially compliant with facing
someone for interaction) and internal (predictions, belief updates), and monitors
their expected effect: that the human becomes aware that the robot is aware of the
human’s intention to interact.
This scenario requires the subject of understanding to possess models of the office and ex-
pected human use of its areas, as well as a theory of mind and social rules for interaction, and
the ability to use them to predict and to evaluate the outcomes of its actions.
UC-ST-2.2: Implicit instruction
Following the understanding definition from Section 3.2, we identify the following aspects:
1. System: The user, the Tiago robot and the office space and objects where the interaction
takes place.
2. Phenomenon to understand: the primary phenomenon is the integration of the input from
the user in the internal model of the subject as a goal. There are two objects of under-
standing: the physical objects related to the task (object and locations), and the belief
about the goal of the user.
3. Provided understanding: in this scenario, a subject with understanding (the robot):
• integrates multi-modal information (speech and non-verbal behavior) into its
internal model of the desires of the user.
• explains the result of that integration to the user using one or a combination of
channels (display, voice, non-verbal/pointing)
This scenario requires the robot to possess knowledge about the relationship between the
object and the locations in the environment, as well as the value of the object for the user, the
user’s plausible desires, and the relationship between that knowledge and multi-modal
information (speech and non-verbal behavior).
UC-ST-3.2: Obstacles in my way
Following the understanding definition from Section 3.2, we identify the following aspects:
1. System: The user, the Tiago robot and the office space where the interaction takes place.
2. Phenomenon to understand: the primary phenomenon is the use of models to support the
planning and execution of actions to achieve the goal by generating meaning of interme-
diate states.
3. Provided understanding: in this scenario, a subject with understanding (the robot):
• Moves according to social rules.
• Decides appropriately whether to: a) move towards the humans blocking the way in
the hope that they will make way, b) verbally request the humans to move aside, c)
request the humans to reach the object and hand it over.
• Updates its plan based on the observed result of its actions in relation to its plan
and its belief about the humans’ desires.
In this scenario the robot shall have knowledge about human behavior and social rules, in ad-
dition to navigation skills.
• Artificial General Intelligence (AGI). The central topics of CORESENSE directly touch on
some of the central aspects of the Artificial General Intelligence research programme. We
will monitor their contributions and see to what extent they are aligned with ours. How-
ever, the AGI programme is essentially focused on achieving human-level intelligence,
while we try to remove any anthropomorphic bias.
• Metacognition. The “cognition are models” adage can be applied to cognition itself. The
agent can model its own cognitive processes giving a path to higher-order thought pro-
cesses and self-awareness. This is an essential property for resilient, adaptive systems.
• Uncertainty. The modelling relation shall be able to handle the inherent uncertainty
present in the world and injected into the perceptual process.
• Explore the spectrum of forms of model use. Inference, as the exploitation of the deep
relation of the model with the causality of the object modelled, is not the only form of
model exploitation. A deeper understanding is achieved if the model can be used in other
ways (e.g. shared, merged, compressed, verified, etc.). Most of these “other” forms of
exploitation will be more related to the machinery of understanding than to the modelling
relation itself.
• Investigate the lifecycle of models. These other forms of model use lead to a broader
consideration of the life cycle of the mental models used by an agent. How are the mod-
els created? When shall they be eliminated? This points to the classic problems of truth
maintenance and nonmonotonic reasoning in logic-based systems.
System conceptualization: The theory will be translated into WP1 ontologies to be used in
the conceptualization of new systems.
System design: The theory will be translated into SysMLv2 models to be used in the systems
engineering processes supported by the WP4 tooling.
System implementation: The theory will be translated into ROS 2 packages to be used in the
robotics community addressed by WP5.
For example, following the human-readable definitions in Section 3.2, we can start by identifying
three main categories: the Category of Systems (S), the Category of Behaviors (B) and the
Category of Values (V).
We define a partition of the category of systems, a smaller category $S_i$. This category includes
the objects and morphisms of interest for a particular aspect, stakeholder, or situation, as
defined in Section 3. A functor $M$ establishes a map between the observed system and the model
of it:
\[ M : S_i \to M_{S_i} \]
The model $M_{S_i}$ is the result of applying the model functor $M$ to the system-of-interest
category $S_i$. Note that models (i.e. the modeling relation) are seen as functors because they
provide a notion of sameness between two categories [Rupnow et al., 2023]. The category $M_{S_i}$
can be seen as a simplified version of the system, complete enough for the partition $S_i$.
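To give a feeling for what such a model functor could look like in software, the sketch below represents a system of interest and its model as small graphs of generating objects and arrows, and checks that a candidate mapping sends every arrow f : X → Y to an arrow (or an identity) from M(X) to M(Y). The example system (battery, rotor, body), the model (energy, motion) and all names are invented for illustration. Only the source/target condition on generators is checked, which is enough to determine a functor when both categories are freely generated by these graphs; that freeness is an assumption of the sketch.

```python
from typing import Dict, NamedTuple, Set


class Arrow(NamedTuple):
    name: str
    source: str
    target: str


class Graph(NamedTuple):
    """Generating objects and arrows; identities and composites are implicit."""
    objects: Set[str]
    arrows: Set[Arrow]


IDENTITY = "id"  # marker: the arrow is sent to an identity morphism of the model


def respects_structure(system: Graph, model: Graph,
                       on_objects: Dict[str, str],
                       on_arrows: Dict[str, str]) -> bool:
    """Check that each arrow f : X -> Y is sent to an arrow M(X) -> M(Y)."""
    model_arrows = {a.name: a for a in model.arrows}
    if any(on_objects.get(obj) not in model.objects for obj in system.objects):
        return False
    for arrow in system.arrows:
        src, dst = on_objects[arrow.source], on_objects[arrow.target]
        image = on_arrows.get(arrow.name)
        if image == IDENTITY:
            if src != dst:  # an identity needs equal source and target
                return False
        else:
            target = model_arrows.get(image)
            if target is None or (target.source, target.target) != (src, dst):
                return False
    return True


# Hypothetical system of interest S_i and a coarser model M_{S_i}.
system = Graph({"battery", "rotor", "body"},
               {Arrow("powers", "battery", "rotor"), Arrow("lifts", "rotor", "body")})
model = Graph({"energy", "motion"}, {Arrow("drives", "energy", "motion")})

on_objects = {"battery": "energy", "rotor": "motion", "body": "motion"}
on_arrows = {"powers": "drives", "lifts": IDENTITY}
is_valid_model = respects_structure(system, model, on_objects, on_arrows)
```

Here the model collapses the rotor and the body into a single "motion" object, which is one concrete reading of "a simplified version of the system".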
In the coming months we will explore this approach by trying to identify the proper categories
and functors for both general cognitive aspects —e.g. self-awareness mechanisms— and also
for the three testbeds. We will also evaluate the use of CT in terms of both rigour and effective
use of the formalism in both engineering and runtime (e.g. by direct mapping of categories into
ROS-friendly datatypes).
Glossary
AI: Artificial Intelligence.
category theory: Category theory is a general theory of mathematical structures and their
relations.
cognitive pattern: A reusable partial design to endow a system with a concrete cognitive
capability.
doxastic: Concepts or theories related to beliefs, including the study of belief systems, their
formation, justification, and evaluation.
epistemic: Concepts or theories related to knowledge, including the study of justification, use,
transfer and truth.
robotics: An engineering discipline that deals with the mechanical structures and control
systems of machines that move and act in the world.
Bibliography
[Awodey, 2010] Awodey, S. (2010). Category Theory. Oxford Logic Guides. Oxford: Oxford
University Press, 2nd edition.
[Barr and Wells, 1990] Barr, M. and Wells, C. (1990). Category Theory for Computing Science.
Prentice-Hall International series in Computer Science. Prentice Hall.
[Baumberger, 2014] Baumberger, C. (2014). Types of understanding: Their nature and their
relation to knowledge. Conceptus, 40.
[Boon, 2009] Boon, M. (2009). Understanding in the Engineering Sciences: Interpretive
Structures. In de Regt, H. W., editor, Scientific Understanding: Philosophical Perspectives, pages
249–270. University of Pittsburgh Press.
[CORESENSE Consortium and EC, 2022] CORESENSE Consortium and EC (2022). CORESENSE
Grant Agreement. Technical report, The CORESENSE Consortium and the European Com-
mission Horizon Europe Programme.
[Craik, 1943] Craik, K. J. W. (1943). The Nature of Explanation. Cambridge University Press.
[de Regt, 2009] de Regt, H. W. (2009). The Epistemic Value of Understanding. Philosophy of
Science, 76(5):585–597.
[Fong and Spivak, 2019] Fong, B. and Spivak, D. I. (2019). Seven Sketches in Compositionality: An
Invitation to Applied Category Theory. Cambridge University Press.
[Hobbes, 1651] Hobbes, T. (1651). Leviathan, or The Matter, Forme, and Power of a Common-
wealth Ecclesiasticall and Civill. James Thornton, Oxford.
[Kozen et al., 2006] Kozen, D., Kreitz, C., and Richter, E. (2006). Automating proofs in category
theory. In Furbach, U. and Shankar, N., editors, Automated Reasoning, pages 392–407, Berlin,
Heidelberg. Springer Berlin Heidelberg.
[Kvanvig, 2003] Kvanvig, J. L. (2003). The Value of Knowledge and the Pursuit of Understanding.
Cambridge Studies in Philosophy. Cambridge University Press.
[Lawvere and Schanuel, 1997] Lawvere, F. and Schanuel, S. (1997). Conceptual Mathematics: A
First Introduction to Categories. Cambridge University Press.
[Lloyd, 2021] Lloyd, K. A. (2021). Category theoretic foundations for systems science and en-
gineering. In Metcalf, G. S., Kijima, K., and Deguchi, H., editors, Handbook of Systems Sciences,
pages 1227–1249. Springer Singapore, Singapore.
[Lombrozo and Wilkenfeld, 2019] Lombrozo, T. and Wilkenfeld, D. (2019). Mechanistic versus
Functional Understanding. In Grimm, S. R., editor, Varieties of Understanding, chapter 11,
pages 209–230. Oxford University Press.
[MacLane, 1998] MacLane, S. (1998). Categories for the Working Mathematician. Graduate Texts
in Mathematics. Springer New York, NY.
[Medawar, 1984] Medawar, P. B. (1984). Pluto’s Republic. Oxford University Press, Oxford.
[Norman, 1980] Norman, D. A. (1980). Twelve Issues for Cognitive Science. Cognitive Science,
4:1–32.
[OMG, 2022] OMG (2022). OMG Systems Modeling Language (SysML®) Version 2.0 Beta 1. Part
1: Language Specification. OMG Draft Adopted Specification ptc/2023-06-02, Object
Management Group.
[Popper, 1959] Popper, K. (1959). The Logic of Scientific Discovery. Basic Books.
[Rupnow et al., 2023] Rupnow, R., Randazzo, B., Johnson, E., and Sassman, P. (2023). Same-
ness in Mathematics: a Unifying and Dividing Concept. International Journal of Research in
Undergraduate Mathematics Education, 9(2):398–425.
[Sanz et al., 2023] Sanz, R., Rodriguez, M., Molina, M., Zamani, F., Aguado, E., Miguel Fernán-
dez, D. P. a. G., and Hernandez, C. (2023). State of the art and needs for a theory of under-
standing and awareness. Deliverable D1.1 CS-026, The CORESENSE Project.
[Spivak, 2014] Spivak, D. I. (2014). Category Theory for the Sciences. MIT Press.
[Stocking et al., 2022] Stocking, K. C., Gopnik, A., and Tomlin, C. (2022). From Robot Learning
To Robot Understanding: Leveraging Causal Graphical Models For Robotics. In Faust, A., Hsu,
D., and Neumann, G., editors, Proceedings of the 5th Conference on Robot Learning, volume
164 of Proceedings of Machine Learning Research, pages 1776–1781. PMLR.
[Thórisson et al., 2016] Thórisson, K. R., Kremelberg, D., Steunebrink, B. R., and Nivel, E. (2016).
About Understanding. In Steunebrink, B., Wang, P., and Goertzel, B., editors, Artificial Gen-
eral Intelligence. Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016),
pages 106–117.
[Wikipedia contributors, 2023] Wikipedia contributors (2023). Theory — Wikipedia, the free
encyclopedia. https://en.wikipedia.org/w/index.php?title=Theory&oldid=1168082115.
[Online; accessed 18-October-2023].
Appendix A. Category Theory for Roboticists
In this Appendix we provide an introduction to Category Theory (CT) for robotics. CT allows us
to be completely precise about otherwise informal concepts [Kozen et al., 2006].
Note that this annex is just a basic description for robot builders; for further reading, refer
to classics like [Lawvere and Schanuel, 1997], [MacLane, 1998], [Awodey, 2010] from a mathe-
matical perspective, [Barr and Wells, 1990] from a computing perspective, and [Spivak, 2014],
[Fong and Spivak, 2019] for a scientific application perspective with a strong mathematical fla-
vor.
A.1 Category
A category C is a collection of objects with relations (morphisms) between them. It constitutes
an aggregation of objects with an imposed structure [Lloyd, 2021]. To specify a category, we need
three constituents:
• identity: for every object $X \in \mathrm{Ob}(C)$, there exists an identity morphism
$\mathrm{id}_X : X \to X$.
• unitality: for any morphism $f : X \to Y$, composing with the identity morphism at either
object does not affect the result: $f \circ \mathrm{id}_X = f$ and $\mathrm{id}_Y \circ f = f$.
A simple example is the Set category, in which objects are sets, morphisms are functions
between sets, and the composition operation is ordinary function composition.
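To make the Set example concrete, here is a minimal sketch in which plain Python functions stand in for morphisms of Set. The helper names `identity`, `compose` and `equal_on` are introduced only for this illustration, and equality of functions is checked on a finite sample of inputs, which is a sanity check rather than a proof.

```python
from typing import Any, Callable, Iterable


def identity(x: Any) -> Any:
    """Identity morphism: returns its argument unchanged."""
    return x


def compose(g: Callable, f: Callable) -> Callable:
    """Return g ∘ f, i.e. apply f first and then g."""
    return lambda x: g(f(x))


def equal_on(f: Callable, g: Callable, samples: Iterable) -> bool:
    """Compare two functions on a finite sample of inputs."""
    return all(f(x) == g(x) for x in samples)


def is_even(n: int) -> bool:
    return n % 2 == 0


# Composable morphisms: len : str -> int, is_even : int -> bool, str : bool -> str.
words = ["", "robot", "cobot", "tray"]

# Unitality: composing with an identity on either side changes nothing.
unital = (equal_on(compose(len, identity), len, words)
          and equal_on(compose(identity, len), len, words))

# Associativity: (str ∘ is_even) ∘ len equals str ∘ (is_even ∘ len).
associative = equal_on(
    compose(compose(str, is_even), len),
    compose(str, compose(is_even, len)),
    words,
)
```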
A.2 Functor
A functor F is a map between two categories C and D. It assigns objects to objects and morphisms
to morphisms, preserving identities and composition. Functors preserve structure when
projecting one category into another.
Diagram A.1 represents a functor $F : C \to D$, mapping between a category $C$ composed of
$C_0, C_1 \in \mathrm{Ob}(C)$ with morphism $f : C_0 \to C_1$, and a category $D$ composed of
$D_0, D_1, D_2 \in \mathrm{Ob}(D)$ with morphisms $g_1 : D_0 \to D_1$ and $g_2 : D_1 \to D_2$.
The functor maps are the dashed blue arrows.
[Diagram A.1: category C (C0 → C1 via f) on the left and category D (D0 → D1 → D2 via g1 and
g2) on the right, joined by dashed blue arrows representing F.]
Functors can map between the same types of category, such as Set → Set, or between differ-
ent categories, such as Set → Vect, between the category of sets and the category of vector
spaces.
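As an example of a functor Set → Set that roboticists use daily, lists can be read as a functor: a set A is sent to the set of lists over A, and a function f is sent to "apply f to every element". The sketch below checks the two preservation properties on a few sample inputs. The names `fmap` and `functor_laws_hold` are hypothetical helpers for this illustration, and sampling is only a sanity check.

```python
from typing import Callable, List, Sequence


def identity(x):
    return x


def compose(g: Callable, f: Callable) -> Callable:
    """Return g ∘ f, i.e. apply f first and then g."""
    return lambda x: g(f(x))


def fmap(f: Callable) -> Callable[[List], List]:
    """Action of the list functor on a morphism f: apply f element-wise."""
    return lambda xs: [f(x) for x in xs]


def functor_laws_hold(f: Callable, g: Callable, samples: Sequence[List]) -> bool:
    """Check F(id) = id and F(g ∘ f) = F(g) ∘ F(f) on the sample lists."""
    preserves_identity = all(fmap(identity)(xs) == identity(xs) for xs in samples)
    preserves_composition = all(
        fmap(compose(g, f))(xs) == compose(fmap(g), fmap(f))(xs) for xs in samples
    )
    return preserves_identity and preserves_composition


def increment(n: int) -> int:
    return n + 1


def double(n: int) -> int:
    return n * 2


laws_ok = functor_laws_hold(increment, double, [[], [1, 2, 3], [10]])
```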
[Diagram A.2: a natural transformation α between two functors F and G from C to D.]
To specify a natural transformation, we define a morphism $\alpha_c : F(c) \to G(c)$ for each
object $c \in C$, such that for every morphism $f : c \to d$ in $C$ the composition rule
$\alpha_d \circ F(f) = G(f) \circ \alpha_c$ holds. This condition is often expressed as the
commutative diagram shown in Diagram A.3, where the natural transformation morphisms are
represented as dashed arrows. This means that the projection of $C$ in $D$ through $F$ can be
transformed into projections through $G$. The commutative condition implies that the order in
which we apply the transformation does not matter.
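\[
\begin{array}{ccc}
F(c) & \xrightarrow{\ \alpha_c\ } & G(c) \\
{\scriptstyle F(f)}\big\downarrow & & \big\downarrow{\scriptstyle G(f)} \\
F(d) & \xrightarrow{\ \alpha_d\ } & G(d)
\end{array}
\tag{A.3}
\]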
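The commuting square can be tested numerically on small examples. In the sketch below, F is taken to be the list functor and G an "optional value" functor (a value or `None`), with the component α given by a function that returns the first element of a list if there is one. These choices, and the names `list_map`, `option_map`, `safe_head`, `smallest` and `naturality_holds`, are assumptions made for illustration; the check runs on sample inputs only, and the sample lists must not contain `None`.

```python
from typing import Callable, List, Optional, Sequence


def list_map(f: Callable) -> Callable[[List], List]:
    """F(f): apply f to every element of a list."""
    return lambda xs: [f(x) for x in xs]


def option_map(f: Callable) -> Callable:
    """G(f): apply f to the value if there is one, keep None otherwise."""
    return lambda v: None if v is None else f(v)


def safe_head(xs: List) -> Optional[object]:
    """Candidate component α: first element of the list, or None if it is empty."""
    return xs[0] if xs else None


def smallest(xs: List) -> Optional[object]:
    """Another candidate: the minimum element, or None if the list is empty."""
    return min(xs) if xs else None


def naturality_holds(alpha: Callable, f: Callable, samples: Sequence[List]) -> bool:
    """Check α_d ∘ F(f) = G(f) ∘ α_c on the sample lists."""
    return all(
        alpha(list_map(f)(xs)) == option_map(f)(alpha(xs)) for xs in samples
    )


def negate(n: int) -> int:
    return -n


samples = [[], [1, 2], [7, 3, 5]]
head_is_natural = naturality_holds(safe_head, negate, samples)
smallest_is_natural = naturality_holds(smallest, negate, samples)
```

Taking the first element commutes with any element-wise function, so the square closes. Taking the minimum does not, because negation reverses the order, so `smallest` fails the check for `negate` and is not a natural transformation between these two functors.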