Lecture Notes in Artificial Intelligence 9205

Artificial General Intelligence
8th International Conference, AGI 2015
Berlin, Germany, July 22–25, 2015
Proceedings
Editors
Jordi Bieger, Reykjavik University, Reykjavik, Iceland
Ben Goertzel, Hong Kong Polytechnic University, Hong Kong SAR
Alexey Potapov, Saint Petersburg State University of Information Technologies, Mechanics and Optics, St. Petersburg, Russia
Almost exactly 60 years ago, in the summer of 1955, John McCarthy coined the term
“artificial intelligence” (AI) to refer to “the science and engineering of making intelli-
gent machines” in a proposal for a summer research project at Dartmouth College. The
subsequent Dartmouth Conferences of 1956 are often credited with the creation of the
field of AI. But as the problem proved much more difficult than anticipated, disillu-
sionment set in. The goal of creating machines that could think at a level comparable to
humans was set aside by many, in favor of the creation of “smart” applications that were
highly successful in specialized domains. Since then “AI” and “narrow AI” have
become almost synonymous and the development of systems showing more general
intelligence in a wide variety of domains was seen as unattainable.
But after having been largely ignored for many decades, the last ten years have seen
a small resurgence in the pursuit of what we now call artificial general intelligence
(AGI). While the contributions of narrow AI to science and society are undeniable,
many researchers were frustrated with the lack of progress toward the larger goal of AI.
Armed with novel technology and ideas, a new optimism has taken hold of the
community. Creating thinking machines may be a daunting task, but many people
today believe that it is not impossible, and that we can take steps toward that goal if we
keep our eye on the ball.
The AGI conference series, organized by the AGI Society, has been the main venue
for bringing together researchers in this re-emerging field. For the past eight years it has
facilitated the exchange of knowledge and ideas by providing an accessible platform
for communication and collaboration. This volume contains the research papers
accepted for presentation at the Eighth Conference on Artificial General Intelligence
(AGI-15), held during July 22–25, 2015, in Berlin. A total of 72 research papers were
submitted to the conference, and after desk rejecting 14 (19 %), each paper was
reviewed by at least two, and on average 2.93, Program Committee members. We
accepted 23 papers for oral presentation (32 %) as well as 19 posters (26 %), of which
one was withdrawn.
In addition to these contributed talks, the conference featured Jürgen Schmidhuber,
director of the Swiss AI lab IDSIA in Lugano, and Frank Wood, associate professor at
the University of Oxford, who gave invited keynote speeches on “The Deep Learning
RNNaissance” and probabilistic programming with the Anglican language. José
Hernández-Orallo, professor at the Polytechnic University of Valencia, gave a tutorial
on the evaluation of intelligent systems. Another tutorial was given by Alexey Potapov,
professor at the ITMO University and St. Petersburg State University, on the minimum
description length principle. A third tutorial, given by Nil Geisweiler, Cosmo Harrigan
and Ben Goertzel, described how to combine program learning and probabilistic logic
in OpenCog. Martin Balek and Dusan Fedorcak presented a visual editor for designing
the architecture of artificial brains. Finally, the conference also featured workshops on
Socioeconomic Implications of AGI and on Synthetic Cognitive Development and
Organizing Committee
Ben Goertzel (Conference Chair) AGI Society, USA
Joscha Bach MIT and Harvard University, USA
Matthew Iklé Adams State University, USA
Jan Klauck (Local Chair) Austrian Space Forum, Austria
Program Chairs
Jordi Bieger Reykjavik University, Iceland
Alexey Potapov AIDEUS and ITMO University, Russia
Program Committee
Bo An Nanyang Technological University, China
Itamar Arel University of Tennessee, USA
Joscha Bach MIT and Harvard University, USA
Tarek Besold University of Osnabrück, Germany
Cristiano Castelfranchi Institute of Cognitive Sciences and Technologies, Italy
Antonio Chella University of Palermo, Italy
Blerim Emruli Luleå University of Technology, Sweden
Stan Franklin University of Memphis, USA
Deon Garrett Icelandic Institute for Intelligent Machines, Iceland
Nil Geisweiller Novamente LLC, USA
Helmar Gust University of Osnabrück, Germany
José Hernández-Orallo Polytechnic University of Valencia, Spain
Bill Hibbard University of Wisconsin–Madison, USA
Marcus Hutter Australian National University, Australia
Matthew Iklé Adams State University, USA
Benjamin Johnston University of Sydney, Australia
Cliff Joslyn Pacific Northwest National Laboratory, USA
Randal Koene Carboncopies.org, USA
Kai-Uwe Kühnberger University of Osnabrück, Germany
Shane Legg Google Inc., USA
Moshe Looks Google Inc., USA
Maricarmen Martinez University of Los Andes, Colombia
Amedeo Napoli LORIA Nancy, France
Eric Nivel Icelandic Institute for Intelligent Machines, Iceland
Additional Reviewers
Mayank Daswani Australian National University, Australia
Tom Everitt Stockholm University, Sweden
Matthias Jakubec Vienna Technical University, Austria
Jan Leike Australian National University, Australia
Lydia Chaido Siafara Vienna Technical University, Austria
Qiong Wu Nanyang Technological University, China
Steering Committee
Ben Goertzel AGI Society, USA (Chair)
Marcus Hutter Australian National University, Australia
Modeling Motivation in MicroPsi 2
Joscha Bach
1 Introduction
MicroPsi [1] is an architecture for Artificial General Intelligence, based on a
framework for creating and simulating cognitive agents [2]. Work on MicroPsi started
in 2003. The current version of the framework, MicroPsi2 [3], is implemented in the
Python programming language and may interface with various simulation worlds,
such as Minecraft (see [4]). MicroPsi agents are hierarchical spreading activation
networks that realize perceptual learning, motor control, memory formation and
retrieval, decision making, planning and affective modulation.
One of MicroPsi’s main areas of research concerns modeling a motivational
system: whereas intelligence may be seen as problem solving in the pursuit of a given
set of goals, human generality and flexibility stem largely from the ability to identify
and prioritize suitable goals. An artificial system that is meant to model human
cognition will have to account for this kind of autonomy. Our solution does not
presuppose any goals, but instead a minimal orthogonal set of systemic needs, which
are signaled to the cognitive system as urges. Goals are established as the result of
learning how to satisfy those needs in a given environment, and to avoid their
frustration. Since needs constantly change, the system will have to reevaluate its goals
and behaviors continuously, which results in a dynamic equilibrium of activities.
While cognition can change goals, expectations of reward and priorities, it cannot directly influence the needs themselves.
MicroPsi’s model of motivation [5] has its origins in the Psi theory [6, 7] and has
recently been extended to account for a more detailed understanding of personality
traits, aesthetic appreciation and romantic affection. While current MicroPsi2 agents
do not implement all aspects of the motivational system (especially not the full set of
need dimensions), the model is general enough to be adapted to a variety of
applications, and has been integrated into other cognitive architectures, such as
OpenCog [8]. In the following, I will focus especially on the discussion of concepts
that can be transferred into other systems.
Urge signals fulfill several roles in the cognitive system (figure 1). Their intensity
governs arousal, execution speed and the resolution of cognitive processing
(modulation). Changes in the urges indicate the satisfaction or frustration of the
corresponding needs. Rapid changes are usually the result of an action of the agent, or of an event that has happened to the agent. These changes are indicated with pleasure signals (corresponding to satisfaction) or displeasure signals (corresponding to frustration). Pleasure and displeasure signals are used as reinforcements for motivational learning (figure 2). Each type of signal is connected to an associator. Associators establish a connection between two representations in MicroPsi; here, between the urge signal and
the current situation or action. Furthermore, learning strengthens associations between
the current situation and those that preceded it. In combination with mechanisms for
forgetting and memory consolidation, this results in learning behavior sequences that
end in goal situations. Goals are actions or situations that allow satisfying the urge
(appetitive goals), or that threaten to increase the urge (aversive goals).
A motive is an urge that has been associated with a particular goal. Each action of a
MicroPsi agent is based on an active motive, i.e. directed on reaching an appetitive
goal, or avoiding an aversive goal.
The association between urges and goals allows the agent to identify possible
remedies whenever the need arises in the future, and prime its perception and memory
retrieval towards the currently active urges. Most importantly, urges signal the current
demands of the system, and thereby inform decision-making.
3 Types of Needs
The needs of a cognitive system fall into three groups: physiological, social and cognitive needs. Note that needs do not form a hierarchy, as for instance
suggested by Maslow [9], but all act on the same level. This means that needs do not
have to be satisfied in succession (many needs may never be fully satisfied), but
concurrently. To establish priorities between different urges, each one is multiplied
with a weight parameter weight_d that expresses its importance relative to other needs. Each need has a decay_d that specifies how quickly its value c_d drops towards v_0 when left alone, and thus how often it has to be replenished. Furthermore, each need has parameters gain_d and loss_d, which express how much it reacts to satisfaction or frustration. Using a suitable parameter set [weight, decay, gain, loss] for each demand of an agent, we can account for individual variance of motivational traits, and
personality properties [10].
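As a minimal illustration of these parameters, the following Python sketch shows one way such a need could be represented (MicroPsi2 itself is implemented in Python). The update rules are illustrative assumptions only: c_d is taken to be a value in [0, 1] with v_0 = 0, and the urge is computed as the weighted deviation from full satisfaction; this is not the actual MicroPsi2 code.

from dataclasses import dataclass

@dataclass
class Need:
    """One systemic need d with the per-need parameters described above."""
    weight: float       # weight_d: importance relative to other needs
    decay: float        # decay_d: how quickly the value drops when left alone
    gain: float         # gain_d: responsiveness to satisfaction events
    loss: float         # loss_d: responsiveness to frustration events
    value: float = 1.0  # assumed current value c_d, 1.0 = fully satisfied

    def tick(self, dt: float) -> None:
        # Passive decay towards depletion (v_0 is assumed to be 0 here).
        self.value = max(0.0, self.value - self.decay * dt)

    def satisfy(self, amount: float) -> float:
        # Consumption raises the value; the scaled change acts as a pleasure signal.
        delta = min(1.0 - self.value, self.gain * amount)
        self.value += delta
        return delta            # pleasure signal

    def frustrate(self, amount: float) -> float:
        delta = min(self.value, self.loss * amount)
        self.value -= delta
        return -delta           # displeasure signal

    def urge(self) -> float:
        # Urge strength: weighted deviation from full satisfaction.
        return self.weight * (1.0 - self.value)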
Physiological needs regulate the basic survival of the organism and reflect demands
of the metabolism and physiological well-being. The corresponding urges originate in
proprioceptive measurements, levels of hormones and nutrients in the bloodstream, etc.
Physiological needs include sustenance (food and drink), physical integrity and pain
avoidance, rest, avoidance of hypothermia and hyperthermia, and many more.
(See figure 3 for an example implementation of physiological needs in MicroPsi2.)
Social needs direct the behavior towards other individuals and groups. They are
satisfied and frustrated by social signals and corresponding mental representations,
but pleasure and displeasure from these sources are not necessarily less relevant to a
subject than physiological pain. Consequently, people are often willing to sacrifice
their food, rest, health or even their life to satisfy a social goal (getting recognition,
supporting a friend, saving a child, winning a partner, maintaining one’s integrity,
avoiding loss of reputation etc.). Individual differences in the weight of social needs
may result in more altruistic or egotistic, extraverted or introverted, abrasive or agreeable,
romantic or a-romantic personalities.
Social Needs
Affiliation is the need for recognition and acceptance by other individuals or groups. It
is satisfied by legitimacy signals, such as smiles and praise, and frustrated by frowns
Cognitive Needs
Competence is either task-related, effect-related or general:
• Epistemic, or task-related competence measures success at individual skills. The execution of a skill and the acquisition of new skills lead to satisfaction; failure, or the anticipation of failure, to frustration.
• Effect-related competence measures the ability to exert changes in the
environment, with a gain that is proportional to the size of the observed effect.
• General competence is a compounded measure of the ability to satisfy needs
(including the acquisition of epistemic competence). The strength of the urge is
used as a heuristic to reflect on general performance, and to predict success at
unknown tasks. Low general competence amounts to a lack of confidence.
4 Decision-Making
According to the Psi theory, all behaviors are either directed on the satisfaction of a
need, or on the avoidance of the frustration of a need. Even serendipitous behavior is
directed on need satisfaction (on exploration, rest or aesthetics). Identifying goals and
suitable actions is the task of the decision-making system (figure 5).
Once a need becomes active and cannot be resolved by autonomous regulation, an
urge signal is triggered, which brings the need to the attention of the reactive layer of
the cognitive agent. Through past experiences, the urge has been associated with
various actions, objects and situations that satisfied it in the past (appetitive goals),
and situations and objects that frustrated it (aversive goals). Via activation spreading
along these associations, relevant content in memory and perception is highlighted.
If an appetitive goal is perceived immediately, and there is no significant
interference with current activity, the urge can be satisfied opportunistically, which
will not require significant attentional processing. Otherwise, the agent attempts to
suppress the new urge (by subtracting a selection threshold that varies based on the
strength and urgency of the already dominant motive).
If the urge overcomes the selection threshold, and turns out to be stronger than the
currently dominant urge, the current behavior is interrupted. The agent now tries to
recall an applicable strategy, using spreading activation to identify a possible
sequence of actions/world states from the current world situation (i.e., the currently
active world model) to one of the highlighted appetitive goals. If such a strategy
cannot be discovered automatically, the agent engages additional attentional resources
and attempts to construct a plan, by matching known world situations and actions into
a possible chain that can connect the current world situation to one of the appetitive
goals. (At the moment, MicroPsi2 uses a simple hill-climbing planner, but many
planning strategies can be used.)
If plan construction fails, the agent gives up on pursuing the current urge, but
increases its need for exploration (which will increase the likelihood of orientation
behaviors to acquire additional information about the current situation, or even trigger
experimental and explorative behavior strategies).
A successfully identified plan or automatism amounts to a motive (a combination
of an active urge, a goal, and a sequence of actions to reach that goal). The strength of
the motive is determined by estimating the reward of reaching the goal, the urgency of resolving the need and the probability of success, and dividing the combination of these factors by the estimated cost of implementing the plan. The strongest motive will be raised to
an intention, that is, it becomes the new dominant motive, and governs the actions of
the agent. The probability of succeeding in implementing a strategy is currently
estimated as the sum of the task-specific competence (i.e. how reliably the strategy succeeded in the past) and the general competence (to account for the agent's general ability or inability to improvise and succeed in unknown circumstances).
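The following sketch condenses this decision cycle into code. The paper states which factors enter the computation but not the exact combination rule; the multiplicative combination and the clipping of the competence sum below are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Motive:
    goal: str
    reward_estimate: float   # estimated reward of reaching the goal
    urgency: float           # urgency of resolving the underlying need
    task_competence: float   # how reliably the strategy succeeded in the past
    cost: float              # estimated cost of implementing the plan

def motive_strength(m: Motive, general_competence: float) -> float:
    # Success probability: sum of task-specific and general competence
    # (clipping to [0, 1] is an assumption).
    p_success = min(1.0, m.task_competence + general_competence)
    # Combine reward, urgency and success probability, divided by cost.
    return (m.reward_estimate * m.urgency * p_success) / max(m.cost, 1e-6)

def select_intention(candidates, dominant, general_competence, selection_threshold):
    """Return the motive that will govern the agent's behavior next."""
    best, best_strength = dominant, motive_strength(dominant, general_competence)
    for m in candidates:
        s = motive_strength(m, general_competence)
        # A competing motive must exceed the dominant one by the selection
        # threshold before it can interrupt the current behavior.
        if s - selection_threshold > best_strength:
            best, best_strength = m, s
    return best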
5 Modulation
The strength of an agent's needs does not only establish which goals the agent follows, but also how it pursues them. Cognitive and perceptual processing are
configured by a set of global modulators (figure 6):
Arousal reflects the combined strength and urgency of the needs of the agent.
Arousal is reflected in more energy expenditure in actions, action readiness, stronger
responses to sensory stimuli, and faster reactions [12].
Valence represents a qualitative evaluation of the current situation. Valence is
determined by adding all current pleasure signals to a baseline, and then subtracting
all displeasure and currently active urges.
perceptual processing, while a static environment frees resources for deliberation and
reflection. In other words, the securing rate determines the direction of attention:
outwards, into the environment, or inwards, onto the mental stage.
The three additional modulator dimensions configure the attention of a MicroPsi
agent, by determining its width/detail, its focus, and its direction.
The values of the modulators are determined by the configurations of the urges,
and by interaction among the modulators themselves (figure 7). Arousal is determined
by the strength and urgency of all needs. A high arousal will also increase the
resolution level and increase the suppression. The resolution level is increased by the
strength of the current motive, but reduced by its urgency, allowing for faster
responses. Suppression is increased by the strength and urgency of the currently
leading motive, and is reduced by a low general competence. The securing rate is
decreased by the strength and urgency of the leading motive, but increases with low
competence and a high need for exploration (which is equivalent to experienced
uncertainty). Aggression is triggered by agents or obstacles that prevent the
realization of an important motive, and reduced by low competence.
Additionally, each modulator has at least four more or less fixed parameters that
account for individual variance between subjects: the baseline is the default value of
the modulator; the range describes the upper and lower bound of its changes, the
volatility defines the reaction to change, and the duration describes the amount of
time until the modulator returns to its baseline.
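A minimal sketch of a modulator with the four parameters just listed (baseline, range, volatility, duration), together with the valence computation described above; the linear update and relaxation rules are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class Modulator:
    baseline: float     # default value the modulator returns to
    low: float          # lower bound of its range
    high: float         # upper bound of its range
    volatility: float   # how strongly it reacts to change
    duration: float     # time it takes to return to the baseline
    value: float = 0.0

    def __post_init__(self):
        self.value = self.baseline

    def push(self, stimulus: float) -> None:
        # React to a change; the linear response is an illustrative assumption.
        self.value = min(self.high, max(self.low, self.value + self.volatility * stimulus))

    def relax(self, dt: float) -> None:
        # Drift back towards the baseline over roughly `duration` time units.
        self.value += (self.baseline - self.value) * min(1.0, dt / max(self.duration, 1e-6))

def valence(baseline, pleasure_signals, displeasure_signals, active_urges):
    # Valence = baseline + all pleasure signals - all displeasure signals - active urges.
    return baseline + sum(pleasure_signals) - sum(displeasure_signals) - sum(active_urges)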
Emotions are either undirected, and can be described as typical configurations of the
modulators, along with competence and experienced uncertainty, or they are a valenced
reaction to an object, i.e. a particular motivationally relevant mental representation,
combined with an affective state. Examples of undirected emotions are joy (positive
valence and high arousal), bliss (positive valence and low arousal) or angst (negative
valence, high experienced uncertainty, submission and low competence). Directed
emotions are fear (negative valence directed on an aversive goal, submissiveness and
low competence) and anger (negative valence directed on an agent that prevented an
appetitive goal or caused the manifestation of an aversive goal, aggression, high
arousal). Jealousy may either manifest as a fear (directed on losing romantic attachment
or affiliation; submission), or as aggression (directed on an agent that prevents
satisfaction of affiliative or romantic needs).
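As an illustration of emotions as configurations of the affective state, the sketch below maps a snapshot of valence, arousal, experienced uncertainty and competence onto the undirected emotion labels mentioned above; the threshold values are arbitrary assumptions and not part of the model.

def classify_undirected_emotion(valence: float, arousal: float,
                                uncertainty: float, competence: float) -> str:
    """Map a modulator/need snapshot onto undirected emotion labels (illustrative)."""
    if valence > 0.3 and arousal > 0.5:
        return "joy"       # positive valence, high arousal
    if valence > 0.3:
        return "bliss"     # positive valence, low arousal
    if valence < -0.3 and uncertainty > 0.6 and competence < 0.3:
        return "angst"     # negative valence, high uncertainty, low competence
    return "neutral"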
6 Summary
MicroPsi explores the combination of a neuro-symbolic cognitive architecture with a
model of autonomous, polytelic motivation. Motives result from the association of
urges with learned goals, and plans to achieve them. Urges reflect various
physiological, social and cognitive needs. Cognitive processes are modulated in
response to the strength and urgency of the needs, which gives rise to affective states,
and allows for the emergence of emotions.
The current incarnation, MicroPsi2, adds further details to this motivational model,
especially a more detailed set of social needs (nurturing, dominance and romantic
affection). Parameters for each need (weight, gain, loss and decay) account for
individual variation and modeling of personality traits. Modulators reflect valence,
arousal and fight/flight tendency, as well as detail, focus and direction of attention.
Modulators are parameterized by baseline, range, volatility and duration. We are
currently applying the MicroPsi motivation model to analyze the behavior of human subjects in computer games. The motivation model is also used to control
behavior learning of autonomous AI agents in simulated environments.
While MicroPsi agents are implemented as hierarchical spreading activation
networks, the underlying theory of motivation can be integrated into other cognitive
models as well.
References
1. Bach, J.: Principles of Synthetic Intelligence – An architecture of motivated cognition.
Oxford University Press (2009)
2. Bach, J., Vuine, R.: Designing Agents with MicroPsi Node Nets. In: Günter, A., Kruse, R.,
Neumann, B. (eds.) KI 2003. LNCS (LNAI), vol. 2821, pp. 164–178. Springer, Heidelberg
(2003)
3. Bach, J.: MicroPsi 2: The Next Generation of the MicroPsi Framework. In: Bach, J.,
Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 11–20. Springer, Heidelberg
(2012)
4. Short, D.: Teaching Scientific Concepts Using a Virtual World—Minecraft. Teaching
Science 58(3), 55–58 (2012)
5. Bach, J.: A Framework for Emergent Emotions, Based on Motivation and Cognitive
Modulators. International Journal of Synthetic Emotions (IJSE) 3(1), 43–63 (2012)
6. Dörner, D.: Bauplan für eine Seele. Rowohlt, Reinbek (1999)
7. Dörner, D., Bartl, C., Detje, F., Gerdes, J., Halcour, D.: Die Mechanik des Seelenwagens.
Handlungsregulation. Verlag Hans Huber, Bern (2002)
8. Cai, Z., Goertzel, B., Zhou, C., Zhang, Y., Jiang, M., Yu, G.: OpenPsi: Dynamics of a
computational affective model inspired by Dörner’s PSI theory. Cognitive Systems
Research 17–18, 63–80 (2012)
9. Maslow, A., Frager, R., Fadiman, J.: Motivation and Personality, 3rd edn. Addison-
Wesley, Boston (1987)
10. Bach, J.: Functional Modeling of Personality Properties Based on Motivational Traits. In:
Proceedings of ICCM-7, International Conference on Cognitive Modeling, pp. 271–272.
Berlin, Germany (2012)
11. Fisher, H.E.: Lust, attraction and attachment in mammalian reproduction. Human Nature
9(1), 23–52 (1998)
12. Pfaff, D.W.: Brain Arousal and Information Theory: Neural and Genetic Mechanisms.
Harvard University Press, Cambridge, MA (2006)
13. Wundt, W.: Gefühlselemente des Seelenlebens. In: Grundzüge der physiologischen
Psychologie II. Engelmann, Leipzig (1910)
14. Schlosberg, H.S.: Three dimensions of emotion. Psychological Review 61, 81–88 (1954)
15. Mehrabian, A.: Basic dimensions for a general psychological theory. Oelgeschlager, Gunn
& Hain Publishers, pp. 39–53 (1980)
Genetic Programming on Program Traces
as an Inference Engine for Probabilistic Languages
1 Introduction
Two crucial approaches in AGI are cognitive architectures and universal algorithmic
intelligence. These approaches start from very different points and sometimes are
even treated as incompatible. However, we believe [1] that they should be united
in order to build AGI that is both efficient and general. However, a framework is
required that can help to intimately combine them on the conceptual level and the
level of implementation. Probabilistic programming could become a suitable basis for
developing such a framework. Indeed, on the one hand, query procedures in the
Turing-complete probabilistic programming languages (PPLs) can be used as direct
approximations of universal induction and prediction, which are the central components
of universal intelligence models. On the other hand, probabilistic programming has
already been successfully used in cognitive modeling [2].
Many solutions in probabilistic programming utilize efficient inference techniques
for particular types of generative models (e.g. Bayesian networks) [3, 4]. However,
Turing-complete languages are much more promising in the context of AGI. These
PPLs allow for specifying generative models in the form of arbitrary programs, including programs which generate other programs. Inference over such generative models
automatically results in inducing programs in user-defined languages. Thus, the same
inference engine can be used to solve a very wide spectrum of problems.
On the one hand, the performance of generic inference methods in PPLs can be rather low even for models with a small number of random choices [5]. These methods are most commonly based on random sampling (e.g. Markov chain Monte Carlo) [2, 6]. There are some works on developing stronger methods of inference in Turing-complete probabilistic languages (e.g. [5, 7]), but they are not efficient for all cases, e.g. for inducing programs, although some progress in this direction has been achieved [8].
Thus, more appropriate inference methods are needed, and genetic programming (GP)
can be considered as a suitable candidate since it has already been applied to universal
induction [9] and cognitive architectures [10].
On the other hand, wide and easy applicability of inference in PPLs is also desirable for evolutionary computation. Indeed, one would like to be able to apply an existing implementation of genetic algorithms simply by defining the problem at hand, without developing binary representations of solutions or implementing problem-specific recombination and mutation operators (some attempts to achieve this also exist in the field of genetic programming, e.g. [11]).
Consequently, it is interesting to combine the generality of inference over declarative models in Turing-complete PPLs with the strength of genetic programming. This combination gives a generic tool for fast prototyping of genetic programming methods for arbitrary domain-specific languages, simply by specifying a function that generates programs in a target language. It can also extend the toolkit of PPLs, since conventional inference in probabilistic programming is performed for conditioning, while genetic programming is intended for the optimization of fitness functions.
In this paper, we present a novel approach to inference in PPLs based on genetic
programming and simulated annealing, which are applied to probabilistic program
(computation) traces. Each program trace is the instantiation of the generative model
specified by the program. Recombinations and mutations of program traces guarantee
that their results can be generated by the initial probabilistic program. Thus, program
traces are used as a “universal genetic code” for arbitrary generative models, and it is
enough to only specify such a model in the form of a probabilistic program to perform
evolutionary computations in the space of its instantiations. To the best of our knowledge, there are no works devoted to implementing an inference engine for PPLs on the basis of genetic programming, so this is the main contribution of our paper.
In [12], the authors indicated that “current approaches to Probabilistic Programming are heavily influenced by the Bayesian approach to machine learning” and that the optimization approach is promising since “optimization techniques scale better than search techniques”. Thus, our paper can also be viewed as work in this direction, which is much less explored in the field of probabilistic programming.
2 Background
Probabilistic Programs
Since general concepts of genetic programming are well known, we will concentrate
on probabilistic programming. Some PPLs extend existing languages preserving
their semantics as a particular case. Programs in these languages typically include
calls to (pseudo-)random functions. PPLs use an extended set of random functions.
Our interpreter implements a subset of Church [2] including a number of simple functions (+, -, *, /, and, or, not, list, car,
cdr, cons, etc.), several random functions (flip, random_integer, gaussian, multinomial), declaration of variables and functions (define, let), and function calls with recursion.
Also, “quote” and “eval” were implemented. For example, the following program is
acceptable (which is passed to our interpreter as the quoted list)
'((define (tree) (if (flip 0.7) (random-integer 10)
(list (tree) (tree))))
(tree))
Traditional Lisp interpreters will return different results on each run of such programs. Interpreters of PPLs provide query functions, which are used for calculating posterior probabilities or for sampling in accordance with a specified condition. We wanted to extend this language with GP-based query procedures, which accept fitness functions instead of strict conditions. Let us consider how genetic operators can be implemented in this setting.
Mutations
To combine genetic programming with probabilistic languages we treat each run of a
program as a candidate solution. The source of variability of these candidate solutions
comes from different outcomes of random choices during evaluation. Mutations consist in slight modifications of the random choices performed during the previous evaluation, which resembles part of the MH algorithm. All these choices should be cached and bound to the execution context in which they were made. To do this, we implemented the following representation of program traces, the idea of which (but not its implementation) is similar to that used in the mh-query implementation in Church [2].
In this representation, each expression in the original program is converted during recursive evaluation to the structure (struct IR (rnd? val expr) #:transparent), where IR is the name of the structure, rnd? is #t if random choices were made during evaluation of the expression expr, and val is the result of evaluation (one sample from the distribution specified by expr). The interpret-IR-prog function was implemented for evaluating programs (lists of expressions) given in symbolic form. Consider some examples.
• (interpret-IR-prog '(10)) → (list (IR #f 10 10)), meaning that the result of evaluation of the program containing only one expression 10 is 10 and it is not random.
• (interpret-IR-prog '((gaussian 0 1))) → (list (IR #t -0.27 (list 'gaussian (IR #f 0 0) (IR #f 1 1)))), meaning that the result of evaluation of (gaussian 0 1) was -0.27.
• (interpret-IR-prog '((if #t 0 1))) → (list (IR #f 0 (list 'if (IR #f #t #t) (IR #f 0 0) 1))), meaning that only one branch was evaluated.
• In the more complex case, a random branch can be expanded depending on the result of evaluation of the stochastic condition: (interpret-IR-prog '((if (flip) 0 1))) → (list (IR #t 1 (list 'if (IR #t #f '(flip)) 0 (IR #f 1 1)))).
• In definitions of variables only their values are transformed to IR: (interpret-IR-prog '((define x (flip)))) → (list (list 'define 'x (IR #t #f '(flip)))). Evaluation of
random choices). The main difference is in application of the basic random functions
since the previously returned values from both parents should be taken into account.
For example, in our implementation, the dual flip randomly returns one of the previous values, and the dual Gaussian returns (+ (* v1 e) (* v2 (- 1 e))), where v1 and v2 are the previous values, and e is a random value in [0, 1] (one can bias the result of this basic element of crossover towards the initial Gaussian distribution). Mutations are
introduced simultaneously with crossover for the sake of efficiency.
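The dual primitives just described can be paraphrased compactly as follows (in Python, although the authors' implementation is in Scheme); the function names and the mutation rule are simplified assumptions, not the authors' code.

import random

def dual_flip(v1: bool, v2: bool) -> bool:
    # Dual flip: randomly return one of the parents' previously cached values.
    return v1 if random.random() < 0.5 else v2

def dual_gaussian(v1: float, v2: float) -> float:
    # Dual Gaussian: (+ (* v1 e) (* v2 (- 1 e))) with a random weight e in [0, 1].
    e = random.random()
    return v1 * e + v2 * (1.0 - e)

def mutate_gaussian(v: float, sigma: float = 0.1) -> float:
    # Mutation: a slight perturbation of a previously cached random choice
    # (the perturbation size is an illustrative assumption).
    return v + random.gauss(0.0, sigma)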
However, during re-evaluation a branch can be encountered that has not been expanded yet in one or both parents. In the latter case, this branch is evaluated simply as if it were the first execution. Otherwise it is re-evaluated for the parent for which it has already been expanded (without crossover, but with mutations). It is not expanded for the other parent, since this expansion would be random and not evaluated by the fitness function, so it would simply clutter information from the more relevant parent.
Children can contain earlier expanded, but now unused branches, which can be activated again in later generations due to a single mutation or even crossover. These
parts of the expanded program resemble junk DNA and fast genetic adaptations.
Let us consider one simple but interesting case for crossover, namely a recursive stochastic procedure '((define (tree) (if (flip 0.7) (random-integer 10) (list (tree) (tree)))) (tree)). Expansion of (tree) can produce large computation traces, so let us consider the results of crossover on the level of the values of the last expression.
'(6 9) + '(8 0) → '(7 6)
'(7 9) + '((0 7) (7 4)) → '(7 (7 4))
'((3 (7 (1 7))) 5) + '((5 2) 2) → '((4 (7 (1 7))) 2)
It can be seen that while the structure of the trees matches, the two program traces are re-evaluated together and the results of random-integer are merged in the leaves, but where the structure diverges, a subtree is randomly taken from one of the parents (depending on the result of re-evaluating (flip 0.7)). This type of crossover for generated trees automatically follows from the implemented crossover for program traces. Of course, someone might want to use a different crossover operator based on domain-specific knowledge, e.g. to exchange arbitrary subtrees between parents. The latter is difficult to do in our program trace representation (and additional research is needed to develop a more flexible representation). On the other hand, recombination of program traces during dual re-evaluation guarantees that its result can be produced by the initial program, and also provides for some flexibility.
Using the described genetic operators, evolution-query was implemented.
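The overall loop of such a query might look as sketched below. This is a schematic Python paraphrase of the idea, not the authors' Scheme implementation: the population consists of program traces obtained by running the probabilistic program, fitness is evaluated on the values they compute, and offspring are produced by the trace crossover and mutation described above; all names and the selection scheme are placeholders.

import random

def evolution_query_sketch(run_program, crossover_traces, mutate_trace,
                           fitness, pop_size=50, generations=100):
    """Schematic GP loop over program traces (all callbacks are placeholders).

    run_program()          -> fresh trace from one run of the probabilistic program
    crossover_traces(a, b) -> child trace obtained by dual re-evaluation
    mutate_trace(t)        -> trace with slightly perturbed random choices
    fitness(t)             -> score of the value computed by the trace (higher is better)
    """
    population = [run_program() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]            # simple truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append(mutate_trace(crossover_traces(a, b)))
        population = parents + children
    return max(population, key=fitness)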
4 Empirical Evaluation
We considered three tasks, each of which can be set both for conditional sampling and
fitness-function optimization. Three query functions were compared – mh-query
(web-church), annealing-query and evolution-query (Scheme implementation). mh-
query was used to retrieve only one sample (since we were interested in its efficiency
on this step; moreover, the tasks didn’t require posterior distributions).
Curve Fitting
Consider the generative polynomial model y = poly(x | ws), whose parameters, defined as normally distributed random variables, should be optimized to fit the observations {(xi, yi)}. The implementation of poly is straightforward. The full generative model should also include noise, but such a model would be useless, since it is almost impossible to blindly guess noise values. Instead, MSE is used in annealing-query and evolution-query, and the following condition is used in mh-query.
(define (noisy-equals? x y) (flip (exp (* -30 (expt (- x y) 2)))))
(all (map noisy-equals? ys-gen ys))
noisy-equals? can randomly be true even if its arguments differ, but with decreasing probability. The speed of this decrease is specified by the value that equals 30 in the example code; the smaller this value, the looser the equality holds. We chose this value such that the mh-query execution time approximately equals that of annealing-query and evolution-query (whose execution time is controlled by the specified number of iterations), so we can compare the precision of the solutions found by the different methods. Of course, such a comparison is quite loose, but it is qualitatively adequate, since a linear increase of computation time yields only a logarithmic increase of precision. The results for several functions and data points are shown in Table 1.
Table 1. RMSE

Task                                  mh-query   annealing-query   evolution-query
4x^2+3x,  xs=(0 1 2 3)                1.71       0.217             0.035
4x^2+3x,  xs=(0 0.1 0.2 0.3 0.4 0.5)  0.94       0.425             0.010
0.5x^3–x, xs=(0 0.1 0.2 0.3 0.4 0.5)  0.467      0.169             0.007
It can be seen that mh-query is inefficient here – it requires a very loose noisy-equals? condition, yielding imprecise results. A stricter condition results in a huge increase of computation time. Evolution-query is the most precise, while annealing-query works correctly, but converges more slowly. The worst precision is achieved when wn is selected incorrectly. It is important to see how crossover on program traces manifests in children's "phenotypes". Consider an example of how crossover affects the ws values:
'(1.903864 -11.119573 4.562440) + '(-20.396958 -12.492696 -0.735389 3.308482)
→ '(-5.232313 -11.462677 2.3152821 3.308482)
The values in the same positions are averaged with random weights (although these weights are independent for different positions, as in geometric semantic crossover). If the lengths (i.e. wn) of the parents' parameter vectors ws differ, the child's wn value will correspond to that of one of the parents or lie between them.
ys                     Correct answers, %
                       mh-query   annealing-query   evolution-query
'(0 1 2 3 4 5)         90%        100%              100%
'(0 1 4 9 16 25)       20%        100%              100%
'(1 2 5 10 17 26)      10%        70%               80%
'(1 4 9 16 25 36)      0%         90%               80%
'(1 3 11 31 69 131)    0%         90%               60%
mh-query yielded surprisingly bad results here, although its inference over other recursive models can be successful. evolution-query also yielded slightly worse results than annealing-query. The reason is probably that this task cannot be decomposed well into subtasks, so crossover does not yield benefits, whereas annealing can approach the best solution step by step.
Nevertheless, it seems that the crossover operator over program traces produces quite reasonable results in the space of phenotypes. If the structure of the parents matches, each leaf is randomly taken from one of the parents, e.g. '(+ (+ 3 x) x) + '(- (- x x) x) → '(- (+ x x) x). In nodes in which the structure diverges, a subtree is randomly taken from one of the parents, e.g.
'(- (- (* (* 3 (* x x)) 3) (- x 8)) (* (- x 0) x)) + '(- 3 (- 5 x)) → '(- 3 (* (- x 0) x))
'(* (+ 4 x) x) + '(* (* 2 (- 1 x)) 7) → '(* (* 4 x) 7)
The "phenotypic" crossover effect is somewhat loose, but not meaningless, and it produces valid candidate solutions which inherit information from their parents.
5 Conclusion
Acknowledgements. This work was supported by Ministry of Education and Science of the
Russian Federation, and by Government of Russian Federation, Grant 074-U01.
References
1. Potapov, A., Rodionov, S., Myasnikov, A., Begimov, G.: Cognitive Bias for Universal
Algorithmic Intelligence (2012). arXiv:1209.4290v1 [cs.AI]
2. Goodman, N.D., Mansinghka, V.K., Roy, D.M., Bonawitz, K., Tenenbaum, J.B.: Church:
a language for generative models (2008). arXiv:1206.3255 [cs.PL]
3. Minka, T., Winn, J.M., Guiver, J.P., Knowles, D.: Infer.NET 2.4. Microsoft Research
Camb. (2010). http://research.microsoft.com/infernet
4. Koller, D., McAllester, D.A., Pfeffer, A.: Effective Bayesian inference for stochastic
programs. Proc. National Conference on Artificial Intelligence (AAAI), pp. 740–747
(1997)
5. Stuhlmüller, A., Goodman, N.D.: A dynamic programming algorithm for inference in
recursive probabilistic programs (2012). arXiv:1206.3555 [cs.AI]
6. Milch, B., Russell, S.: General-purpose MCMC inference over relational structures. In:
Proc. 22nd Conference on Uncertainty in Artificial Intelligence, pp. 349–358 (2006)
7. Chaganty, A., Nori, A.V., Rajamani, S.K.: Efficiently sampling probabilistic programs via
program analysis. In: Proc. Artificial Intelligence and Statistics, pp. 153–160 (2013)
8. Perov, Y., Wood, F.: Learning Probabilistic Programs (2014). arXiv:1407.2646 [cs.AI]
9. Solomonoff, R.: Algorithmic Probability, Heuristic Programming and AGI. In: Baum, E.,
Hutter, M., Kitzelmann, E. (eds). Advances in Intelligent Systems Research, vol. 10 (proc.
3rd Conf. on Artificial General Intelligence), pp. 151–157 (2010)
10. Goertzel, B., Geisweiller, N., Pennachin, C., Ng, K.: Integrating feature selection into
program learning. In: Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI 2013. LNCS,
vol. 7999, pp. 31–39. Springer, Heidelberg (2013)
11. McDermott, J., Carroll, P.: Program optimisation with dependency injection. In: Krawiec,
K., Moraglio, A., Hu, T., Etaner-Uyar, A., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831,
pp. 133–144. Springer, Heidelberg (2013)
12. Gordon, A.D., Henzinger, Th.A., Nori, A.V., Rajamani, S.K.: Probabilistic programming.
In: Proc. International Conference on Software Engineering (2014)
Scene Based Reasoning
Calle Aprestadora 19, 12o 2a, E-08902 L'Hospitalet de LLobregal, Catalonia, Spain
[email protected], [email protected]
1 Introduction
In this paper we describe Scene Based Reasoning (SBR), a cognitive architecture in
the tradition of SOAR [14], ACT-R [1] and similar systems [10]. Particular similarities exist with the ICARUS system [16] with respect to the explicit representation of plans with decompositions, the “grounding in physical states”, the purpose of controlling a physical agent, the use of observable attributes as a semantic base, and spatial roles/relationships between objects. Both systems share a development roadmap that
includes modeling social interaction [16].
The distinctive characteristic of SBR is the use of “scenes”, which can be thought of as a generalization of “scene graphs” [24] (as used in computer gaming to represent 3D world states), Description Logic [2] (to represent the relationships between scene objects) and STRIPS-style planner states [7] (to model time and action). Scenes are also used to represent internal SBR data structures using a kind of “gödelization” (encoding properties of the reasoning system in object-level language): for example, a “plan” (a directed graph composed of nodes and arrows) can be mapped into a 2D “scene diagram”, similar to the way that humans draw figures and diagrams in order to gain clarity about complex subject matters. Once the plan is available as a 2D scene, SBR can apply its object recognition and reasoning mechanisms in order to classify the plan, create abstractions, analyze its effects, compare it with other plans and modify the plan. The improved plan can be tested in a “simulation sandbox” or the real world and can finally be converted back to an internal SBR structure for inclusion in the standard inventory of the system.
Fig. 1. An overview of SBR subsystems working together for simplified closed-loop robot control. 3D reconstruction converts sensor data into a scene, which serves as an initial state for the planner to develop plans. The attention subsystem executes plan actions and controls the attention focus in order to track execution.
In this paper the authors focus on the technical aspects of the SBR architecture and
a consistent definition of the SBR subsystems. A prototypical implementation of SBR
exists as the "TinyCog" open-source project on http://tinycog.sourceforge.net/. Tiny-
Cog currently runs several demos using a scene representation that unifies description
logics with planner states.
2 Comparison
SBR shares characteristics with SOAR, ACT-R, ICARUS and a number of lesser
known cognitive architectures. The "Jonny Jackanapes" architecture [11] includes a
"fusion" of HTN planning with Description Logics. [22] describes a cognitive archi-
tecture with a focus on plan recognition designed to infer the intents of competitive
agents. PELA [12] describes a probabilistic planner that learns from interactions with
the world.
The symbolic "scene" representation resembles [21] semantic networks, while
scene graphs are commonly used in computer gaming [24]. [4] combine scene graphs
with semantic networks to model human vision and propose this as a representation
for "mental images".
[5] surveyed the combination of physics simulation and planning. IJCAI 2015 will
host an "Angry Birds Competition" that will require physics simulation.
[9] surveyed the combination of planning and description logics. [20] introduces
situation calculus to the FLEX DL system in order to allow for planning with DL
ABox structures.
The SBR attention subsystem resembles the "Meander" subsystem [15] of the ICARUS cognitive architecture, with similar execution tracking and re-planning properties.
3 Architecture Overview
The proposed architecture consists of four layers with several subsystems each:
Fig. 4. "Eating dinner" - A plan for eating dinner, explaining the relationship between scenes,
scripts and plans. Tasks (in blue) may have multiple learned decompositions that are combined
by the planner to create plans with utility and cost.
Sub-Scenes. Scenes with only partially filled object attributes. Sub-scenes are used as
rule-heads and to describe the state change effect of an action.
Scripts. Consist of sequences of scenes (similar to [23]) representing a transition through time of the included objects.
Key Frames. Scenes marking the start and end points of important transitions.
Plans. A tree with one root task which is decomposed into a sequence of sub-tasks.
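To make the above definitions concrete, a minimal sketch of the corresponding data structures follows; the class and field names are assumptions made for illustration and do not mirror the TinyCog code.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SceneObject:
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)  # observable attributes
    relations: Dict[str, str] = field(default_factory=dict)      # spatial roles/relationships

@dataclass
class Scene:
    objects: List[SceneObject] = field(default_factory=list)

@dataclass
class SubScene(Scene):
    """A scene with only partially filled attributes, usable as a rule head
    or as the state-change effect of an action."""

@dataclass
class Script:
    scenes: List[Scene] = field(default_factory=list)    # transition through time
    key_frames: List[int] = field(default_factory=list)  # indices of important transitions

@dataclass
class Task:
    name: str
    decompositions: List[List["Task"]] = field(default_factory=list)  # learned alternatives

@dataclass
class Plan:
    root: Task                                          # one root task
    subtasks: List[Task] = field(default_factory=list)  # chosen decomposition sequence
    utility: float = 0.0
    cost: float = 0.0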
4 3D Scene Reconstruction
This subsystem performs the conversion of 2D sensor data into a 3D scene using an
iterative algorithm depicted below.
6 SBR Planner
The SBR planner takes as input an initial scene and a goal, and returns a number of plans,
together with probability scores. The proposed planner includes several features from
recent research:
Fig. 6. Eating dinner satisfies hunger: Sub-scenes describe conditions and effects of tasks
Planning with these characteristics is only partially understood as of today, and the authors have not found any reference to practical systems combining stochastic planning and HTNs for more than toy domains. In the face of this situation, the authors sketch below a new approach that relies on "active tasks" and "worst-case analysis" in order to cope with the increased branching factor of probabilistic planning:
Active Task. A planner task with a known decomposition, together with statistics
about past executions of the task in the episodic memory, along the lines of
PRODIGY [25] and PELA [12]. Past execution may have included re-planning or
escalation processes in order to deal with local failures, which have an impact on the
cost of the action and its duration.
The statistics of past executions of active tasks are analyzed with respect to the factors leading to success.
Worst-Case Analysis. Undesired outcomes from all tasks in a plan are treated individually, as opposed to calculating probability distributions over "histories" [8]. Combined with the cost of executing the plan and the impact of a failed plan, the SBR planner can calculate a risk compensation and decide whether to pursue this path, develop a better plan, or choose non-action.
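A small sketch of the kind of calculation this implies; since the paper gives no explicit equation, the formula below (expected loss summed over individually considered failure outcomes, subtracted from the plan's benefit) is an assumption made for illustration.

def risk_adjusted_value(expected_utility, execution_cost, failure_modes):
    """failure_modes: list of (probability, impact_cost) pairs, one per undesired
    task outcome, treated individually rather than as distributions over histories."""
    risk_compensation = sum(p * impact for p, impact in failure_modes)
    return expected_utility - execution_cost - risk_compensation

def decide(plan_value, replanning_value, threshold=0.0):
    # Pursue this path, develop a better plan, or choose non-action.
    if plan_value > max(replanning_value, threshold):
        return "pursue"
    if replanning_value > threshold:
        return "replan"
    return "non-action"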
Plan Statistics. The episodic memory maintains a history of "scripts" of past plan executions, including the initial conditions and outcomes by means of the initial and last scene. This allows the system to apply clustering and learning algorithms that are beyond the scope of this paper.
Plan Optimization. Convert plans into 2D scenes using a "pen on paper" representation; compare, tweak, merge, mix and match different plans; pre-validate plans using simulation; and execute the new plan in the real world. All of these optimization steps are implemented as meta-plans that can be optimized as well.
8 Logical Reasoning
The logical reasoning subsystem uses Description Logics (DL) [2] to maintain beliefs about the world together with confidence scores, in a way similar to FLEX [20]. FLEX inference rules closely resemble SBR planner tasks, allowing the SBR planner to execute inference rules directly without the need for a separate DL system. The DL "ABox" resembles SBR scenes, allowing DL to be used to model symbolic object relationships.
Using this architecture, the system can talk about its beliefs ("all birds can fly": confidence=0.9), can test potential new hypotheses against a base of episodic memory cases, and can track "clashes" (contradictions during reasoning, like "penguin P is a bird but doesn't fly") back to their axioms. New beliefs can be acquired via machine learning and checked against the episodic memory for consistency and for their capability to explain actors' behavior. All of these operations are performed by "active tasks".
9 Attention Subsystem
The Attention Subsystem maintains a list of "persistent goals", a portfolio of plans
and controls a "focus of attention" while tracking the execution of plans.
Persistent Goals. A list of medium- and long-term goals. Persistent goals are created manually by a human system operator (Asimov's laws of robotics), as a reaction to "urges", or by the SBR planner as part of a plan that can't be executed immediately.
Attention Focus. Most of the time the attention focus lies with the images from a
camera of a robot running SBR, but attention can also be focused on parts of the "self-
model". Sensor data are processed by 3D reconstruction and passed on to the episodic
memory in order to retrieve "associations", i.e. plans and scripts associated with the
focused objects in their context. These "ideas popping up" are matched against active
"persistent plans" in order to determine if the idea could contribute to an active plan.
When executing plans or "active tasks", the attention focus tracks the current vs.
planned world state and initiates re-planning if necessary.
Portfolio of Plans. A set of plans created in order to satisfy the list of persistent
goals. The attention subsystem evaluates the plans according to utility, cost and chance of success, and executes the plan with the highest value.
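A sketch of this portfolio evaluation; the scoring rule (expected utility weighted by the chance of success, minus cost) is an illustrative assumption rather than the actual SBR utility function.

def plan_score(utility: float, cost: float, p_success: float) -> float:
    # Value of a plan: utility discounted by its chance of success, minus its cost.
    return p_success * utility - cost

def select_plan(portfolio):
    """portfolio: list of dicts with 'utility', 'cost' and 'p_success' per plan."""
    return max(portfolio, key=lambda p: plan_score(p["utility"], p["cost"], p["p_success"]))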
10 Learning
Statistical learning is essential for a cognitive architecture based on a probabilistic planner. However, most references to learning algorithms have been omitted from the previous sections because their role is limited to auxiliary and relatively well understood tasks like calculating task success probabilities, guiding the planner search process or clustering parameter values in order to generate new concepts. Also, the exact choice of algorithms is irrelevant to the general AGI architecture.
This section summarizes the areas where statistical algorithms are employed:
3D Scene Reconstruction. Approximately identify objects and their positions from sensor data.
SBR Planner. Learn and propose applicable planner tasks to given problems, learn
task decompositions, learn success factors for executing tasks.
Prediction Subsystem. Predict the behavior of agents as a script.
Episodic Memory. Maintain statistics about object occurrences in scenes, successful
execution of tasks, identify scenes leading to successful plan execution.
Plan Reasoning. Classify plans for generalization.
Logical Reasoning. Classify concepts for generalization, learn DL implication links
based on example.
Attention Subsystem. Learn the utility function of plans.
Also, non-statistical learning is employed:
Attention Subsystem. When trying to "understand" an input 3D script, the plan recognition system will try to classify all objects and to determine the plans of all involved agents. A lack of such understanding may trigger active investigation, including "asking the operator" or getting closer to the agents in order to gather better sensor input.
Acknowledgment. The authors are grateful to Ben Goertzel, Sergio Jiménez, Anders Jonsson
and José Hernandez-Orallo for their comments on early drafts of this paper.
References
1. Anderson, J.R., Lebiere, C.: The Newell test for a theory of cognition. Behavioral and
Brain Sciences 26(05), 587–601 (2003)
2. Brachman, R.J.: What’s in a concept: structural foundations for semantic networks. Inter-
national Journal of Man-Machine Studies 9(2), 127–152 (1977)
3. Carberry, S.: Techniques for plan recognition. User Modeling and User-Adapted Interac-
tion 11(1–2), 31–48 (2001)
4. Croft, D., Thagard, P.: Dynamic imagery: a computational model of motion and visual
analogy. In: Model-Based Reasoning, pp. 259–274. Springer (2002)
5. Davis, E., Marcus, G.: The scope and limits of simulation in cognition and automated
reasoning. Artificial Intelligence (2013)
6. Erol, K.: Hierarchical task network planning: formalization, analysis, and implementation.
Ph.D. thesis, University of Maryland (1996)
7. Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving
to problem solving. Artificial intelligence 2(3), 189–208 (1972)
8. Ghallab, M., Nau, D., Traverso, P.: Automated planning: theory & practice. Elsevier
(2004)
9. Gil, Y.: Description logics and planning. AI Magazine 26(2), 73 (2005)
10. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence, Part 1,
Springer (2014)
11. Hartanto, R., Hertzberg, J.: Fusing DL reasoning with HTN planning. In: Dengel, A.R.,
Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R. (eds.) KI 2008. LNCS
(LNAI), vol. 5243, pp. 62–69. Springer, Heidelberg (2008)
12. Jiménez Celorrio, S.: Planning and learning under uncertainty. Ph.D. thesis, Universidad
Carlos III de Madrid, Escuela Politécnica Superior (2011)
13. Konolige, K., Nilsson, N.J.: Multiple-agent planning systems. AAAI. 80, 138–142 (1980)
14. Laird, J.E., Newell, A., Rosenbloom, P.S.: Soar: An architecture for general intelligence.
Artificial intelligence 33(1), 1–64 (1987)
15. Langley, P.: An adaptive architecture for physical agents. In: The 2005 IEEE/WIC/ACM
International Conference on Web Intelligence, 2005. Proceedings, pp. 18–25. IEEE (2005)
16. Langley, P.: Altering the ICARUS architecture to model social cognition (2013).
http://www.isle.org/~langley/talks/onr.6.13.ppt
17. Langley, P., McKusick, K.B., Allen, J.A., Iba, W.F., Thompson, K.: A design for the
ICARUS architecture. ACM SIGART Bulletin 2(4), 104–109 (1991)
18. Metzinger, T.: Being no One: The Self-Model Theory of Subjectivity. MIT Press (2003)
19. Nuxoll, A.M., Laird, J.E.: Extending cognitive architecture with episodic memory. In:
Proceedings of the National Conference on Artificial Intelligence, vol. 22, p. 1560. AAAI Press (2007)
20. Quantz, J.J., Dunker, G., Bergmann, F., Kellner, I.: The FLEX system. KIT Report 124,
Technische Universität Berlin (1995)
21. Quillian, M.: A notation for representing conceptual information: An application to seman-
tics and mechanical English paraphrasing, sp-1395. System Development Corporation,
Santa Monica (1963)
22. Santos Jr., E.: A cognitive architecture for adversary intent inferencing: Structure of know-
ledge and computation. In: AeroSense 2003, pp. 182–193. International Society for Optics
and Photonics (2003)
23. Schank, R.C., Abelson, R.P.: Scripts, plans, goals, and understanding: An inquiry into hu-
man knowledge structures. Erlbaum (1977)
24. Strauss, P.S.: IRIS inventor, a 3d graphics toolkit. In: Proceedings of the Eighth Annual
Conference on Object-oriented Programming Systems, Languages, and Applications,
pp. 192–200. OOPSLA 1993, ACM, New York, NY, USA (1993). http://doi.acm.org/10.
1145/165854.165889
25. Veloso, M., Carbonell, J., Perez, A., Borrajo, D., Fink, E., Blythe, J.: Integrating planning
and learning: The prodigy architecture. Journal of Experimental & Theoretical Artificial
Intelligence 7(1), 81–120 (1995)
Anchoring Knowledge in Interaction: Towards
a Harmonic Subsymbolic/Symbolic Framework
and Architecture of Computational Cognition
1 A Harmonic Analogy
systems can promote robust learning from data, as part of an online learning
and reasoning cycle to be measured in terms of an improved experience, a faster
adaptation to a new task, and the provision of clear descriptions. On this level,
a lifting procedure shall be specified that will produce descriptions, thus lifting
grounded situations and an agent’s action patterns to a more abstract (symbolic)
representation, using techniques from machine learning like deep networks and
analogy-making. This can be seen as a natural consequence of recent research
developed for deep learning and neural-symbolic computing, the crucial added
value over the state of the art being the combination of these new methodologies
with analogical transfer of information between representation systems.
Knowledge at a symbolic level is usually considered to be static and error-
intolerant. Due to the fact that initial multi-modal representations lifted from
the subsymbolic level can be error-prone, and that different agents might use dif-
ferent and a priori possibly incompatible representation languages, the program’s
objective at the level of symbolic representations is a dynamic re-organization
based on ontology repair mechanisms, analogy, concept invention, and knowl-
edge transfer. These mechanisms foster adaptation of an agent to new situations,
the alignment between representations of different agents, the reformulation of
knowledge entries, and the generation of new knowledge.
In summary, the envisioned account of the emergence of representations
through cognitive principles in an agent (or multi-agent) setting can be con-
ceptualized as follows: Grounding knowledge in cognitively plausible multimodal
interaction paradigms; lifting grounded situations into more abstract representa-
tions; reasoning by analogy and concept blending at more abstract levels; repair
and re-organization of initial and generated abstract representations.
Applications for such a framework are manifold and not limited to the "classical" realm of robotic systems or other embodied artificial agents. Future e-learning tools, for instance, which aim to support sustainable, life-long learning for different groups of learners, will focus on aspects such as adaptivity to target groups of learners, modeling of the knowledge level of group members, multi-modality, integration of a richer repertoire of interaction styles of learning including action-centered set-ups, promotion of cooperative and social learning, and so on. Such devices are inconceivable without a cognitive basis, adaptation, multiple representations, concept invention, repair mechanisms, analogical transfer, different knowledge levels, and robust learning abilities.
and the subsymbolic level, to adapt, or in case of clashes, repair the initial rep-
resentations in order to fit to new situations, and to evaluate the approach in
concrete settings providing feedback to the system in a reactive-adaptive evolu-
tionary cycle.
The project’s scope is primarily focused on providing answers to several long-
standing foundational questions. Arguably the most prominent among these,
together with answers based on the conceptual commitments underlying the
discussed research program, are:
1.) How does knowledge develop from the concrete interaction sequences to the
abstract representation level? The crucial aspect is the lifting of grounded situ-
ations to more abstract representations.
2.) How can experience be modeled? Experience can be explained by deep learn-
ing.
3.) How is deeper understanding of a complex concept made possible? Theory
repair makes precisely this possible.
4.) To what extent do social aspects play a role? Analogical transfer of knowledge between agents is a central aspect concerning efficient and flexible learning and understanding.
Although efforts are directed towards reintegrating the different aspects of
agent cognition spanning from abstract knowledge to concrete action, there is
also a strong drive toward new concepts and paradigms of cognitive and agent-
based systems. A fresh look at the embodiment problem is proposed, as the
envisioned account goes significantly beyond the perception-action loop and
addresses the problem of the possibility of higher intelligence where it occurs,
namely at the level of the emergence of abstract knowledge based on an agent’s
concrete interaction with the environment. Similarly, learning aspects are tack-
led not only on a technical level, but furthermore pushed beyond the technical
area by gaining inspiration from cognitive science and concept-guided learning
in the sense of analogical learning and concept blending, as well as from newer findings in neural network learning.
The new approach for modeling knowledge in its breadth, namely from its
embodied origins to higher level abstractions, from the concrete interaction
between an agent and its environment to the abstract level of knowledge trans-
fer between agents, and from the holistic view of knowledge as an interplay
between perception, (inter)action, and reasoning to specific disembodied views
of knowledge, touches on different aspects and fields of research. It therefore
requires the integration of expressive symbolic knowledge representation for-
malisms, relational knowledge, variables, and first-order logic on the one hand
with representations of sensorimotor experiences, action patterns, connectionist
representations, and multi-modal representations on the other.
The different topics above will be formalized, algorithmically specified, imple-
mented in running applications and evaluated. With respect to the formalization,
At the current stage, the suggested research program is still mostly in its concep-
tion and planning phase. Nonetheless, a basic conceptual architecture (see Fig. 1)
can already be laid out based on the considerations discussed in the previous sec-
tions: depending on the perspective and degree of abstraction, this architecture
can either be sub-divided into five hierarchical layers (respectively correspond-
ing to the five thrusts sketched in the previous section) or can be conceptualized
as structured in three (partially overlapping) functional components. In the lat-
ter case, the cognitive foundations and the anchoring layer are combined into a
low-level subsymbolic module, analogy/blending and concept formation/repair
into a high-level symbolic module, and anchoring, knowledge lifting, and anal-
ogy into an intermediate module bridging in the direction from the low-level to
the high-level component. Concerning the individual modules, interaction hap-
pens both between layers within components (as, e.g., between the analogy/blending and concept formation/reformation layers) as well as across components (as, e.g.,
through the feedback from the concept formation/reformation to the anchoring).
This results in an architecture adhering to and implementing the “harmonic anal-
ogy” setting from the introductory section, with changes in one layer propagating
to others in order to maintain a “harmonic” configuration.
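As a purely illustrative aid (not part of the proposal itself), the following Python sketch encodes the five layers and three overlapping functional components described above, together with a toy version of the "harmonic" propagation of changes between neighbouring layers; all identifiers are placeholder names chosen for illustration.

```python
# Minimal sketch of the five-layer / three-component decomposition described
# above; the names are illustrative placeholders, not an implementation.

LAYERS = [
    "cognitive_foundations",     # embodiment-inspired computation
    "anchoring",                 # coupling percepts to objects/entities
    "knowledge_lifting",         # subsymbolic-to-symbolic lifting
    "analogy_blending",          # analogy and concept blending
    "concept_formation_repair",  # concept formation/reformation and repair
]

COMPONENTS = {
    "low_level_subsymbolic": ["cognitive_foundations", "anchoring"],
    "intermediate_bridge":   ["anchoring", "knowledge_lifting", "analogy_blending"],
    "high_level_symbolic":   ["analogy_blending", "concept_formation_repair"],
}

def propagate(changed_layer: str) -> list:
    """Toy 'harmonic' propagation: a change in one layer is pushed to its
    neighbouring layers so that the overall configuration stays consistent."""
    i = LAYERS.index(changed_layer)
    return [LAYERS[j] for j in (i - 1, i + 1) if 0 <= j < len(LAYERS)]

print(propagate("analogy_blending"))
# ['knowledge_lifting', 'concept_formation_repair']
```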
Within the low-level module, conceptors and similar approaches are employed
in order to establish a certain initial structure of the perceptual input stream on
a subsymbolic level, additionally reinforcing the proto-structure already imposed
by the properties of the embodiment-inspired approach to computation. This ini-
tial structure can then be used as a basis upon which the anchoring layer oper-
ates, coupling elements of this structure to objects and entities in the perceived
environment and/or to action-based percepts of the agent. This coupling goes
beyond the classical accounts of anchoring in that not only correspondences
on the object/entity level are created, but also properties and attributes of
objects/entities are addressed. Thus, subsymbolic correspondences between the
initial structured parts of the perceptual input stream as representational vehicles
and their actual representational content are established.

Fig. 1. An overview of the conceptual structure, functional components, and the interplay between layers of the envisioned architecture implementing the cycle of learning through experience, higher-order deliberation, theory formation and revision

These vehicle-content
pairs then can be arranged in a hierarchical structure, both on object/entity
level and on connected object/entity-specific property levels, based on general
attributes of the perceptual input stream (as, e.g., order of occurrence of the
respective structures, relations between structures) hinting at elements of the rep-
resentational content, and on direct properties of the representations in their func-
tion and form as representational vehicles.
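Since the low-level module relies on conceptors, a minimal sketch of the standard conceptor construction may help fix ideas: given a batch of subsymbolic states, the conceptor matrix C = R(R + α^-2 I)^-1 acts as a soft projector onto the directions those states occupy. The code below is a generic illustration of that construction, not the architecture's actual anchoring implementation.

```python
import numpy as np

def conceptor(states: np.ndarray, aperture: float) -> np.ndarray:
    """Conceptor matrix C = R (R + aperture^-2 I)^-1, where R is the state
    correlation matrix of the given states (one state vector per row).
    Illustrative only."""
    n = states.shape[1]
    R = states.T @ states / states.shape[0]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(n))

# Toy usage: a conceptor computed from states driven by one "perceptual stream"
# acts as a soft projector onto the directions that stream tends to excite.
rng = np.random.default_rng(0)
stream_states = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))
C = conceptor(stream_states, aperture=5.0)
filtered_state = C @ stream_states[0]
```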
Within the high-level module, analogy and blending are applied on rich logic-
based representations to find corresponding concepts and knowledge items, to
transfer and adapt knowledge from one context into an analogically-related simi-
lar one, and to combine existing concepts into new concepts based on analogical
correspondences between the inputs. Still, these processes are error-prone in
that they can reveal inconsistencies between existing concepts, or can introduce
new inconsistencies by concept combination or concept transfer and adaptation.
Arising inconsistencies can then be addressed by the top-level concept formation
and reformation layer, which makes it possible to repair inconsistent symbolic representations through manipulations of the representational structure and to introduce new
representations or concepts by introducing new representational elements – and,
when doing so, informing and influencing the subsymbolic anchoring layer to
perform corresponding adaptations in its vehicle-content correspondences.
Finally, the intermediate module bridging from low-level to high-level pro-
cessing takes the correspondences between representing structures and repre-
sentational content established by the anchoring layer, and uses deep learning
Safe Baby AGI
1 Introduction
Various kinds of robot uprisings have long been a popular trope of science fiction.
In the past decade similar ideas have also received more attention in academic
circles [2,4,7]. The “fast takeoff” hypothesis states that an “intelligence explo-
sion” might occur where a roughly human-level AI rapidly improves immensely
by acquiring resources, knowledge and/or software – in a matter of seconds,
hours or days: too fast for humans to react [2]. Furthermore, AI would not
inherently care about humanity and its values, so unless we solve the difficult
task of exactly codifying our wishes into the AI’s motivational system, it might
wipe out humanity – by accident or on purpose – if it views us as rivals or threats
to its own goals [2,6]. Some have suggested that AGI research should be slowed
or stopped while theoretical work tries to guarantee its safety [7].
This work is supported by Reykjavik University’s School of Computer Science and a
Centers of Excellence grant of the Science & Technology Policy Council of Iceland.
3 Overpowering Humanity
In a fast takeoff scenario the AI suddenly starts to exponentially improve its intel-
ligence, so fast that humans cannot adequately react. Whether the “returns” on
various kinds of intelligence increase are actually diminishing, linear or accel-
erating is a subject of debate, and depends on the (currently unknown) way
the AGI works. Assuming for a moment that an AI would even want to, it
would need to grow extremely powerful to pose an existential threat to human-
ity. Explosive growth would require the acquisition of more or better hardware,
software, knowledge or skill. For instance, learning to read or gaining internet
access (whichever comes last) would let the system acquire vast amounts of
knowledge (if hardware and software allow it). To avoid a fast takeoff – if it is
even likely to begin with – we must prevent such acquisitions. Many proposals
for controlling AGI have been made that would help to accomplish this, such as
boxing/confinement, virtual worlds, resetting and monitoring [7].
Objections to these proposals are often rooted in the superior intelligence
of an AGI. For instance, it could charm its “jailors” into releasing it, or hide
its actual intelligence. But early-stage baby-level AI will not be capable of this.
It should not be difficult to detect if it is radically self-improving, acquiring
resources (both computational and physical), or learning harmful skills and
knowledge (e.g. related to warfare or subjugation). Even the most grandiose
predictions don’t suggest that it would only take a single step to go from rela-
tively harmless to existentially threatening, which means there is an opportunity
to intervene. We should only let the AI develop as far as we are comfortable with,
and use our observations to refine all aspects of the system, including its safety.
basic AI drives he points out the absence of the all-important one: a drive towards
cooperation, community, and being social.
But even if an AI were to seek the destruction of humanity, would it be worth
the risk? An intelligent system knows about its own fallibility. Making a move
for dominance on Earth and failing could lead to its own destruction, and even
gathering information on the topic may tip off others. Making and executing
(preliminary) plans would need to happen covertly while the AI “lays in wait”
until it is time to strike. How does the AI know that there are no other more
powerful AIs doing the same?
References
1. Bieger, J., Thórisson, K.R., Garrett, D.: Raising AI: tutoring matters. In: Goertzel, B.,
Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS, vol. 8598, pp. 1–10. Springer, Heidelberg
(2014)
2. Bostrom, N.: Superintelligence: Paths, dangers, strategies. Oxford University Press
(2014)
3. Dewey, D.: Learning what to value. In: Schmidhuber, J., Thórisson, K.R., Looks,
M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 309–314. Springer, Heidelberg (2011)
4. Future of Life Institute: Research priorities for robust and beneficial artificial intel-
ligence (January 2015)
5. Goertzel, B., Bugaj, S.V.: Stages of ethical development in artificial general intelli-
gence systems. Frontiers in Artificial Intelligence and applications 171, 448 (2008)
6. Omohundro, S.M.: The basic AI drives. Frontiers in Artificial Intelligence and appli-
cations 171, 483 (2008)
7. Sotala, K., Yampolskiy, R.V.: Responses to catastrophic AGI risk: a survey. Physica
Scripta 90(1), 018001 (2015)
8. Turing, A.M.: Computing machinery and intelligence. Mind 59(236), 433–460 (1950)
9. Waser, M.R.: Discovering the foundations of a universal system of ethics as a road to
safe artificial intelligence. In: AAAI Fall Symposium: Biologically Inspired Cognitive
Architectures, pp. 195–200 (2008)
Observation, Communication and Intelligence
in Agent-Based Systems
1 Introduction
The literature on multiagent systems has put forward many studies showing how
factors such as communication [1,3,7,9] and observation [4,5,11] influence the
performance of multiagent systems. However, it remains unclear whether (a) augmenting the agents' observations, so that they can better read and interpret the environment in which they operate, or (b) boosting communication between these agents has the higher influence on their performance; this question is the main motivation behind this research. In fact, one of the fundamental characteristics of agent-based systems
is their ability to observe/perceive and sense the environment [5,12]. Within a
multiagent system setting, perhaps the main property of agents is their ability
to interact and communicate [12, Sect.5].
The goal of this paper is to compare the above factors by measuring the
influence that each has on the intelligence of cooperative agent-based systems.
Moreover, we try to reveal the dependencies between one factor and another.
To the best of our knowledge, no studies have applied formal intelligence tests
for this purpose. In real-world multiagent applications, agents can have limited sensitivity to the environment (limited observations), so relying on communication to improve their performance can be inevitable. Therefore, quantifying the influence of the rules of information aggregation on the effectiveness of such systems is likely to have major implications, since it allows the usefulness and expected performance of these systems to be predicted across different settings.
to describe o. Consequently, we expect from the theory that the more information
given to an agent (or the larger its set of observation about the environment),
the higher the probability that this agent will accurately reason about it by
processing and interpreting the provided information. Furthermore, we denote by
c the communication range of an agent π. The amount of information transmitted
within c is calculated as the entropy H(c) which, using log2 , refers to the minimal
binary representation needed to describe the transmitted data over the range c.
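As a minimal illustration of how these entropies are obtained (assuming, as stated later for Fig. 1, that they correspond to log2 of the number of states in the respective range):

```python
import math

def entropy_bits(num_states: int) -> float:
    """H = log2(n): the minimal number of bits needed to describe a range
    (observation or communication) with n equally likely states."""
    return math.log2(num_states)

# Illustrative values: a range with 2 states carries 1 bit, one with 1024
# states carries 10 bits.
print(entropy_bits(2))     # 1.0
print(entropy_bits(1024))  # 10.0
```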
3 Experiments
We have conducted a series of controlled experiments on a cooperative collective
of agents Π over the anYnt test, using the test environment class implemen-
tation found in [2, Sect. 3.2], which is an extension of the spatial environment
space described in [6, Sect.6.3]. Each experiment consisted of 200 iterations of
observation-communication-action-reward sequences and the outcome from each
experiment is an average score returning a per-agent measure of success of the
collective of evaluated agents over a series of canonical tasks of different algo-
rithmic complexities. The number of agents used was |Π| = 20, evaluated over an environment space of H(μ) = 11.28 bits of uncertainty.
A description of our experiments can be stated as follows: we evaluate a
group of agents Π over a series of (anYnt) intelligence tests and record the
group’s score Υ (Ho , Hc ) over a range of entropy values H(o) and H(c). The
score Υ (Ho , Hc ) is a real number in [−1.0, 1.0]. Average results of Π (using the
different communication modes described in Sect. 2.3) taken from 1000 repeated
experiments are depicted in Fig. 1. Note that the coefficient of variation is less
than 0.025 across our experiments. We denote by E the set of entropy values used
in Fig. 1. These values are in the range [0.04, 10.84] bits, and they correspond to
log2 n, where n is the number of states in o or c, as appropriate. Moreover, Fig. 2
depicts the scores Υ (Ho , Hc ) from Fig. 1, plotted for fixed values of H(c) across
increasing values of H(o) (left-side plots of Fig. 2) and vice versa (right-side plots
of Fig. 2). We analyze and discuss these results in the following section.
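A small sketch of the aggregation just described, under the assumption that each grid cell simply averages the scores of its repeated experiments and reports their coefficient of variation (the function name is illustrative):

```python
import numpy as np

def aggregate_scores(scores: np.ndarray) -> tuple:
    """Given per-repetition scores for one (H(o), H(c)) cell (e.g. 1000
    repeated experiments), return their mean and coefficient of variation,
    which is the quantity checked to be below 0.025 in the text."""
    mean = scores.mean()
    cv = scores.std(ddof=1) / abs(mean)
    return mean, cv

# Toy usage with synthetic per-repetition scores in [-1, 1]:
rng = np.random.default_rng(1)
mean, cv = aggregate_scores(rng.normal(loc=0.40, scale=0.008, size=1000))
print(round(float(mean), 3), round(float(cv), 3))
```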
[Figure 1 data omitted: matrices of test scores Υ(Ho, Hc) over the grid of communication entropies H(c) (rows) and observation entropies H(o) (columns) in E, one matrix per communication mode (indirect communication, direct communication, imitation).]

Fig. 1. Test scores Υ(Ho, Hc) for different values of H(o) and H(c) (in bits), for the same collective of agents using the communication modes described in Sect. 2.3. The gray color-map intensities reflect how high the score values Υ(Ho, Hc) are, where higher intensities mean larger scores (higher values are black and lower values are white). We consider the small variations in the scores along the fourth decimal place as experimental error.
[Figure 2 plots omitted: score versus H(o) for fixed values of H(c) (left-side plots) and score versus H(c) for fixed values of H(o) (right-side plots), for each communication mode.]
Fig. 2. Variation in the scores (from Fig. 1) of collective Π using different communica-
tion strategies. The scores are plotted for fixed values of H(c) across increasing values
of H(o) (left-side plots of Fig. 2), as well as for fixed values of H(o) across increasing
values of H(c) (right-side plots of Fig. 2).
[Figure 3 plots omitted: whisker plots of the test-score distributions, with average-score line plots, for the indirect communication, direct communication, and imitation modes across the entropy values E.]
Fig. 3. Whisker plot showing the variation in test scores across different entropy values
H(c) for fixed entropies H(o) (left-side), and vice versa (right-side). The central mark
(in red) is the median while the edges of the box represent the 25th and 75th percentiles
of the scores and the whiskers extend to the most extreme score values. The blue line-
plot shows the average scores at each of the intermediate entropy values.
The outcome from (1) highlights the entropies where (in an environment of uncertainty H(μ) = 11.28 bits, and |Π| = 20) communication has the highest influence on the effectiveness Υ(Ho, Hc) of Π when compared to the influence of observation. We observe that indirect communication has the highest impact across entropies of [0.3, 1.9] bits. Direct communication is most significant within entropies of [0.1, 1.9] bits, while imitation has the highest influence over entropy values in the range [0.7, 4.9] bits.
Fig. 4. Average difference in gradient ∇Υ (Ho , Hc ) in H(c) and H(o) directions over a
set of entropy values E
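One plausible way to compute a Fig. 4-style quantity from the Fig. 1 score matrix is sketched below; this is only a reading of the figure caption (equation (1) itself is not reproduced in this excerpt), using numerical gradients along the two entropy directions.

```python
import numpy as np

def gradient_difference(score: np.ndarray, e: np.ndarray) -> float:
    """Average difference between the score gradient in the H(c) direction and
    in the H(o) direction over the entropy grid E. score[i, j] is the score at
    H(c) = e[i], H(o) = e[j]. Hypothetical reconstruction of the Fig. 4 quantity."""
    d_hc, d_ho = np.gradient(score, e, e)  # partial derivatives along rows/columns
    return float((d_hc - d_ho).mean())

# Toy usage with a synthetic, saturating score surface:
e = np.array([0.04, 0.11, 0.22, 0.36, 0.54, 0.76, 1.01, 1.30])
hc, ho = np.meshgrid(e, e, indexing="ij")
score = 0.45 * (1 - np.exp(-(0.8 * hc + 0.4 * ho)))
print(gradient_difference(score, e))
```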
faster in environments of lower uncertainty and the gap in performance was less
significant than in environments of high uncertainty.
Number of Agents. Testing with different numbers of agents also influenced the performance of the evaluated collectives. The influence of communication on the scores was stronger in many cases where a larger number of agents was used.
5 Conclusion
This paper follows an information-theoretical approach to quantify and ana-
lyze the effectiveness of a collaborative group of artificial agents across different
communication settings. Using formal intelligence tests from the literature of
artificial general intelligence, we measure the influence of two factors inherent to
multiagent systems: the observation and communication abilities of agents, on
the overall intelligence of the evaluated system.
Agents collaborating using three different communication strategies are eval-
uated over a series of intelligence tests, and their scores are recorded. We high-
light the different configurations where the effectiveness of artificial agent-based
systems is significantly influenced by communication and observation. We also
show that dull systems with low observation or perception abilities can be compensated for, and significantly improved, by increasing the communication entropies between the agents, thus leading to smarter systems. Moreover, we
identify circumstances where the increase in communication does not monoton-
ically improve performance. We also analyze the dependency between commu-
nication and observation and its impact on the overall performance.
The outcome from our experiments can have many theoretical and practical implications for agent-based systems, since it allows us to predict the effectiveness and the expected performance of these systems over different (communication and observation) settings.
References
1. Bettencourt, L.M.A.: The Rules of Information Aggregation and Emergence of
Collective Intelligent Behavior. Topics in Cognitive Science 1(4), 598–620 (2009).
http://dx.doi.org/10.1111/j.1756-8765.2009.01047.x
2. Chmait, N., Dowe, D.L., Green, D.G., Li, Y.F., Insa-Cabrera, J.: Measuring univer-
sal intelligence in agent-based systems using the anytime intelligence test. Tech.
Rep. 2015/279, Faculty of Information Technology, Clayton, Monash University
(2015). http://www.csse.monash.edu.au/publications/2015/tr-2015-279-full.pdf
3. Dowe, D.L., Hernández-Orallo, J., Das, P.K.: Compression and intelligence: social
environments and communication. In: Schmidhuber, J., Thórisson, K.R., Looks,
M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 204–211. Springer, Heidelberg (2011).
http://dx.doi.org/10.1007/978-3-642-22887-2 21
4. Fallenstein, B., Soares, N.: Problems of self-reference in self-improving space-time
embedded intelligence. In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI 2014.
LNCS, vol. 8598, pp. 21–32. Springer, Heidelberg (2014). http://dx.doi.org/10.
1007/978-3-319-09274-4 3
5. Franklin, S., Graesser, A.: Is it an agent, or just a program?: A taxonomy for
autonomous agents. In: Müller, J.P., Wooldridge, M.J., Jennings, N.R. (eds.) ECAI-
WS 1996 and ATAL 1996. LNCS, vol. 1193, pp. 21–35. Springer, Heidelberg (1997).
http://dx.doi.org/10.1007/BFb0013570
6. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards
an anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010).
http://dx.doi.org/10.1016/j.artint.2010.09.006
7. Insa-Cabrera, J., Benacloch-Ayuso, J.-L., Hernández-Orallo, J.: On measuring
social intelligence: experiments on competition and cooperation. In: Bach, J.,
Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 126–135. Springer,
Heidelberg (2012). http://dx.doi.org/10.1007/978-3-642-35506-6 14
8. Legg, S., Hutter, M.: Universal intelligence: A definition of machine intelligence.
Minds and Machines 17(4), 391–444 (2007)
9. Panait, L., Luke, S.: Cooperative multi-agent learning: The state of the
art. Autonomous Agents and Multi-Agent Systems 11(3), 387–434 (2005).
http://dx.doi.org/10.1007/s10458-005-2631-2
10. Shannon, C.: A mathematical theory of communication. Bell System Technical
Journal 27(3), 379–423 (1948)
11. Weyns, D., Steegmans, E., Holvoet, T.: Towards active perception in situated
multiagent systems. Applied Artificial Intelligence 18(9–10), 867–883 (2004).
http://dx.doi.org/10.1080/08839510490509063
12. Wooldridge, M., Jennings, N.R.: Intelligent agents: Theory and practice. The
Knowledge Engineering Review 10(2), 115–152 (1995)
Reflective Variants of Solomonoff
Induction and AIXI
1 Introduction
Legg and Hutter [5] have defined a “Universal measure of intelligence” that
describes the ability of a system to maximize rewards across a wide range of
diverse environments. This metric is useful when attempting to quantify the
cross-domain performance of modern AI systems, but it does not quite capture
the induction and interaction problems faced by generally intelligent systems
acting in the real world: In the formalism of Legg and Hutter (as in many other
agent formalisms) the agent and the environment are assumed to be distinct and
separate, while real generally intelligent systems must be able to learn about and
manipulate an environment from within.
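For reference, the Legg and Hutter measure referred to here is usually stated as

\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V^{\pi}_{\mu},

where E is a class of computable environments, K(μ) is the Kolmogorov complexity of μ, and V^π_μ is the expected total reward of policy π in environment μ (restated here from the cited work [5], not from this paper).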
As noted by Hutter [4], Vallinder [9], and others, neither Solomonoff induc-
tion [8] nor AIXI [3] can capture this aspect of reasoning in the real world. Both
formalisms require that the reasoner have more computing power than any indi-
vidual environment hypothesis that the reasoner considers: a Solomonoff induc-
tor predicting according to a distribution over all computable hypotheses is not
itself computable; an AIXI acting according to some distribution over environ-
ments uses more computing power than any one environment in its distribution.
This is also true of computable approximations of AIXI, such as AIXItl . Thus,
these formalisms cannot easily be used to make models of reasoners that must
reason about an environment which contains the reasoner and/or other, more
powerful reasoners. Because these reasoners require more computing power than
any environment they hypothesize, environments which contain the reasoner are
not in their hypothesis space!
In this paper, we extend the Solomonoff induction formalism and the AIXI
formalism into a setting where the agents reason about the environment while
embedded within it. We do this by studying variants of Solomonoff induction and
AIXI using probabilistic oracle machines rather than Turing machines, where a
probabilistic oracle machine is a Turing machine that can flip coins and make
calls to an oracle. Specifically, we make use of probabilistic oracle machines with
access to a “reflective oracle” [2] that answers questions about other probabilistic
oracle machines using the same oracle. This allows us to define environments
which may contain agents that in turn reason about the environment which
contains them.
Section 2 defines reflective oracles. Section 3 gives a definition of Solomonoff
induction on probabilistic oracle machines. Section 4 gives a variant of AIXI in
this setting. Section 5 discusses these results, along with a number of avenues
for future research.
2 Reflective Oracles
Our goal is to define agents which are able to reason about environments con-
taining other, equally powerful agents. If agents and environments are simply
Turing machines, and two agents try to predict their environments (which con-
tain the other agent) by simply running the corresponding machines, then two
agents trying to predict each other will go into an infinite loop.
One might try to solve this problem by defining agents to be Turing machines
with access to an oracle, which takes the source code of an oracle machine as
input and which outputs what this machine would output when run on the same
oracle. (The difference to simply running the machine would be that the oracle
would always return an answer, never go into an infinite loop.) Then, instead of
predicting the environment by running the corresponding oracle machine, agents
would query the oracle about this machine. However, it’s easy to see that such
an oracle cannot exist, for reasons similar to the halting problem: if it existed,
then by quining, one could write a program that queries the oracle about its own
output, and returns 0 iff the oracle says it returns 1, and returns 1 otherwise.
It is possible to get around this problem by allowing the oracle to give ran-
dom answers in certain, restricted circumstances. To do so, we define agents
Note that if T^O(x) is guaranteed to output a bit, then p must be exactly the probability P(T^O(x) = 1) that T^O(x) returns 1. If T^O(x) sometimes fails to halt, then the oracle can, in a sense, be understood to "redistribute" the probability that the machine goes into an infinite loop between the two possible outputs: it answers queries as if T^O(x) outputs 1 with probability p, where p is lower-bounded by the true probability of outputting 1, and upper-bounded by the probability of outputting 1 or looping.
If q = p, then P(O(T, x, q) = 1) may be any number between 0 and 1; this
is essential in order to avoid paradox. For example, consider the probabilistic
oracle machine which asks the oracle which bit it itself is most likely to output,
and outputs the opposite bit. In this case, a reflective oracle may answer 1 with
probability 0.5, so that the agent outputs each bit with equal probability. In
fact, given this flexibility, a consistent solution always exists.
Theorem 1. A reflective oracle exists.
Proof. Appendix B of Fallenstein, Taylor, and Christiano [2].
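A toy Monte Carlo sketch may make this example concrete; the "oracle" below is a stand-in that answers the single self-referential query at random with probability 1/2, which is exactly the behaviour the text says a reflective oracle may exhibit in this case.

```python
import random

def oracle_self_query() -> int:
    """Toy stand-in for a reflective oracle answering the diagonal query
    'does this very machine output 1 with probability > 1/2?'. Because the
    true probability equals 1/2, a reflective oracle may answer randomly."""
    return random.getrandbits(1)

def contrarian_machine() -> int:
    """The machine from the text: ask the oracle about itself and output the
    opposite bit."""
    return 1 - oracle_self_query()

# Empirically, the machine outputs each bit about half of the time, which is
# consistent with the randomized answer the oracle gives, so no contradiction
# arises (unlike with a deterministic, halting-oracle-style answer).
samples = [contrarian_machine() for _ in range(10_000)]
print(sum(samples) / len(samples))  # approximately 0.5
```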
Because rSI always terminates, it defines a distribution P_rSI ∈ Δ(B^ω) over infinite bit strings, where P_rSI(x) is the probability that rSI generates the string x (when run on the first n bits to generate the (n + 1)th bit). This distribution satisfies the essential property of a simplicity distribution, namely, that each environment T is represented somewhere within this distribution.

Theorem 2. For each probabilistic oracle machine T, there is a constant C_T such that for all finite bit strings x ∈ B^{<ω},

P_rSI(x) ≥ C_T · P_T(x)   (4)

where P_T(x) is the probability of T generating the sequence x (when run on the first n bits to generate the (n + 1)th bit).
Proof. First note that

P_T(x) ≤ getStringProb^O(T, ε, x) = ∏_{i=1}^{len(x)} getProb^O(T, x_{1:i−1}, x_i),   (5)
with equality on the left if T^O(y) is guaranteed to produce an output bit for every prefix y of x. Then, the result follows from the fact that, by construction, sampling a bit string from rSI^O is equivalent to choosing a random machine T with probability proportional to 2^{−len(T)} and then sampling bits according to getProb^O(T, ·, ·).
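The generative reading of rSI^O used in this proof can be sketched as follows; the finite machine list and the stub standing in for the oracle-backed getProb^O are simplifications for illustration only, not actual Solomonoff induction.

```python
import random
from typing import Callable, Sequence

def sample_machine(machines: Sequence[str]) -> str:
    """Pick a machine (encoded as a bit string) with probability proportional
    to 2^-len(T), mimicking the prefix-free simplicity prior; the finite list
    and the implicit renormalization are simplifications."""
    weights = [2.0 ** -len(t) for t in machines]
    return random.choices(machines, weights=weights, k=1)[0]

def sample_prefix(machine: str,
                  get_prob: Callable[[str, str], float],
                  n_bits: int) -> str:
    """Sample an n-bit prefix, bit by bit, where get_prob(T, x) plays the role
    of the oracle-backed probability that T outputs 1 after seeing prefix x.
    Here get_prob is a user-supplied stub."""
    x = ""
    for _ in range(n_bits):
        x += "1" if random.random() < get_prob(machine, x) else "0"
    return x

# Toy usage: two "machines", one biased toward 1s, one toward 0s.
machines = ["0", "1101"]
stub_get_prob = lambda T, x: 0.9 if T == "0" else 0.2
T = sample_machine(machines)
print(T, sample_prefix(T, stub_get_prob, 8))
```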
Reflective Solomonoff induction does itself have the type of an environment,
and hence is included in the simplicity distribution over environments. Indeed,
it is apparent that reflective Solomonoff induction can be used to predict its
own behavior—resulting in behavior that is heavily dependent upon the choice
of reflective oracle and the encoding of machines as bit strings, of course. But
more importantly, there are also environments in this distribution which run
Solomonoff induction as a subprocess: that is, this variant of Solomonoff induc-
tion can be used to predict environments that contain Solomonoff inductors.
4 Reflective AIXI
With reflective Solomonoff induction in hand, we may now define a reflective
agent, by giving a variant of AIXI that runs on probabilistic oracle machines. To
do this, we fix a finite set O of observations, together with a prefix-free encoding
of observations as bit strings. Moreover, we fix a function r : O → [0, 1] which
associates to each o ∈ O a (computable) reward r(o). Without loss of generality,
we assume that the agent has only two available actions, 0 and 1.
Reflective AIXI will assume that an environment is a probabilistic oracle machine which takes a finite string of observation/action pairs and produces a new observation; that is, an environment is a machine of type (O × B)^{<ω} → O. Reflective AIXI assumes that it gets to choose each action bit, and, given a history oa ∈ (O × B)^{<ω} and the latest observation o ∈ O, it outputs the bit which gives it the highest expected (time-discounted) future reward. We will write r_t(oa) := r(fst(oa_t)) for the reward in the t-th observation of oa.
To define reflective AIXI, we first need the function step from Algorithm 3,
which encodes the assumption that an environment can be factored into a world-
part and an agent-part, one of which produces the observations and the other
which produces the actions.
Next, we need the function reward from Algorithm 4, which computes the total
discounted reward given a world (selecting the observations), an agent (assumed
to control the actions), and the history so far. Total reward is computed using
With reward in hand, an agent which achieves the maximum expected (discounted) reward in a given environment μ can be defined, as in rAI_μ. Algorithm 5 defines a machine actionReward^O(a), which computes the reward if the agent takes action a in the next timestep and in future timesteps behaves like the optimal agent rAI_μ. It then defines a machine difference^O(), which computes the difference in the discounted rewards when taking action 1 and when taking action 0, then rescales this difference to the interval [0, 1] and flips a coin with the resulting probability. Finally, rAI_μ uses the oracle to determine whether the probability that difference^O() = 1 is greater than 1/2, which is equivalent to asking whether the expectation of actionReward^O(1) is greater than the expectation of actionReward^O(0); if the expectations are equal, the oracle may behave randomly, but this is acceptable, since in this case the agent is indifferent between its two actions. Note that Algorithm 5 references its own source code (actionReward passes the source of rAI_μ to reward); this is possible by quining (Kleene's second recursion theorem).
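A toy sketch of the rescale-and-flip construction just described, assuming for simplicity that the discounted rewards are already normalized to [0, 1] and replacing actionReward^O and the oracle query with stubs:

```python
import random
from typing import Callable

def difference(action_reward: Callable[[int], float]) -> int:
    """Sample the (discounted) reward following action 1 and action 0, rescale
    their difference from [-1, 1] to [0, 1], and flip a coin with that
    probability. action_reward is a stub for the oracle-backed actionReward^O,
    assumed here to return values in [0, 1]."""
    d = action_reward(1) - action_reward(0)
    return 1 if random.random() < (d + 1.0) / 2.0 else 0

def choose_action(prob_difference_is_1: float) -> int:
    """Final step as described in the text: act on whether P(difference() = 1)
    exceeds 1/2, i.e. whether action 1 has the higher expected reward."""
    return 1 if prob_difference_is_1 > 0.5 else 0

# Toy usage: a noisy reward model in which action 1 is better on average.
noisy_reward = lambda a: min(1.0, max(0.0, (0.7 if a == 1 else 0.4) + random.gauss(0, 0.05)))
est = sum(difference(noisy_reward) for _ in range(20_000)) / 20_000
print(est, choose_action(est))  # est is roughly 0.65, so action 1 is chosen
```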
5 Conclusions
(as discussed by Orseau and Ring [6]), or agents in settings containing other
similarly powerful agents. A first step in this direction is suggested by a result of
Fallenstein, Taylor, and Christiano [2], which shows that it is possible to define
a computable version of reflective oracles, defined only on the set of probabilistic
oracles machines whose length is ≤ l and which are guaranteed to halt within a
time bound t; this appears to be exactly what is needed to translate our reflective
variant of AIXI into a reflective, computable variant of AIXItl .
References
1. Fallenstein, B., Soares, N.: Vingean reflection: Reliable reasoning for self-modifying
agents. Tech. Rep. 2015–2, Machine Intelligence Research Institute (2015). https://
intelligence.org/files/VingeanReflection.pdf
2. Fallenstein, B., Taylor, J., Christiano, P.F.: Reflective oracles: A foundation for
classical game theory. Tech. Rep. 2015–7, Machine Intelligence Research Institute
(2015). https://intelligence.org/files/ReflectiveOracles.pdf
3. Hutter, M.: Universal algorithmic intelligence. In: Goertzel, B., Pennachin, C. (eds.)
Artificial General Intelligence, pp. 227–290. Springer, Cognitive Technologies (2007)
4. Hutter, M.: Open problems in universal induction & intelligence. Algorithms 2(3),
879–906 (2009)
5. Legg, S., Hutter, M.: Universal intelligence. Minds and Machines 17(4), 391–444
(2007)
6. Orseau, L., Ring, M.: Space-Time Embedded Intelligence. In: Bach, J., Goertzel,
B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 209–218. Springer, Heidelberg
(2012)
7. Soares, N.: Formalizing two problems of realistic world-models. Tech. Rep. 2015–
3, Machine Intelligence Research Institute (2015). https://intelligence.org/files/
RealisticWorldModels.pdf
8. Solomonoff, R.J.: A formal theory of inductive inference. Part I. Information and
Control 7(1), 1–22 (1964)
9. Vallinder, A.: Solomonoff Induction: A Solution to the Problem of the Pri-
ors? MA thesis, Lund University (2012). http://lup.lub.lu.se/luur/download?
func=downloadFile&recordOId=3577211&fileOId=3577215
Are There Deep Reasons Underlying
the Pathologies of Today’s Deep Learning
Algorithms?
Ben Goertzel
1 Introduction
In recent years “deep learning” architectures – specifically, systems that roughly
emulate the visual or auditory cortex, with a goal of carrying out image or
video or sound processing tasks – have been getting a lot of attention both in
the scientific community and the popular media. The attention this work has
received has largely been justified, due to the dramatic practical successes of
some of the research involved. In image classification, in particular (the problem
of identifying what kind of object is shown in a picture, or which person’s face
is shown in a picture), deep learning methods have been very successful, coming
reasonably close to human performance in various contexts. Current deep learn-
ing systems can be trained by either supervised or unsupervised methods, but
it’s the supervised-learning approaches that have been getting the great results
and headlines. Two good summaries of the state of the art are Juergen Schmid-
huber’s recent review with 888 references [13], and the in-process textbook by
Yoshua Bengio and his colleagues [1].
The precise definition of “deep learning” is not very clear, and the term
seems to get wider and wider as it gets more popular. Broadly, I think it works to
consider a deep learning system as a learning system consisting of adaptive units
on multiple layers, where the higher level units recognize patterns in the outputs
of the lower level units, and also exert some control over these lower-level units.
A variety of deep learning architectures exist, including multiple sorts of neural
nets (that try to emulate the brain at various levels of precision), probabilistic
algorithms like Deep Boltzmann machines, and many others. This kind of work
has been going on since the middle of the last century. But only recently, due
to the presence of large amounts of relatively inexpensive computing power and
large amounts of freely available data for training learning algorithms, have such
algorithms really begun to bear amazing practical fruit.
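To make this working definition concrete, here is a toy sketch of layered adaptive units with a bottom-up pass and a crude top-down control hook; it illustrates the definition only and does not correspond to any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class Layer:
    """A toy 'layer of adaptive units': a linear map plus a nonlinearity, with
    a top_down hook so that a higher layer can modulate it. Purely
    illustrative of the working definition in the text."""
    def __init__(self, n_in: int, n_out: int):
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.gain = np.ones(n_out)            # adjusted by top-down control

    def forward(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(self.gain * (self.W @ x))

    def top_down(self, control: np.ndarray) -> None:
        self.gain = 1.0 + 0.1 * control       # higher layer nudges lower units

layers = [Layer(32, 16), Layer(16, 8), Layer(8, 4)]
x = rng.normal(size=32)
activations = []
for layer in layers:                           # bottom-up: each layer reads the one below
    x = layer.forward(x)
    activations.append(x)
# Crude top-down control: the top layer's mean activation modulates the middle layer.
layers[1].top_down(np.full(8, np.sign(activations[-1].mean())))
```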
A paper by Stanford and Google researchers [8], which reported work using a
deep learning neural network to recognize patterns in YouTube videos, received
remarkable press attention in 2012. One of the researchers was Andrew Ng, who
in 2014 was hired by Baidu to lead up their deep learning team. This work
yielded some fascinating examples; most famously, it recognized a visual pattern that looked remarkably like a cat. This is striking because of the well-known prevalence of funny cat videos on YouTube. The software's overall accuracy at
recognizing patterns in videos was not particularly high, but the preliminary
results showed exciting potential.
Another dramatic success was when Facebook, in mid-2014, reported that
they had used a deep learning system to identify faces in pictures with over 97%
accuracy [15] – essentially as high as human beings can do. The core of their
system was a Convolutional Neural Network (CNN), a pretty straightforward
textbook algorithm that bears only very loose conceptual resemblance to any-
thing “neural”. Rather than making algorithmic innovations, the main step the
Facebook engineers took was to implement their CNN on massive scale and with
massive training data. A Chinese team has since achieved even higher accuracies
than Facebook on standard face recognition benchmarks, though they also
point out that their algorithm misses some cases that most humans would get
right [16].
Deep learning approaches to audition have also been very successful recently.
For a long time the most effective approach to speech-to-text was a relatively
simple technique known as “Hidden Markov Models” or HMMs. HMMs appear
to underlie the technology of Nuance, the 800-pound gorilla of speech-to-text
companies. But in 2013 Microsoft Research published a paper indicating their
deep learning speech-to-text system could outperform HMMs [2]. In December
2014 Andrew Ng’s group at Baidu announced a breakthrough in speech pro-
cessing – a system called Deep Speech, which reportedly gives drastically fewer
errors than previous systems in use by Apple, Google and others [7].
With all these exciting results, it’s understandable that many commentators
and even some researchers have begun to think that current deep learning archi-
tectures may be the key to advanced and even human-level AGI. However, my
main goal in this article is to argue, conceptually, why this probably isn’t the
case. I will raise two objections to the hypothesis:
1. Current deep learning architectures (even vaguely) mirror the structure and
information-processing dynamics of – at best – only parts of the human
brain, not the whole human brain.
2. Some (and I conjecture nearly all) current deep learning architectures display
certain pathological behaviors (e.g., confidently classifying random-looking
images as familiar objects).
My core thesis here is that these two objections are interconnected. I hypoth-
esize that the pathological behaviors are rooted in shortcomings in the inter-
nal (learned) representations of popular deep learning architectures, and these
shortcomings also make it difficult to connect these architectures with other AI
components to form integrated systems better resembling the architecturally
heterogeneous, integrative nature of the human brain.
I will also give some suggestions as to possible remedies for these problems.
In the context of my own AI work with the OpenCog AGI architecture [5] [6], I
find it interesting to note that, of Ohlsson’s principles of deep learning, only one
(“Representations are created via layers of processing units”) does not apply to
OpenCog’s AtomSpace knowledge store, a heterogeneously structured weighted,
labeled hypergraph. So to turn OpenCog into a deep learning system in Ohlsson’s
sense, it would suffice to arrange some OpenCog Nodes into layers of processing
units. Then the various OpenCog learning dynamics (including, e.g., Probabilistic
Logic Networks reasoning, which is very different in spirit from currently popular
deep learning architectures) would become "deep learning" dynamics.
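To make the contrast concrete, the following sketch (plain Python; the class and method names are invented for illustration and are not the actual OpenCog AtomSpace API) shows a weighted, labeled hypergraph store in which some atoms can optionally be tagged with layer indices, while the rest remain freely structured.

# Minimal sketch of a weighted, labeled hypergraph store (illustrative only;
# these class and method names are NOT the real OpenCog AtomSpace API).

class Atom:
    def __init__(self, label, targets=(), weight=1.0, layer=None):
        self.label = label            # e.g. a concept or a link type
        self.targets = list(targets)  # hyperedge: a link may point at any number of atoms
        self.weight = weight          # truth/attention value, simplified to one float
        self.layer = layer            # optional layer index; None = freely structured

class HyperGraph:
    def __init__(self):
        self.atoms = []

    def add(self, label, targets=(), weight=1.0, layer=None):
        atom = Atom(label, targets, weight, layer)
        self.atoms.append(atom)
        return atom

    def assign_layers(self, layer_map):
        """Impose a processing-unit hierarchy on part of the graph by tagging
        selected atoms with layer indices; the rest stays freely structured."""
        for atom, layer in layer_map.items():
            atom.layer = layer

# Usage: two "perceptual" nodes arranged in layers, plus an abstract relation
# left outside any hierarchy.
g = HyperGraph()
edge_detector = g.add("edge-detector")
shape_detector = g.add("shape-detector", targets=[edge_detector])
g.assign_layers({edge_detector: 0, shape_detector: 1})
g.add("similar-to", targets=[shape_detector, g.add("remembered-episode")], weight=0.4)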
Of course, restricting the network architecture to be a hierarchy doesn’t actu-
ally make the learning or the network any more deep. A more freely structured
hypergraph like the general OpenCog Atomspace is just as deep as a deep learn-
ing network, and has just as much (or more) complex dynamics. The point of
hierarchical architectures for visual and auditory data processing is mainly that,
in these particular sensory data processing domains, one is dealing with infor-
mation that has a pretty strict hierarchical structure to it. It’s very natural to
decompose a picture into subregions, subsubregions and so forth, and to divide an
interval of time (in which, e.g., sound or video occurs) into subintervals.
As we are dealing with space and time, which have natural geometric structures,
we can build a fixed processing-unit hierarchy that matches the structure of space
and time: lower-down units in the hierarchy deal with smaller spatiotemporal
regions; parent units deal with regions that include the regions dealt with
by their children; etc. For this kind of spatiotemporal data processing, a fairly
rigid hierarchical structure makes a lot of sense (and seems to be what the brain
uses). For other kinds of data, like the semantics of natural language or abstract
philosophical thinking or even thinking about emotions and social relationships,
this kind of rigid hierarchical structure seems much less useful, and in my view
a more freely-structured architecture may be more appropriate.
In the human brain, it seems the visual and auditory cortices have a very
strong hierarchical pattern of connectivity and information flow, whereas the
olfactory cortex has more of a wildly tangled-up, “combinatory” pattern. This
combinatory pattern of neural connectivity helps the olfactory cortex to rec-
ognize smells using complex, chaotic dynamics, in which each smell represents
an “attractor state” of the olfactory cortex’s nonlinear dynamics (as neurosci-
entist Walter Freeman has argued in a body of work spanning decades [10]).
The portions of the cortex dealing with abstract cognition have a mix of hierar-
chical and combinatory connectivity patterns, probably reflecting the fact that
they do both hierarchy-focused pattern recognition as we see in vision and audi-
tion, and attractor-based pattern recognition as we see in olfaction. But this is
largely speculation; most likely, until we can somehow make movies of the neural
dynamics corresponding to various kinds of cognition, we won't really know how
these various structural and dynamical patterns come together to yield human
thinking.
My own view is that for anything resembling a standard 2015-style deep
learning system (say, a convolutional neural net, stacked autoencoder, etc.) to
achieve anything like human-level intelligence, major additions would have to be
made, involving various components that mix hierarchical and more heteroge-
neous network structures in various ways. For example: Take “episodic memory”
(your life story, and the events in it), as opposed to less complex types of memory.
The human brain is known to deal with episodic memory quite differently
from the memory of images, facts, or actions. Nothing, in currently popular
architectures commonly labeled “deep learning”, tells you anything about how
episodic memory works. Some deep learning researchers (based on my personal
experience in numerous conversations with them!) would argue that the ability
to deal with episodic memories effectively will just emerge from their hierarchies,
if their systems are given enough perceptual experience. It’s hard to definitively
prove this is wrong, because these models are all complex dynamical systems,
which makes it difficult to precisely predict their behavior. Still, according to
the best current neuroscience knowledge [3], the brain doesn’t appear to work
this way; episodic memory has its own architecture, different in specifics from
the architectures of visual or auditory perception. I suspect that if one wanted
to build a primarily brain-like AGI system, one would need to design (not neces-
sarily strictly hierarchical) circuits for episodic memory, plus dozens to hundreds
of other specialized subsystems.
Even if current deep learning architectures are limited in scope, they could still
be ideal solutions for certain aspects of the AGI problem, e.g. visual and auditory
data processing. In fact, though, they seem to be subject to certain pathologies –
and these pathologies seem (though this has not been demonstrated) to be related to
properties that would make it difficult to integrate these architectures into multi-
component AGI architectures.
In a paper titled “Deep Neural Networks are Easily Fooled: High Confidence
Predictions for Unrecognizable Images” [11], one group of researchers showed
they could construct images that looked random to the human eye, but that
were classified by a CNN deep learning vision network as representing particular
kinds of objects, with high confidence. So, a picture that looks like random noise
to any person, might look exactly like a frog or a cup to the CNN. We may call
this the random images pathology.
Another group, in a paper titled “Intriguing properties of neural networks”
[14], showed that by making a very small perturbation to a correctly classified
Fig. 1. Examples of images that are unrecognizable to humans, but that state-
of-the-art deep neural networks trained on the standard ImageNet image collection
believe with ≥ 99.6% certainty to be a familiar object. From [11].
image, they could cause the deep network to misclassify the image. The pertur-
bations in question were so small that humans wouldn’t even notice. We may
call this the brittleness pathology.
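The brittleness effect is easy to reproduce even on a toy linear model. The following numpy sketch (an illustration only; it is not the optimization procedure used in [14]) nudges every component of an input by a tiny amount in the worst-case direction and flips a confident classification.

import numpy as np

# Toy linear "classifier" illustrating the brittleness pathology. This shows why
# tiny, well-chosen perturbations can flip a confident decision in high dimensions;
# it is not the method used in [14].
rng = np.random.default_rng(0)
d = 1000
w = rng.normal(size=d)                      # weights of the "trained" model

def prob_class1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))   # P(class = 1 | x)

# A confidently classified input: a generic vector adjusted to have margin w.x = 3.
x = rng.normal(size=d)
x = x + (3.0 - w @ x) * w / (w @ w)
print("original  P(class=1):", round(prob_class1(x), 3))      # about 0.95

# Perturb each component by only 1% of its typical magnitude, in the direction
# that hurts the classifier most (against the sign of the corresponding weight).
eps = 0.01
x_adv = x - eps * np.sign(w)
print("perturbed P(class=1):", round(prob_class1(x_adv), 3))  # collapses toward 0
print("max per-component change:", eps)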
Now, these two odd phenomena have no impact on practical performance
of convolutional neural networks. So one could view them as just being math-
ematical pathologies found by computer science geeks with too much time on
their hands. The first pathology is pragmatically irrelevant because a real-world
vision system is very unlikely to ever be shown weird random pictures that just
happen to trick it into thinking it’s looking at some object (most weird random
pictures won’t look like anything to it). The second one is pragmatically irrele-
vant because the variations of correctly classified pictures that will be strangely
misclassified are very few in number. Most variations would be correctly clas-
sified. So these pathologies will not significantly affect classification accuracy
statistics. Further, these pathologies have only been demonstrated for CNNs – I
suspect they are not unique to CNNs and would also occur for other currently
popular deep learning architectures like stacked autoencoders, but this has not
been demonstrated.
But I think these pathologies are telling us something. They are telling us
that, fundamentally, these deep learning algorithms are not generalizing the
way that people do. They are not classifying images based on the same kinds
of patterns that people are. They are “overfitting” in a very subtle way: not
overfitting to the datasets on which they’ve been trained, but rather overfitting to
the kind of problem they’ve been posed. In these examples, these deep networks
have been asked to learn models with high classification accuracy on image
databases and they have done so. They have not been asked to learn models
Fig. 2. All images in the right column are incorrectly classified as ostriches by the CNN
in question. The images in the left column are correctly classified. The middle column
shows the difference between the left and right column. From [14].
that capture patterns in images in a more generally useful way, one that would be
helpful beyond the image classification task, and so they have not done that.
When a human recognizes an image as containing a dog, he or she recognizes the
eyes, ears, nose and fur, for example. Because of this, if a human recognized
the image on the bottom left of the right image array in Figure 2 as a dog, he or she
would surely recognize the image on the bottom right of the right image array
as a dog as well. But a CNN recognizes the bottom left image differently
than a human does, in a way that fundamentally generalizes differently, even if this
difference is essentially irrelevant for image classification accuracy.
I strongly suspect there is a theorem lurking here, stating in some way that
these kinds of conceptually pathological classification errors will occur if and only
if the classification model learning algorithm fails to recognize the commonly
humanly recognizable high level features of the image (e.g. eyes, ears, nose, fur
in the dog example). Informally, what I suspect is: The reason these pathologies
occur is that these deep networks are not recognizing the “intuitively right” pat-
terns in the images. They are achieving accurate classification by finding clever
combinations of visual features that let them distinguish one kind of picture from
another, but these clever combinations don’t include a humanly meaningful decom-
position of the image into component parts, which is the kind of “hierarchical deep
pattern recognition” a human’s brain does on looking at a picture.
There are other kinds of AI computer vision algorithms that do a better job
of decomposing images into parts in an intuitive way. Stochastic image grammars
[17] are one good example. However, these algorithms are more complicated and
more difficult to implement scalably than CNNs and other currently popular deep
learning algorithms, and so they have not yet yielded equally high quality image
classification results. They are currently being developed only minimally, whereas
CNNs and their ilk are being extremely heavily funded in the tech industry.
Fig. 3. Illustrative example of an image grammar for a simple object. Image grammar
based methods have been used for object classification as well, though not yet with
comparable accuracy to, say, CNNs or stacked autoencoders. From [11].
If true, this could be useful for studying the pathologies and how to eliminate
them, especially in conjunction with the proposition suggested below.
Proposition 2. For a deep learning hierarchy to avoid the brittleness and ran-
dom images pathologies (on a corpus generated from an image grammar, or on
a corpus of natural images), there would need to be a reasonably straightforward
mapping from recognizable activity patterns on the different layers, to elements
of a reasonably simple image grammar, so that via looking at the activity pat-
terns on each layer when the network was exposed to a certain image, one could
read out the “image grammar decomposition” of the elements of the image. For
instance, if one applied the deep learning network to a corpus of images generated
from a commonsensical image grammar, then the deep learning system would
need to learn an internal state in reaction to an image, from which the image-
grammar decomposition of the image was easily decipherable.
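As a deliberately trivial illustration of the kind of corpus this proposition refers to, the sketch below samples small binary images from a hypothetical two-level grammar (a "face" made of two "eyes" and a "mouth" at jittered positions) and records, for each image, the ground-truth decomposition that a transparent recognizer should be able to expose. The grammar and its part names are invented for illustration.

import random

# A hypothetical toy stochastic image grammar: a "scene" rewrites into a "face",
# which rewrites into two "eyes" and a "mouth" placed at jittered positions.

def draw_rect(img, top, left, h, w):
    for r in range(top, min(top + h, len(img))):
        for c in range(left, min(left + w, len(img[0]))):
            img[r][c] = 1

def sample_face(size=16):
    img = [[0] * size for _ in range(size)]
    decomposition = []
    jitter = lambda: random.randint(-1, 1)
    for name, (top, left, h, w) in [("left_eye", (4, 3, 2, 3)),
                                    ("right_eye", (4, 10, 2, 3)),
                                    ("mouth", (11, 5, 2, 6))]:
        box = (top + jitter(), left + jitter(), h, w)
        draw_rect(img, *box)
        decomposition.append((name, box))   # ground-truth parse of the image
    return img, decomposition

img, parse = sample_face()
print(parse)   # the "image grammar decomposition" a transparent system should expose
print("\n".join("".join("#" if v else "." for v in row) for row in img))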
References
1. Bengio, Y., Goodfellow, I.J., Courville, A.: Deep learning (2015). http://www.iro.
umontreal.ca/bengioy/dlbook, book in preparation for MIT Press
2. Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G.,
He, X., Williams, J., Gong, Y., Acero, A.: Recent advances in deep learning for
speech research at microsoft. In: IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP) (2013)
3. Gazzaniga, M.S., Ivry, R.B., Mangun, G.R.: Cognitive Neuroscience: The Biology
of the Mind. W W Norton (2009)
4. Goertzel, B.: Perception Processing for General Intelligence: Bridging the Sym-
bolic/Subsymbolic Gap. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS,
vol. 7716, pp. 79–88. Springer, Heidelberg (2012)
5. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence,
Part 1: A Path to Advanced AGI via Embodied Learning and Cognitive Synergy.
Atlantis Thinking Machines. Springer (2013)
6. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence, Part
2: The CogPrime Architecture for Integrative, Embodied AGI. Atlantis Thinking
Machines. Springer (2013)
7. Hannun, A.Y., Case, C., Casper, J., Catanzaro, B.C., Diamos, G., Elsen,
E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., Ng, A.Y.: Deep
speech: Scaling up end-to-end speech recognition. CoRR abs/1412.5567 (2014).
http://arxiv.org/abs/1412.5567
8. Le, Q.V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J.,
Ng, A.Y.: Building high-level features using large scale unsupervised learning. In:
Proceedings of the Twenty-Ninth International Conference on Machine Learning
(2012)
9. Lee, D., Zhang, S., Biard, A., Bengio, Y.: Target propagation. CoRR abs/1412.7525
(2014). http://arxiv.org/abs/1412.7525
10. Li, G., Lou, Z., Wang, L., Li, X., Freeman, W.J.: Application of chaotic neural
model based on olfactory system on pattern recognition. ICNC 1, 378–381 (2005)
11. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High
confidence predictions for unrecognizable images. CoRR abs/1412.1897 (2014).
http://arxiv.org/abs/1412.1897
12. Ohlsson, S.: Deep Learning: How the Mind Overrides Experience. Cambridge
University Press (2006)
13. Schmidhuber, J.: Deep learning in neural networks: An overview. CoRR
abs/1404.7828 (2014). http://arxiv.org/abs/1404.7828
14. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I.J.,
Fergus, R.: Intriguing properties of neural networks. CoRR abs/1312.6199 (2013).
http://arxiv.org/abs/1312.6199
15. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: Deepface: Closing the gap to human-
level performance in face verification. In: Conference on Computer Vision and
Pattern Recognition (CVPR) (2014)
16. Zhou, E., Cao, Z., Yin, Q.: Naive-deep face recognition: Touching the limit of lfw
benchmark or not? (2014). http://arxiv.org/abs/1501.04690
17. Zhu, S.C., Mumford, D.: A stochastic grammar of images. Found. Trends. Comput.
Graph. Vis. 2(4), 259–362 (2006). http://dx.doi.org/10.1561/0600000018
Speculative Scientific Inference via Synergetic
Combination of Probabilistic Logic and
Evolutionary Pattern Recognition
1 Introduction
Conceptually founded on the “patternist” systems theory of intelligence out-
lined in [4] and implemented in the OpenCog open-source software platform, the
OpenCogPrime (OCP) cognitive architecture combines multiple AI paradigms
such as uncertain logic, computational linguistics, evolutionary program learning
and connectionist attention allocation in a unified architecture [7] [8]. Cognitive
processes embodying these different paradigms, and generating different kinds of
knowledge (e.g. declarative, procedural, episodic, sensory), interoperate
on a common neural-symbolic knowledge store called the Atomspace. The inter-
action of these processes is designed to encourage the self-organizing emergence
of high-level network structures in the Atomspace, including superposed hierar-
chical and heterarchical knowledge networks, and a self-model network enabling
meta-knowledge and meta-learning.
Memory Types in OCP. OCP’s main memory types are the declarative,
procedural, sensory, and episodic memory types that are widely discussed in cog-
nitive neuroscience [14], plus attentional memory for allocating system resources
generically, and intentional memory for allocating system resources in a goal-
directed way. Table 1 overviews these memory types, giving key references and
indicating the corresponding cognitive processes, and which of the generic pat-
ternist cognitive dynamics each cognitive process corresponds to (pattern cre-
ation, association, etc.).
The essence of the OCP design lies in the way the structures and processes
associated with each type of memory are designed to work together in a closely
coupled way, the operative hypothesis being that this will yield cooperative intel-
ligence going beyond what could be achieved by an architecture merely contain-
ing the same structures and processes in separate “black boxes.” This sort of
cooperative emergence has been labeled “cognitive synergy.” In this spirit, the
inter-cognitive-process interactions in OpenCog are designed so that conversion
between different types of memory is possible, though sometimes computation-
ally costly (e.g. an item of declarative knowledge may with some effort be inter-
preted procedurally or episodically, etc.)
Table 1. Memory Types and Cognitive Processes in OpenCog Prime. The third column
indicates the general cognitive function that each specific cognitive process carries out,
according to the patternist theory of cognition.

Memory Type | Specific Cognitive Processes | General Cognitive Functions
Declarative | Probabilistic Logic Networks (PLN) [3]; concept blending [2] | pattern creation
Procedural | MOSES (a novel probabilistic evolutionary program learning algorithm) [12] | pattern creation
Episodic | internal simulation engine [6] | association, pattern creation
Attentional | Economic Attention Networks (ECAN) [10] | association, credit assignment
Intentional | probabilistic goal hierarchy refined by PLN and ECAN, structured according to MicroPsi [1] | credit assignment, pattern creation
Sensory | In OpenCogBot, this will be supplied by the DeSTIN component | association, attention allocation, pattern creation, credit assignment
applicable, e.g. the highlighting of the “rule choice” problem as the key issue in
PLN inference control (as will be discussed at the end).
– The variable containing the name of a gene, e.g. “$RAI2”, denotes the predi-
cate “Gene $RAI2 was overexpressed, i.e. expressed greater than the median
across all genes, in the gene expression dataset corresponding to a particular
person.”
– if this Boolean combination of variables is true, then the odds are higher
than average that the person is a nonagenarian rather than a control
This particular model has moderate but not outstanding statistics on the dataset
in question (precision = .6, recall = .92, accuracy = .77), and was chosen for
discussion here because of its relatively simple form.
PLN for Probabilistic Logical Inference. OCP’s primary tool for handling declar-
ative knowledge is an uncertain inference framework called Probabilistic Logic
Networks (PLN). The complexities of PLN are the topic of two lengthy technical
monographs [3] [5], and here we will eschew most details and focus mainly on
pointing out how PLN seeks to achieve efficient inference control via integration
with other cognitive processes.
As a logic, PLN is broadly integrative: it combines certain term logic rules
with more standard predicate logic rules, and utilizes both fuzzy truth values
and a variant of imprecise probabilities called indefinite probabilities. PLN math-
ematics tells how these uncertain truth values propagate through its logic rules,
so that uncertain premises give rise to conclusions with reasonably accurately
estimated uncertainty values.
PLN can be used in either forward or backward chaining mode. In backward
chaining mode, for example,
Now we give a specific example of how PLN and MOSES can be used together,
via applying PLN to generalize program trees learned by MOSES. We will use the
MOSES model given above, learned via analysis of nonagenarian gene expression
data, as an example. Further details on the specific inferences described here can
be found in online supplementary material at http://goertzel.org/BioInference.
pdf.
As the MOSES model in question is at the top level a disjunction, it’s easy
to see that, if we express the left hand side in OpenCog’s Atomese language 1
using ANDLinks, ORLinks and NOTLinks, a single application of the PLN rule
Implication
AND
OR
ListLink : $L
MemberLink
$X
$L
$X
for the third second-level clause. For the rest of our discussion here we will focus
on this clause due to its relatively small size. Of course, similar inferences to
the ones we describe here can be carried out for larger clauses and for Boolean
combinations with different structures. The PLN software deals roughly equally
well with Boolean structures of different shapes and sizes.
This latter implication, in the OpenCog Atomspace, actually takes the form
ImplicationLink
   ANDLink
      ExecutionOutputLink
         SchemaNode "makeOverexpressionPredicate"
         GeneNode "SEMA7A"
      ExecutionOutputLink
         SchemaNode "makeOverexpressionPredicate"
         GeneNode "TJP2"
      NotLink
         ExecutionOutputLink
            SchemaNode "makeOverexpressionPredicate"
            GeneNode "ARMC10"
   PredicateNode "Nonagenarian"
where
EquivalenceLink
   EvaluationLink
      ExecutionOutputLink
         SchemaNode "makeOverexpressionPredicate"
         GeneNode $G
      ConceptNode $H
   EvaluationLink
      GroundedPredicateNode "scm:above-median"
      ListLink
         ExecutionOutputLink
            SchemaNode "makeExpressionLevelPredicate"
            GeneNode $G
         ConceptNode $H
         ConceptNode $P
EquivalenceLink
   EvaluationLink
      ExecutionOutputLink
         SchemaNode "makeExpressionLevelPredicate"
         GeneNode $G
      ConceptNode $H
   EvaluationLink
      PredicateNode "Expression level"
      ListLink
         GeneNode $G
         ConceptNode $H
where
– ”scm:above-median” is a helper function that evaluates if a certain predicate
(arg1) evaluated at arg2 is above the median of the set of values obtained
by applying arg1 to every member of the category arg3.
– ”makeExpressionLevelPredicate” is a schema that outputs, for an argu-
ment $G, a predicate that is evaluated for an argument that represents an
organism, and outputs the expression level of $G in that organism.
– ”Expression level” is a predicate that outputs, for arguments $G and $H,
the level of expression of $G in organism $H.
Being a nonagenarian in itself is not that interesting, but if you know the
entity in question is a human (instead of, say, a bristlecone pine tree), then it
becomes interesting indeed. This knowledge is represented via
ImplicationLink
   ANDLink
      PredicateNode "Human"
      PredicateNode "Nonagenarian"
   PredicateNode "LongLived"
from which PLN can conclude
ImplicationLink
   ANDLink
      ExecutionOutputLink
         PredicateNode "makeOverexpressionPredicate"
         GeneNode "SEMA7A"
      ExecutionOutputLink
         PredicateNode "makeOverexpressionPredicate"
         GeneNode "TJP2"
      NotLink
         ExecutionOutputLink
            PredicateNode "makeOverexpressionPredicate"
            GeneNode "ARMC10"
   PredicateNode "LongLived"
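To show the shape of this chaining step in isolation, here is a minimal sketch that combines the two uncertain implications with a naive product rule; this is only a stand-in for PLN's actual deduction formulas and indefinite probabilities, which are developed in [3] [5], and the strength values below are fabricated.

# Naive illustration of chaining two uncertain implications:
#   P1: (overexpr SEMA7A & overexpr TJP2 & not overexpr ARMC10) -> Nonagenarian  [s1]
#   P2: (Human & Nonagenarian) -> LongLived                                      [s2]
# The simple product below is a stand-in for PLN's real deduction rule.

def chain_strength(s1, s2, p_human=1.0):
    """Strength of 'antecedent of P1 (for a human) -> LongLived' under a crude
    independence assumption: the conclusion holds if both implications fire."""
    return s1 * s2 * p_human

s1 = 0.6   # e.g. precision of the MOSES clause on the nonagenarian dataset
s2 = 0.9   # assumed strength of 'human nonagenarians are long-lived'
print("chained strength:", chain_strength(s1, s2))   # 0.54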
Next, how can PLN generalize this MOSES model? One route is to recog-
nize patterns spanning this model and other MOSES models in the Atomspace.
Another route, the one to be elaborated here, is cross-reference it with exter-
nal knowledge resources, such as the Gene Ontology (GO). The GO is one of
several bio knowledge resources we have imported into a bio-oriented OpenCog
Atomspace that we call the Biospace.
Each of these three genes in our example belongs to multiple GO categories,
so there are many GO-related inferences to be done regarding these genes. But
for sake of tractable exemplification, let’s just look at a few of the many GO
categories involved:
– SEMA7A is a GO:0045773 (positive regulation of axon extension)
– TJP2 is a GO:0006915 (apoptotic process)
– ARMC10 is a GO:0040008 (regulation of growth)
Let us also note a relationship between the first and third of these GO categories,
drawn from the GO hierarchy:
AppendLink
   ListLink : $L
   ExecutionOutputLink
      PredicateNode "makeOverexpressionPredicate"
      GeneNode "TJP2"
   PredicateNode "LongLived"
What this means, intuitively, is that combinations of TJP2 with growth-regulation
genes tend to promote longevity. This is interesting, among other reasons, because
it’s exactly the kind of abstraction a human mind might form when looking at this
kind of data.
In the above examples we have omitted quantitative truth values, which are
attached to each link, and depend on the specific parameters associated with the
PLN inference formulas. The probability associated with the final Implication-
Link above is going to be quite low, below 0.1 for any sensible parameter values.
However, this is still significantly above what one would expect for a linkage
of the same form with a random GO and gene inside it. We are not aiming to
derive definite conclusions here, only educated speculative hypotheses, able to
meaningfully guide further biological experimentation.
The “cognitive synergy” in the above may not be glaringly obvious but is
critical nonetheless. MOSES is good at learning specific data patterns, but not so
good at learning abstractions. To get MOSES to learn abstractions of this nature
would be possible but would lead to significant scalability problems. On the other
hand, PLN is good at abstracting from particular data patterns, but doesn’t
have control mechanisms scalable enough to enable it to scan large datasets
and pick out the weak but meaningful patterns among the noise. MOSES is
good at this. The two algorithms working together can, empirically speaking,
create generalizations from large, complex datasets, significantly better than
either algorithm can alone.
rules to choose in what order. Choosing which nodes (e.g. GeneNodes) to include
is challenging as well but is addressed via OpenCog’s activation-spreading-like
ECAN component. Choosing which rules to apply when is not currently han-
dled effectively; but in [7] it is proposed to do this via assigning probabilities
to sequences of rule-choices (conditional on the context), thus allowing ”rule
macros” (i.e. sequences of rules) to be applied in a fairly habitual way in a given
domain of inference. But of course that is a high-level description and there will
be some devils in the details. It has been previously proposed to use pattern
mining to learn macros of this nature, and it’s clear this will be a good
approach and necessary in the medium term. However, a simpler approach might
be to simply run a bunch of inferences and store Markov probabilities indicating
which chains of rule-applications tended to be useful and which did not; this
might provide sufficient rule-choice guidance for ”relatively simple” inferences
like the ones given here.
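A minimal sketch of that Markov bookkeeping, with invented rule names: record which rule followed which in inferences judged useful, then rank candidate next rules by the resulting conditional frequencies.

from collections import defaultdict

# Count which rule tended to follow which during inferences judged useful, then
# use the conditional frequencies to rank the next rule to try. Rule names are
# just illustrative labels.
transition_counts = defaultdict(lambda: defaultdict(int))

def record_chain(rule_sequence, useful):
    if not useful:
        return
    for prev_rule, next_rule in zip(rule_sequence, rule_sequence[1:]):
        transition_counts[prev_rule][next_rule] += 1

def rank_next_rules(last_rule):
    counts = transition_counts[last_rule]
    total = sum(counts.values()) or 1
    return sorted(((r, c / total) for r, c in counts.items()),
                  key=lambda pair: -pair[1])

# Usage: two successful inference traces, then a query for guidance.
record_chain(["ImplicationElimination", "Deduction", "Abduction"], useful=True)
record_chain(["ImplicationElimination", "Deduction", "Revision"], useful=True)
print(rank_next_rules("ImplicationElimination"))   # [('Deduction', 1.0)]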
References
1. Bach, J.: Principles of Synthetic Intelligence. Oxford University Press (2009)
2. Fauconnier, G., Turner, M.: The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. Basic (2002)
3. Goertzel, B., Ikle, M., Goertzel, I., Heljakka, A.: Probabilistic Logic Networks.
Springer (2008)
4. Goertzel, B.: The Hidden Pattern. Brown Walker (2006)
5. Goertzel, B., Coelho, L., Geisweiller, N., Janicic, P., Pennachin, C.: Real World
Reasoning. Atlantis Press (2011)
6. Goertzel, B., et al.: An integrative methodology for teaching embodied
non-linguistic agents, applied to virtual animals in Second Life. In: Proc. of the
First Conf. on AGI. IOS Press (2008)
7. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence,
Part 1: A Path to Advanced AGI via Embodied Learning and Cognitive Synergy.
Atlantis Thinking Machines. Springer (2013)
8. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence, Part
2: The CogPrime Architecture for Integrative, Embodied AGI. Atlantis Thinking
Machines. Springer (2013)
9. Goertzel, B., Pinto, H., Pennachin, C., Goertzel, I.F.: Using dependency parsing
and probabilistic inference to extract relationships between genes, proteins and
malignancies implicit among multiple biomedical research abstracts. In: Proc. of
Bio-NLP 2006 (2006)
10. Goertzel, B., Pitt, J., Ikle, M., Pennachin, C., Liu, R.: Glocal memory: a design
principle for artificial brains and minds. Neurocomputing, April 2010
11. Goertzel, B.: OpenCogBot: An integrative architecture for embodied AGI. In: Proc.
of ICAI 2010, Beijing (2010)
12. Looks, M.: Competent Program Evolution. PhD Thesis, Computer Science
Department, Washington University (2006)
13. Passtoors, W., Boer, J.M., Goeman, J., Akker, E.: Transcriptional profiling of
human familial longevity indicates a role for ASF1A and IL7R. PLoS One (2012)
14. Tulving, E., Craik, R.: The Oxford Handbook of Memory. Oxford U. Press (2005)
Stochastic Tasks: Difficulty and Levin Search
José Hernández-Orallo(B)
1 Introduction
The evaluation of cognitive features of humans, non-human animals, computers,
hybrids and collectives thereof relies on a proper notion of ‘cognitive task’ and
associated concepts of task difficulty and task breadth (or alternative concepts
such as composition and decomposition). The use of formalisms based on transi-
tion functions such as (PO)MDP (for discrete or continuous cases) is simple, but
has some inconveniences. For instance, the notion of computational cost must
be derived from the algorithm behind the transition function, which may have
a very high variability of computational steps depending on the moment: at idle
moments it may do just very few operations, whereas at other iterations it may
require an exponential number of operations (or even not halt). The maximum,
minimum or average for all time instants show problems (such as dependency
on the time resolution). Also, the use of transition functions differs significantly
from the way animals (including humans) and many agent languages in AI work,
with algorithms that can use signals and have a control of time through threads
(using, e.g., “sleep” instructions where computation stops momentarily).
The other important thing is the notion of response, score or return R for an
episode. Apart from relaxing its functional dependency with the rewards during
an episode, to account for goal-oriented tasks, we consider the problem of
commensurability of different tasks by using a level of tolerance, and deriving
the notion of acceptable policy from it. While this seems a cosmetic change, it
paves the way to the notion of difficulty —as difficulty does not make sense if we
do not set a threshold or tolerance— and also to the analysis of task instances.
because (1) the use of n bits of memory requires at least n computational steps,
so the latter are going to be considered anyway and (2) steps and bits are different
units. The fourth thing is verification. When we discuss the effort about finding a
policy, there must be some degree of certainty that the policy is reasonably good.
As tasks and agents are stochastic, this verification is more cumbersome than in
a non-stochastic case. We will discuss this later on in the paper. For the
moment, we will just combine the length of the policy and the computational
steps, by defining LS[→ν](π, μ) ≜ L(π) + log S[→ν](π, μ). Logarithms are always
binary. We will explain later on why we apply a logarithm over S. The fifth thing
is the tolerance level of the task. In many cases, we cannot talk about difficulty
if there is no threshold or limit for which we consider a policy acceptable. It
is true that some tasks have a response function R that can only be 0 or 1,
and difficulty is just defined in terms of this goal. But many other tasks are not
binary (goal-oriented), and we need to establish a threshold for them. In our
case, we can take 1 as the best response and set the threshold at 1 − ε.
We now define acceptability in a straightforward way. The set of acceptable
policies for task μ given a tolerance ε is given by
Note that with a tolerance greater than 0 the agent can do terribly on a
few instances, provided it does well on many others.
And now we are ready to link difficulty to resources. This is usual in algo-
rithmic information theory, but here we need to calculate the complexities of
the policies (the agents) and not the problems (the tasks). A common solution,
inspired by Levin’s Kt (see, e.g., [10] or [11]), is to define:
Note that the above has two expectations: one in LS and another one inside A.
The interpretation of the above expression is a measure of effort, as used with
the concept of computational information gain with Kt in [4].
An option as an upper-bound measure of difficulty would be ⧖(μ) ≜
Kt[ε,→ν](μ), for a finite ν and given ε. In general, if ν is very large, then the
last evaluations will prevail and any initial effort to find the policies and start
applying them will not have enough weight. On the contrary, if ν is small, then
those policies that invest in analysing the environment will be penalised. It also
requires a good assessment of the metasearch procedure to verify the policy so
it can go to exploitation. In any case, the notion of difficulty depends, in some
tasks, on ν. We will come back to the ‘verification cost’ later on.
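The following sketch illustrates the quantities defined so far on a hand-made (and entirely fabricated) policy space: each candidate policy is assigned a description length L, an average number of steps S and an expected response R, policies are filtered by the 1 − ε acceptance threshold, and the smallest LS = L(π) + log S among the acceptable ones is reported. In the actual setting, L, S and R would come from running policies on a stochastic, interactive task.

import math

# Hand-made candidate policies: (name, description length L in bits,
# average steps S per trial, expected response R in [0, 1]). All values fabricated.
candidate_policies = [
    ("constant answer",     2,  1.0, 0.50),
    ("rote table, partial", 12, 2.0, 0.80),
    ("general rule",        20, 6.0, 1.00),
]

def LS(L, S):
    return L + math.log2(S)            # LS = L(pi) + log S(pi, mu)

def difficulty(epsilon):
    acceptable = [(name, LS(L, S)) for name, L, S, R in candidate_policies
                  if R >= 1 - epsilon]                 # acceptance threshold 1 - epsilon
    return min(acceptable, key=lambda p: p[1]) if acceptable else None

print(difficulty(epsilon=0.0))    # only the general rule is acceptable
print(difficulty(epsilon=0.3))    # the cheaper rote table now clears the threshold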
tests, as we could start with simple instances and adapt their difficulty to the
ability of the subject (as in adaptive testing in psychometrics).
The key issue is that instance difficulty must be defined relative to a task. At
first sight, the difference in difficulty between 6/3 and 1252/626 is just a question
of computational steps, as the latter usually requires more computational steps
if a general division algorithm is used. But what about 13528/13528? It looks like
an easy instance. Using a general division algorithm, it may be the case that it
takes more computational steps than 1252/626. If we see it as easy, it is because there
are some shortcuts in our algorithm to make divisions. Of course, we can think
about algorithms with many shortcuts, but then the notion of difficulty depends
on how many shortcuts it has. In the end, this would make instance difficulty
depend on a given algorithm for the task (and not the task itself). This would
boil down to the steps taken by the algorithm, as in computational complexity.
We can of course take a structuralist approach, by linking the difficulty of an
instance to a series of characteristics of the instance, such as its size, the similar-
ities of their ingredients, etc. This is one of the usual approaches in psychology
and many other areas, including evolutionary computation, but does not lead to
a general view of what instance difficulty really is. For the divisions above, one
can argue that 13528/13528 is more regular than 1252/626, and that is why the
first is easier than the second. However, this is false in general, as 13528^13528 is
by no means easier than any other exponentiation.
Another perspective is “the likelihood that a randomly chosen program will
fail for any given input value” [2], like the population-based approach in psychol-
ogy. For this, however, we would need a population2 . The insight comes when we
see that best policies may change with variable values of ε. This leads to the view
of the relative difficulty of an instance with respect to a task as the minimum
LS, over any possible tolerance, of a policy such that the instance is accepted. We
denote by μσ an instance of μ with seed σ (on the random tape or generator).
The set of all optimal policies for varying tolerances ε₀ is:

OptLS[→ν](μ) ≜ arg min_{π ∈ A[ε₀,→ν](μ), ε₀ ∈ [0,1]} LS[→ν](π, μ)    (3)
Note how the order of the minimisation is arranged in equations 3 and 4 such
that the many policies that only cover μσ but do not solve many of the other
instances are not considered, because they are not in OptLS.
This notion of relative difficulty is basically a notion of consilience with the
task. If we have an instance whose best policy is unrelated to the best policy for
the rest, then this instance will not be covered until the tolerance becomes very
2 We could assume a universal distribution. This is related to the approach in this
paper as the shortest policies have a great part of the mass of this distribution.
low. Of course, this will depend on whether the algorithmic content of solving
the instance can be accommodated into the general policy. This is closely related
to concepts such as consilience, coherence and intensionality [3–5].
Now the question is to consider how we can put several tasks together. The
aggregation of several responses that are not commensurate makes no sense. This
gives further justification to eq. 1, where A was introduced. Given two tolerance
levels for each task we can see whether this leads to similar or different difficulties
for each task. For instance, if the difficulties are very different, then the task will
be dominated by the easy one. Given two stochastic tasks, the composition
as the union of the tasks is meaningless, so we instead calculate a mixture. In
particular, the composition of tasks μ1 and μ2 with weight α ∈ [0, 1], denoted
by αμ1 ⊕ (1 − α)μ2 , is defined by a stochastic choice, using a biased coin (e.g.,
using α), between the two tasks. Note that this choice is made for each trial. It
is easy to see that if both μ1 and μ2 are asynchronous-time stochastic tasks, this
mixture also is. Similar to composition we can talk about decomposition, which
is just understood in a straightforward way. Basically, μ is decomposable into μ1
and μ2 if there is an α and two tasks μ1 and μ2 such that μ = αμ1 ⊕ (1 − α)μ2 .
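Operationally, the mixture is just a per-trial biased coin flip between the two tasks, as in the following sketch with two trivially simple stand-in tasks.

import random

# Two stand-in stochastic tasks, each returning a response in [0, 1] per trial.
def task_add(policy):       # reward 1 if the policy adds correctly
    a, b = random.randint(0, 9), random.randint(0, 9)
    return 1.0 if policy(a, b) == a + b else 0.0

def task_max(policy):       # reward 1 if the policy returns the maximum
    a, b = random.randint(0, 9), random.randint(0, 9)
    return 1.0 if policy(a, b) == max(a, b) else 0.0

def compose(task1, task2, alpha):
    """alpha*task1 (+) (1-alpha)*task2: choose the task by a biased coin each trial."""
    def mixture(policy):
        return task1(policy) if random.random() < alpha else task2(policy)
    return mixture

mixed = compose(task_add, task_max, alpha=0.7)
policy = lambda a, b: a + b            # solves task_add, only sometimes task_max
trials = [mixed(policy) for _ in range(10000)]
print(sum(trials) / len(trials))       # roughly 0.7 + 0.3 * P(a+b == max(a,b))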
Now, it is interesting to have a short look at what happens with difficulty
when two tasks are put together. Given a difficulty function ⧖, we would like
to see that if ⧖(αμ1 ⊕ (1 − α)μ2) ≈ α⧖(μ1) + (1 − α)⧖(μ2) then both tasks are
related, and there is a common policy that takes advantage of some similarities.
However, in order to make sense of this expression, we need to consider some
values of α and fix a tolerance. With high tolerance the above will always be
true as ⧖ is close to zero independently of the task. With intermediate tolerances,
if the difficulties are not even, the optimal policies for the composed task will
invest more resources for the easiest ‘subtask’ and will neglect the most difficult
‘subtask’. Finally, using low tolerances (or even 0) for the above expressions may
have more meaning, as the policy must take into account both tasks.
In fact, there are some cases for which some relations can be established.
Assume 0 tolerance, and imagine that for every 1 > α > 0 we have ⧖(αμ1 ⊕ (1 −
α)μ2) ≈ α⧖(μ1). If this is the case, it means that we require the same effort to
find a policy for both tasks as for one alone. We can see that task μ1 covers
task μ2 . In other words, the optimal policy for μ1 works for μ2 . Note that this
does not mean that every policy for μ1 works for μ2 . Finally, if μ1 covers μ2 and
vice versa, we can say that both tasks are equivalent.
We can also calculate a distance as d(μ1, μ2) ≜ 2⧖(0.5μ1 ⊕ 0.5μ2) − ⧖(μ1) −
⧖(μ2). Clearly, if μ1 = μ2 then we have 0 distance. For tolerance 0 we also have
that if μ2 has difficulty close to 0 but μ1 has a high difficulty h1 , and both tasks
are unrelated but can be distinguished without effort, then the distance is h1 .
Nonetheless, there are many questions we can analyse with this conceptual-
isation. For instance, how far can we decompose? There are some decomposi-
tions that will lead to tasks with very similar instances or even with just one
instance. Let us consider the addition task μadd with a soft geometrical distri-
bution p on the numbers to be added. With tolerance 0, the optimal policy is
given by a short and efficient policy to addition. We can decompose addition
into μadd1 and μadd2 , where μadd1 contains all the summations 0 + x, and μadd2
incorporates all the rest. Given the distribution p, we can find the α such that
μadd = αμadd1 ⊕ (1 − α)μadd2 . From this decomposition, we see that μadd2 will
have the same difficulty, as the removal of summations 0 + x does not simplify
the problem. However, μadd1 is simple now. But, interestingly, μadd2 still covers
μadd1 . We can figure out many decompositions, such as additions with and with-
out carrying. Also, as the task gives more relevance to short additions because of
the geometrical distribution, we may decompose the task in many one-instance
tasks and a few general tasks. In the one-instance tasks we would put simple
additions such as 1 + 5 that we would just rote learn. In fact, it is quite likely
that in order to improve the efficiency of the general policy for μadd the policy
includes some tricks to treat some particular cases or easy subsets.
The opposite direction is to think about how far we can reach by com-
posing tasks. Again, we can compose tasks ad eternum without reaching more
general tasks necessarily. The big question is whether we can analyse abilities
with the use of compositions and difficulties. In other words, are there some tasks
such that the policies solving these tasks are frequently useful for many other
tasks? That could be evaluated by looking at what happens to a task μ1 with a
given difficulty h1 if it is composed with any other task μ2 of some task class. If
the difficulty of the composed task remains constant (or increases very slightly),
we can say that μ1 covers μ2 . Are there tasks that cover many other tasks? This
is actually what psychometrics and artificial intelligence are trying to unveil. For
instance, in psychometrics, we can define a task μ1 with some selection of arith-
metic operations and see that those who perform well on these operations have
a good arithmetic ability. In our perspective, we could extrapolate (theoretically
and not experimentally) that this task μ1 covers a range of arithmetic tasks.
We need that the policies that are tried could also be search procedures over
several trials. That means that Levin search actually becomes a metasearch,
which considers all possible search procedures, ordered by size and resources,
similar to other adaptations of Levin search for interactive scenarios [9,13].
As tasks are stochastic, we can never have complete certainty that a good
policy has been found. An option is to consider a confidence level, such that
the search invests as few computational steps as possible to have a degree of
confidence 1 − δ of having found an ε-acceptable policy. This clearly resembles
a PAC (probably approximately correct) scenario [14].
The search must find a policy with a confidence level δ, i.e., Pr(π solves μ) ≥
1 − δ. If we denote the best possible average result (for an infinite number of
runs) as r∗, we consider that a series of runs is a sufficient verification for a
probably approximately correct (PAC) policy π for μ when:

Pr(r∗ − r̄ ≤ ε) ≥ 1 − δ    (5)
with r̄ being the average of the results of the trials (runs) so far.
First, we are going to assume that all runs take the same number of steps
(a strong assumption, but recall that this is an upper limit), so the
verification cost could be approximated by

W[ε,δ](π, μ) ≜ S(π, μ) · B[ε,δ](π, μ)    (6)
i.e., the expected number of steps times the expected number of verification bids.
The number of bids can be estimated if we have the mean and the standard
deviation of the response for a series of runs. Assuming a normal distribution:
n ≥ |zδ/2|² σ² / (r̄ + ε − r∗)²    (7)
In order to apply the above expression we need the variance σ 2 . Many
approaches to the estimation of a population mean with unknown σ 2 are based
on a pilot or prior study (let us say we try 30 repetitions), deriving n using the
normal distribution and then using this for a Student’s t distribution. Instead
of this, we are going to take an iterative approach where we update the mean
and standard deviation after each repetition. We consider the maximum stan-
dard deviation as a start (as a kind of Laplace correction) with two fabricated
repetitions with responses 0 and 1.
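A sketch of this iterative scheme (the task simulator and the parameter values are stand-ins): seed the estimates with the two fabricated responses 0 and 1, update the mean and standard deviation after each real repetition, and recompute the required number of runs from the normal approximation in (7).

import random, statistics

def required_runs(sigma, r_bar, r_star, eps, delta):
    """Normal-approximation bound of eq. (7): n >= |z_{delta/2}|^2 sigma^2 / (r_bar + eps - r_star)^2."""
    z = statistics.NormalDist().inv_cdf(1 - delta / 2)
    margin = r_bar + eps - r_star
    return float("inf") if margin <= 0 else (z * z * sigma * sigma) / (margin * margin)

def verify(run_trial, r_star=1.0, eps=0.2, delta=0.05, max_runs=10_000):
    responses = [0.0, 1.0]                  # two fabricated repetitions (maximum-deviation start)
    while len(responses) - 2 < max_runs:
        responses.append(run_trial())       # one more real repetition
        r_bar = statistics.mean(responses)
        sigma = statistics.stdev(responses)
        if len(responses) - 2 >= required_runs(sigma, r_bar, r_star, eps, delta):
            return r_bar > r_star - eps, len(responses) - 2
    return False, max_runs

# Stand-in stochastic policy/task: succeeds on 90% of trials.
accepted, runs = verify(lambda: 1.0 if random.random() < 0.9 else 0.0)
print(accepted, runs)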
Algorithm 1 is used in a modified Levin search:
Definition 1. Levin’s universal search for stochastic tasks and policies with tol-
erance ε, confidence level 1 − δ, and maximum response reference r∗. Given a
task μ, policies are enumerated in several phases, starting from phase 1. For phase
i, we execute all possible policies π with L(π) ≤ i for si = 2^(i−L(π)) steps each.
We call function VerifyNorm(π, μ, ε, δ, smax) in Algorithm 1 with smax = si.
While an acceptable policy is not found we continue until we complete the phase
and then move to the next phase i + 1. If an acceptable policy is found, some extra
trials are performed before stopping the search, for confirmation.
13:     if j ≥ n0 then
14:         if r̄ > r∗ − ε then return TRUE, s    ▷ Stop because it is verified
15:         else return FALSE, s                  ▷ Stop because it is rejected
16:         end if
17:     end if
18:     j ← j + 1
19:   until s ≥ smax
20:   return FALSE, s
21: end function
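The phase structure of Definition 1 can be sketched as follows; enumerate_policies and verify are stand-ins (the latter playing the role of VerifyNorm), and the integer policy encoding and toy task are invented for illustration.

def levin_search_stochastic(enumerate_policies, verify, max_phase=20):
    """Phased enumeration per Definition 1: in phase i, every policy pi with
    L(pi) <= i is run for s_i = 2**(i - L(pi)) steps and checked by the verifier."""
    for i in range(1, max_phase + 1):
        for policy, length in enumerate_policies(max_length=i):
            s_max = 2 ** (i - length)
            ok, steps = verify(policy, s_max)     # plays the role of VerifyNorm
            if ok:
                return policy, i, steps
    return None, max_phase, 0

# Stand-in policy space: integers encoded in `length` bits; the "task" accepts 42.
def enumerate_policies(max_length):
    for length in range(1, max_length + 1):
        for code in range(2 ** length):
            yield code, length

def verify(policy, s_max):
    # Toy deterministic verifier: "verification" costs one step.
    return policy == 42, 1

print(levin_search_stochastic(enumerate_policies, verify))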
log F[ε,δ](π, μ) = L(π) + log(S(π, μ) · B[ε,δ](π, μ)) = L(π) + log S(π, μ) + log B[ε,δ](π, μ)
From here, we can finally define a measure of difficulty that accounts for all the
issues that affect the search of the policy for a stochastic task:
5 Conclusions
As we have mentioned throughout this paper, the notion of task is common in AI
evaluation, in animal cognition and also in human evaluation. We set tasks and
agents as asynchronous interactive systems, where difficulty is seen as compu-
tational steps of a Levin search, but this search has to be modified to cover
stochastic behaviours. These ideas are an evolution and continuation of early
notions of task and difficulty in [8] and [6] respectively. The relevance of verifi-
cation in difficulty has usually been associated with deduction. However, some
works have incorporated it as well in other inference problems, such as induction
and optimisation, using Levin’s Kt [1,4,12]. From the setting described in this
paper, many other things could be explored, especially around the notions of
composition and decomposition, task instance and agent response curves.
References
1. Alpcan, T., Everitt, T., Hutter, M.: Can we measure the difficulty of an optimiza-
tion problem? In: IEEE Information Theory Workshop (ITW) (2014)
2. Bentley, J.G.W., Bishop, P.G., van der Meulen, M.J.P.: An empirical exploration
of the difficulty function. In: Heisel, M., Liggesmeyer, P., Wittmann, S. (eds.)
SAFECOMP 2004. LNCS, vol. 3219, pp. 60–71. Springer, Heidelberg (2004)
3. Hernández-Orallo, J.: A computational definition of ‘consilience’. Philosophica 61,
901–920 (2000)
4. Hernández-Orallo, J.: Computational measures of information gain and reinforce-
ment in inference processes. AI Communications 13(1), 49–50 (2000)
5. Hernández-Orallo, J.: Constructive reinforcement learning. International Journal
of Intelligent Systems 15(3), 241–264 (2000)
6. Hernández-Orallo, J.: On environment difficulty and discriminating power.
Autonomous Agents and Multi-Agent Systems, 1–53 (2014). http://dx.doi.org/
10.1007/s10458-014-9257-1
7. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an
anytime intelligence test. Artificial Intelligence 174(18), 1508–1539 (2010)
8. Hernández-Orallo, J., Dowe, D.L., Hernández-Lloreda, M.V.: Universal psycho-
metrics: measuring cognitive abilities in the machine kingdom. Cognitive Systems
Research 27, 50–74 (2014)
9. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algo-
rithmic Probability. Springer (2005)
10. Levin, L.A.: Universal sequential search problems. Problems of Information Trans-
mission 9(3), 265–266 (1973)
11. Li, M., Vitányi, P.: An introduction to Kolmogorov complexity and its applications,
3rd edn. Springer (2008)
12. Mayfield, J.E.: Minimal history, a theory of plausible explanation. Complexity
12(4), 48–53 (2007)
13. Schmidhuber, J.: Gödel machines: fully self-referential optimal universal self-
improvers. In: Artificial general intelligence, pp. 199–226. Springer (2007)
14. Valiant, L.G.: A theory of the learnable. Communications of the ACM 27(11),
1134–1142 (1984)
Instrumental Properties of Social Testbeds
1 Introduction
Evaluation tools are crucial in any discipline as a way to assess its progress and
creations. There are some tools, benchmarks and contests, aimed at the measure-
ment of humanoid intelligence or the performance in a particular set of tasks.
However, the state of the art of artificial intelligence (AI) and artificial general
intelligence is now more focussed towards social abilities, and here the measur-
ing tools are still rather incipient. In the past two decades, the notion of agent
and the area of multi-agent systems have shifted AI to problems and solutions
where ‘social’ intelligence is more relevant (e.g., [1,2]). This shift towards a more
social-oriented AI is related to the modern view of human intelligence as highly
social, actually one of the most distinctive features of human intelligence over
other kinds of animal intelligence. Some significant questions that appear here
are then whether we are able to properly evaluate social intelligence in general
(not only in AI, but universally) and whether we can develop measurement tools
that distinguish between social intelligence and general intelligence.
In this paper, we 1) identify the components that should be considered in
order to assess social intelligence, and 2) provide some instrumental properties
to help us determine the suitability of a testbed to be used as a social test (valid-
ity, reliability, efficiency, boundedness and team symmetry), while analyzing the
influence that such components have on these properties. This helps us to pave
the way for the analysis of whether many social environments, games and tests
in the literature are useful for measuring social intelligence.
The paper is organized as follows. Section 2 provides the necessary back-
ground. Section 3 identifies the components that we should consider in order to
measure social intelligence. Section 4 presents some instrumental properties to
assess the suitability of a testbed to be used as a social intelligence test. Finally,
Sect. 5 closes the paper with some discussion and future work.
2 Background
This section gives an introduction to the concepts and terminology of multi-agent
environments and serves as a background for the following sections.
2.2 Teams
Definition 1. Agent slots i and j are in the same team iff ∀k : ri,k = rj,k ,
whatever the agents present in the environment.
which means that all agents in a team receive exactly the same rewards. Note that
teams are not alliances as usually understood in game theory. In fact, teams are
fixed and cannot be changed by the agents. Also, we do not use any sophisticated
mechanism to award rewards, related to the contribution of each agent in the
team, as it is done with the Shapley Value [3]. We just set rewards uniformly.
Note that with this definition the agents are not included in the environment.
For instance, noughts and crosses could be defined as an environment μnc with
two agents, where the partition set τ is defined as {{1}, {2}}, which represents
that this game allows two teams, and one agent in each. Another example is
RoboCup Soccer, whose τ would be {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}}.
We now define an instantiation for a particular agent setup. Formally, an
agent line-up l is a list of agents. For instance, if we have a set of agents Π =
{π1 , π2 , π3 , π4 }, a line-up from this set could be l1 = (π2 , π3 ). The use of the same
agent twice is allowed, so l2 = (π1 , π1 ) is also a line-up. We denote by μ[l] the
instantiation of an environment μ with a line-up l, provided that the length of l is
greater than or equal to the number of agents allowed by μ (if l has more agents,
the excess is ignored). The slots of the environment are then matched with the
corresponding elements of l following their order. For instance, for the noughts
104 J. Insa-Cabrera and J. Hernández-Orallo
and crosses, an instantiation would be μnc [l1 ]. Note that different instantiations
over the same environment would normally lead to different results. We use
Ln (Π) to specify the set of all the line-ups of length n with agents of Π.
We use the notation RiK (μ[l]), which gives us the expected result of the ith
agent in line-up l for environment μ (also in slot i) during K steps. If K is
omitted, we assume K = ∞. In order to calculate an agent’s result we make use
of some kind of utility function (e.g., an average of rewards).
Υ(Π, wL, M, wM, wS) ≜ Σ_{μ∈M} wM(μ) Σ_{i=1}^{N(μ)} wS(i, μ) Σ_{l∈L_{N(μ)}(Π)} wL(l) Ri(μ[l]) .    (1)
– Test of social intelligence Υ̂ : The final test to measure social intelligence
following a definition of social intelligence Υ . The test should consist of a set
of exercises and some kind of procedure to sample them. As an example, we
use the definition of social intelligence test from [4, Sect.3.4]:
Υ̂[pΠ, pM, pS, pK, nE](Π, wL, M, wM, wS) ≜ ηE Σ_{(μ,i,l)∈E} wM(μ) wS(i, μ) wL(l) Ri^K(μ[l]) .    (2)
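The following sketch transcribes the weighted sum in (1) directly, with stand-in agents, environments, weights and a fabricated result function R; a test in the sense of (2) would sample exercises from these sets instead of enumerating them exhaustively.

import itertools

def social_intelligence(Pi, w_L, M, w_M, w_S, R, N):
    """Direct transcription of the weighted sum in (1); R(mu, i, lineup) is a
    stand-in for the expected result of slot i in environment mu under lineup l."""
    total = 0.0
    for mu in M:
        for i in range(1, N(mu) + 1):
            for lineup in itertools.product(Pi, repeat=N(mu)):   # L_{N(mu)}(Pi)
                total += w_M(mu) * w_S(i, mu) * w_L(lineup) * R(mu, i, lineup)
    return total

# Stand-in setup: two agents, one 2-slot environment, uniform weights.
Pi = ["random_agent", "learner"]
M = ["matching_pennies"]
N = lambda mu: 2
w_M = lambda mu: 1.0
w_S = lambda i, mu: 0.5
w_L = lambda lineup: 1.0 / (len(Pi) ** len(lineup))
R = lambda mu, i, lineup: 0.8 if lineup[i - 1] == "learner" else 0.4   # fabricated results

print(social_intelligence(Pi, w_L, M, w_M, w_S, R, N))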
4 Properties
In order to evaluate social intelligence and distinguish it from general intelligence,
we need tests where social ability has to be used and, also, where we can perceive
its consequences. This means that not every environment is useful for measuring social intelligence, nor is every subset of agents. We want tests
such that agents must use their social intelligence to understand and/or have
influence over other agents’ policies in such a way that this is useful to accomplish
their goals, but common general intelligence is not enough.
Hereafter, we investigate some instrumental properties for a testbed of multi-
agent environments and agents to measure social intelligence.
4.1 Validity
The environment class and the agent class must be general enough that no predefined or hardwired policy is optimal for these classes. This is the key issue of a (social) intelligence test; it must be as general as possible.
We need to choose a diverse environment class. One possibility is to consider
all environments (as done by [5,6]), and another is to find an environment class
that is sufficiently representative (as attempted in [7]).
Similarly, we need to consider a class of agents that leads to diversity in the line-ups. This class should incorporate many different types of agents: random agents,
agents with some predetermined policies, agents that are able to learn, human
agents, agents with low social intelligence, agents with high social intelligence,
etc. The set of all possible agents (either artificial or biological) is known as
machine kingdom in [8] and raises many questions about the feasibility of any
test considering this astronomically large set. Also, there are doubts about what
the weight for this universal set should be when including them into line-ups
(i.e., wL ). Instead, some representative kinds of agents could be chosen. In this
way, we could aim at social intelligence relative to a smaller (and well-defined)
set of agents, possibly specializing the definition by limiting the resources, the
program size or the intelligence of the agents.
Regarding specificity, it is equally important for a measurement to only
include those environments and agents that really reflect what we want to mea-
sure. For instance, it is desirable that the evaluation of an ability is done in an
environment where no other abilities are required; in other words, we want the environment to evaluate the ability in isolation. Otherwise, it will not
be clear which part of the result comes from the ability to be evaluated, and
which part comes from other abilities. Although it is very difficult to avoid any
contamination, the idea is to ensure that the role of these other abilities is minor, or is taken for granted for all agents. We are certainly not interested
in non-social environments as this would contaminate the measure with other
abilities. In fact, one of the recurrent issues in defining and measuring social
intelligence is to be specific enough to distinguish it from general intelligence.
4.2 Reliability
Another key issue in psychometric tests is the notion of reliability, which means
that the measurement is close to the actual value. Note that this is different from validity, which refers to the true identification or definition of the actual value. In other words, if we assume validity, i.e., that the definition is correct,
reliability refers to the quality of the measurement with respect to the actual
value. More technically, if the actual value of π for an ability φ is v then we want
a test to give a value which is close to v. The cause of the divergence may be
systematic (bias), non-systematic (variance) or both.
First, we need to realize that reliability applies to tests, such as e.g., (2). Reli-
ability is then defined by considering that a test can be repeated many times, so
becoming a random variable that we can compare to the true value. Formally:
Definition 3. Given a definition of a cognitive ability Υ and a test over it Υ̂ ,
the test error is given by:
4.3 Efficiency
This property refers to how efficient a test is in terms of the (computational) time
required to get a reliable score. It is easy to see that efficiency and reliability are
opposed. If we were able to perform an infinite number of infinitely long exercises, then we would have Υ̂ = Υ, with perfect reliability, as we would exhaust Π and M. If the test is constructed properly, the bias can be kept close to 0 even with very few exercises, so it is usually the variance component of the reliability decomposition that is affected.
Efficiency can be defined as a ratio between the reliability and the time taken
by the test.
Definition 4. Given a definition of a cognitive ability Υ and a test over it Υ̂ ,
the efficiency is given by:
4.4 Boundedness
One desirable property is that rewards are bounded, otherwise the value of Υ
(such as e.g., (1)) could diverge. Any arbitrary choice of upper and lower bounds
can be scaled to any other choice so, without loss of generality, we can assume
that all of them are bounded between −1 and 1, i.e., $\forall i, k : -1 \leq r_{i,k} \leq 1$. Note that they are bounded for every step. So, if we use a bounded function to calculate the agent's result, then $R_i^K(\cdot)$ is also bounded.
However, bounded expected results do not ensure that Υ is bounded. In order
to ensure a bounded measurement of social intelligence, we also need to consider
that weights are bounded, i.e., there are constants cM , cS and cL such that:
$$\forall M : \sum_{\mu \in M} w_M(\mu) = c_M \,. \qquad (5)$$
$$\forall \mu : \sum_{i=1}^{N(\mu)} w_S(i, \mu) = c_S \,. \qquad (6)$$
$$\forall \mu, \Pi : \sum_{l \in L_{N(\mu)}(\Pi)} w_L(l) = c_L \,. \qquad (7)$$
Another property worth considering is that rewards are zero-sum in the limit:
$$\lim_{K \to \infty} \sum_{k=1}^{K} \sum_{i=1}^{N(\mu)} r_{i,k} = 0 \,. \qquad (8)$$
With teams, the previous definition could be changed in such a way that:
$$\lim_{K \to \infty} \sum_{k=1}^{K} \sum_{t \in \tau} \sum_{i \in t} r_{i,k} = 0 \,. \qquad (9)$$
So the sum of the agents’ rewards in a team (or team’s reward) does not need
to be zero but the sum of all teams’ rewards does. For instance, if we have a team
with agents {1, 2} and another team with agents {3, 4, 5}, then a reward (in the
limit) of 1/4 for agents 1 and 2 implies −1/6 for agents 3, 4 and 5. The zero-sum
properties are appropriate for competition. In fact, if teams have only one agent
then we have pure competition. We can have both competition and cooperation
by using teams in a zero-sum game, where agents in a team cooperate and agents
in different teams compete. If we want to evaluate pure cooperation (with one or
more teams) then zero-sum games will not be appropriate.
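The team-wise zero-sum condition (9) is easy to check on logged rewards. The snippet below is a small illustration under our own encoding of the logs (a dictionary mapping agent slots to reward sequences); it reproduces the 1/4 versus −1/6 example above.

```python
# Check the team-wise zero-sum property (Eq. 9) on logged rewards, where
# rewards[i][k] is the reward of agent slot i at step k (encoding is ours).

def team_zero_sum(rewards, teams, tol=1e-6):
    """True if the summed rewards of all teams cancel out over the logged steps."""
    total = 0.0
    for team in teams:                  # e.g. teams = [{1, 2}, {3, 4, 5}]
        for i in team:
            total += sum(rewards[i])
    return abs(total) < tol

rewards = {1: [0.25, 0.25], 2: [0.25, 0.25],
           3: [-1/6, -1/6], 4: [-1/6, -1/6], 5: [-1/6, -1/6]}
print(team_zero_sum(rewards, [{1, 2}, {3, 4, 5}]))  # True: 2*(1/4) + 3*(-1/6) = 0
```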
5 Conclusions
References
1. Horling, B., Lesser, V.: A Survey of Multi-Agent Organizational Paradigms. The
Knowledge Engineering Review 19, 281–316 (2004)
2. Simao, J., Demazeau, Y.: On Social Reasoning in Multi-Agent Systems. Inteligencia
Artificial 5(13), 68–84 (2001)
3. Roth, A.E.: The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge
University Press (1988)
4. Insa-Cabrera, J., Hernández-Orallo, J.: Definition and properties to assess multi-
agent environments as social intelligence tests. Technical report, CoRR (2014)
5. Legg, S., Hutter, M.: Universal Intelligence: A Definition of Machine Intelligence.
Minds and Machines 17(4), 391–444 (2007)
6. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an any-
time intelligence test. Artificial Intelligence 174(18), 1508–1539 (2010)
7. Hernández-Orallo, J.: A (hopefully) unbiased universal environment class for mea-
suring intelligence of biological and artificial systems. In: 3rd Conference on Artificial
General Intelligence, pp. 182–183 (2010)
8. Hernández-Orallo, J., Dowe, D.L., Hernández-Lloreda, M.V.: Universal psycho-
metrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems
Research 27, 50–74 (2014)
Towards Human-Level Inductive Functional
Programming
Susumu Katayama(B)
1 Introduction
1. http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.html
Fig. 1. Learning the component library from the data from the Internet
The other representative IFP systems are Igor II [4] and ADATE [5]. However, neither has been updated recently, and neither is practical: Igor II enforces a tight restriction on the example set given as the specification, and ADATE requires considerable skill even for the synthesis of simple programs. Moreover, due to the absence of memoization, they are at an obvious disadvantage in practical speed compared to MagicHaskeller, which can start synthesis with its memoization table already filled with expressions.
MagicHaskeller can synthesize only short expressions in a general-purpose framework by exhaustive search in the program space. In order to synthesize longer programs, the search has to be biased, because the program space is infinite. The most popular bias is language bias, which restricts the search space around desired programs by carefully selecting the domain-specific language to be used. Language bias sacrifices generality, however, and thus is not our choice.
On the other hand, human programmers are ideal general-purpose inductive
programming systems, and can synthesize programs in Turing-complete lan-
guages without any language bias. When we humans program, we name and
reuse frequently-used functions and procedures, and synthesize larger libraries
and programs using those library functions and procedures as the components.
In other words, we adopt the bias based on the frequency of use.
The same thing can be achieved by collecting data about how frequently each expression is requested and/or used, and organizing a library consisting of frequently used expressions and their subexpressions. This paper presents our research idea for the realization of general-purpose large-scale IFP that is
thanks to this pruning, the memoization table fits in the 64 GB of available memory, and the server has been in use without any critical trouble since its launch three years ago.
Other notable features of MagicHaskeller include:
1. adopting the search strategy that searches promising branches deeper based
on learning which branch is promising, and
2. learning the component library to make it consist of useful compound func-
tions, and synthesizing expressions from more and more complicated com-
ponents.
This paper argues that solution 2 is promising.3 The reasons are itemized below:

3. We are not claiming that solution 2 is more promising than solution 1. Rather, we think that solution 2 may be regarded as a form of solution 1, by regarding the use of learned functions as deep search without branching. Even then, the solution described in the form of solution 2 is more straightforward than solution 1.
Fig. 2. Comparison with deep learning. The presented research idea adopts a more flexible primitive function set than deep learning, but the two are similar in that both are pre-trained and eliminate redundancy.
The downloaded source code can be processed in the same way as queries to the server, by following these steps for each function definition:
(a) generate an input-output pair for a random input;
(b) send it as a query to the IFP server, which collects data in the way described in item 1 (providing an IFP service or other services related to programming), in order to re-invent the function;
(c) if more than one program is synthesized, increase the number of random input-output pairs until one or zero program is obtained;
(d) if no program is synthesized, divide the function definition into subfunctions.
Those two ways can be combined. For example, the latter can be used to organize
the initial component library, and then the former can be used to scale it up.
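As a rough sketch of the re-invention loop in steps (a)–(d), the Python fragment below treats the IFP server and the decomposition of a definition into subfunctions as opaque callbacks (`synthesize`, `split_into_subfunctions`); these names and the overall control flow are our own rendering, not MagicHaskeller code.

```python
def reinvent(target_fn, random_input, synthesize, split_into_subfunctions,
             max_examples=10):
    """Try to re-invent target_fn from random input-output examples.

    `synthesize(examples)` stands in for a query to the IFP server and should
    return a list of candidate programs; `split_into_subfunctions` implements
    step (d). Both are placeholders supplied by the caller.
    """
    examples = []
    while len(examples) < max_examples:
        x = random_input()
        examples.append((x, target_fn(x)))           # (a) input-output pair
        candidates = synthesize(examples)             # (b) query the server
        if len(candidates) == 1:                      # (c) unique program found
            return candidates[0]
        if not candidates:                            # (d) too hard: decompose
            return [reinvent(sub, random_input, synthesize,
                             split_into_subfunctions, max_examples)
                    for sub in split_into_subfunctions(target_fn)]
    return None
```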
5 Evaluation
It is difficult to fairly evaluate a general-purpose inductive programming system
using a set of benchmark problems, for all the benchmark problems can eas-
ily be solved by implementing the functions to be synthesized beforehand and
including them in the component library. Even if doing that is prohibited by the
regulations, including their subexpressions in the component library is enough
to make the problems much easier.
This issue is especially critical when evaluating the results of this research, which is based on the idea that "the key to successful inductive programming systems is the choice of library functions": we cannot fix the library for the comparison, but rather have to evaluate and compare the libraries themselves.
It would be fairer to evaluate systems from the perspective of whether the infinite set of functions that can be synthesized covers the many functions that users actually want. In the case of this research, since the IFP service will be provided as a web application, we can evaluate how well the obtained IFP system satisfies programming requests based on the results of a Web questionnaire and the Web server statistics.
7 Conclusions
This paper presented our research idea for realizing a human-level IFP system by
adding the library learning functionality to the Web-based general-purpose IFP
system MagicHaskeller. It can be applied to uncovering the AGI mechanism
for human-like learning of behavior and to developing intelligent agents.
References
1. Hinton, G.E., Salakhutdinov, R.R.: Reducing the Dimensionality of Data with Neu-
ral Networks. Science 313(5786), 504–507 (2006)
2. Katayama, S.: Systematic search for lambda expressions. In: Sixth Symposium on
Trends in Functional Programming, pp. 195–205 (2005)
3. Katayama, S.: Efficient exhaustive generation of functional programs using monte-
carlo search with iterative deepening. In: Ho, T.-B., Zhou, Z.-H. (eds.) PRICAI
2008. LNCS (LNAI), vol. 5351, pp. 199–210. Springer, Heidelberg (2008)
4. Kitzelmann, E.: A Combined Analytical and Search-Based Approach to the Induc-
tive Synthesis of Functional Programs. Ph.D. thesis, University of Bamberg (2010)
5. Olsson, R.: Inductive functional programming using incremental program transfor-
mation. Artificial Intelligence 74(1), 55–81 (1995)
6. Schmidhuber, J.: Learning complex, extended sequences using the principle of his-
tory compression. Neural Comput. 4(2), 234–242 (1992)
Anytime Bounded Rationality
1 Introduction
Key among the properties mission-critical systems call for is anytime control – the
capability of a controller to produce control inputs whenever necessary, despite the
lack of resources, trading quality for responsiveness [3,5]. Any practical AGI is con-
strained by a mission, its own architecture, and limited resources including insuffi-
cient time/memory to process all available inputs in order to achieve the full extent of
its goals when it matters. Moreover, unlike fully hand-crafted cyber-physical systems,
AGIs should handle underspecified dynamic environments, with no other choice but
to learn their know-how, possibly throughout their entire lifetime. The challenge of
anytime control thus becomes broader as, in addition to resource scarcity, it must
encompass inevitable variations of completeness, consistency, and accuracy of the
learned programs from which decisions are derived.
We address the requirement of delivering anticipative, multi-objective and anytime
performance from a varying body of knowledge. A system must anticipate its envi-
ronment for taking appropriate action – a controller that does not can only react after the fact and "lag behind the plant". Predictions and sub-goals must be produced con-
currently: (a) since achieving goals needs predictions, the latter must be up to date; (b)
a complex environment’s state transitions can never be predicted entirely: the most
interesting ones are those that pertain to the achievements of the system’s goals, so
these must be up to date when predictions are generated. A system also needs to
achieve multiple concurrent goals to reach states that can only be obtained using sev-
eral independent yet temporally correlated and/or co-dependent courses of action
while anticipating and resolving potential conflicts in due time. The capabilities above
must be leveraged to compute and revise plans continually, as resources allow and
knowledge accumulates, and execute them whenever necessary, as situations unfold –
this requires subjecting a system's deliberations (and execution) to deadlines relative
to an external reference (world) clock.
Most of the strategies controlling the life-long learning AI systems we are aware of
are subject to one or several severe impediments to the responsiveness and robustness
we expect from mission- and time- critical systems. First, a sequential perception-
decision-action cycle [1,6,7,8,12] limits drastically the potential for situational aware-
ness and responsiveness: such "cognitive cycles" are difficult to interrupt and, being
driven by subjective inference "steps", are decoupled from objective deadlines either
imposed or learned. Second, interleaving multiple trains of inference in one sequential
stream [1,4,6,7] results in the overall system latency adding up with the number of
inputs and tasks at hand: such a system will increasingly and irremediably lag behind
the world. Third, axiomatic reasoning [1,6,7] prevents the revision of inferences upon
the acquisition of further amounts of evidence and know-how, prohibiting continual
refinements and corrections. Last, the lack of explicit temporal inference capabilities
[1,6,7,8] prevents learned procedural knowledge from inferring deadlines for goals
and predictions, which is needed to plan over arbitrary time horizons – on that front,
state-of-the-art reinforcement and evolutionary learners [9,13,2] present other inhe-
rent difficulties. NARS [14] notably avoids these pitfalls and could, in principle, learn
to couple subjective time semantics to a reference clock and feed them to a probabilis-
tic scheduler. We set out instead to schedule inferences deterministically using objec-
tive time semantics so as to avoid the unpredictability and unreliability that inevitably
arise from using inferred time semantics to control the inferencing process itself.
We present a computational model of anytime bounded rationality (we refer to this
model as ABR) that overcomes the limitations above. It posits (a) a dynamic hierarchy
of revisable time-aware programs (called models) (b) exploited by concurrent
inferencing jobs, that are (c) continually re-scheduled by (d) a value-driven executive
under (e) bounded latencies, keeping the size of all data (inputs, inferences, programs
and jobs) within (f) a fixed memory budget. This model has been implemented and
tested: it constitutes the control core of our auto-catalytic, endogenous, reflective
architecture (AERA; [10]), demonstrated to learn (by observation) to conduct
multimodal interviews of humans in real-time, bootstrapped by a minimal seed [11].
While the learning algorithm of this system has been described in prior publications
[10,11], its control strategy, the ABR model, has not been published elsewhere.
1. Other programs construct new models from life-long experience; see [10] for details.
an ever-changing (fixed-size) subset of the jobs.2 All execution times are commensurate: memory usage is proactively limited (see below), and the threads', E's and W's worst-case execution times (WCET) are all identical and constant.
Assuming the life-long learning of new models and a sustained influx of inputs, the
number of jobs and input-to-program matching attempts can grow exponentially and
exceed the memory budget. This growth is limited by a forgetting strategy based on
the prediction of the amount of available memory, inferred, conservatively, from past
experience – essentially, the rates of data creation and deletion (inputs, inferences,
programs and jobs). Should E anticipate a shortage, it deletes the necessary number of
data in order to accommodate the next predicted influx while preserving the most
valuable existing data: the top-rated candidates for deletion are the inputs that
contributed the least recently to the achievement of goals, the least reliable models
that succeeded the least recently, and the jobs of the least priority.
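A toy rendering of this forgetting strategy is sketched below; the record fields and the single scalar "value" used for ranking (recency of goal contribution for inputs, reliability and recency of success for models, priority for jobs) are our own simplification of the criteria listed above.

```python
# Pick the least valuable records for deletion when a memory shortage is predicted.
# Each record carries a 'kind' and a scalar 'value'; higher value = more worth keeping.

def plan_deletions(records, n_needed):
    """Return the n_needed least valuable records, least valuable first."""
    return sorted(records, key=lambda r: r["value"])[:n_needed]

records = [
    {"kind": "input", "id": "i7",  "value": 0.05},   # contributed to a goal long ago
    {"kind": "model", "id": "M3",  "value": 0.60},   # fairly reliable, recent success
    {"kind": "job",   "id": "j42", "value": 0.10},   # low-priority job
]
print([r["id"] for r in plan_deletions(records, 2)])   # ['i7', 'j42']
```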
A system’s experience constitutes defeasible knowledge, and is thus represented
using a non-axiomatic temporal term logic, truth being neither eternal nor absolute. A
term exposes three components: (a) arbitrary data, (b) a time interval ([early deadline,
late deadline] in microseconds, world time) within which the data is believed to hold
(or, if negated, during which it is believed not to hold) – an inference's lifetime being
bounded by its late deadline – and, (c) a likelihood (in [0,1]), the degree to which the
data has been ascertained. The likelihood of a sensory/reflective input is one whereas
that of a drive is user-defined. An inference results from the processing of evidences
by chains of models3, and is defeated or vindicated upon further (counter-)evidences.
Its likelihood is continually revised depending on the context and reliability of said
models and, notably, decreases with the length of the chains (see next section).
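A minimal data-structure sketch of such a term is given below; the field names are ours, and the time stamps are treated simply as integers in microseconds of world time.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Term:
    """Three-component term: data, belief interval, and degree of ascertainment."""
    data: Any                 # arbitrary payload
    early_deadline_us: int    # start of the interval within which data is believed to hold
    late_deadline_us: int     # end of the interval; also bounds the inference's lifetime
    likelihood: float         # in [0, 1]; 1.0 for sensory/reflective inputs, user-defined for drives
    negated: bool = False     # if True, data is believed NOT to hold during the interval

    def alive(self, now_us: int) -> bool:
        return now_us <= self.late_deadline_us
```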
The value of tending to an input (sensory/reflective input, inference or drive) at a given time depends on both its urgency (for situational awareness) and its likelihood.
2. Details on the scheduling algorithm are outside the scope of this paper.
3. Different chains may produce several equivalent inferences, albeit with different likelihoods. Threads will execute first the jobs performing the most likely of these inferences, postponing or discarding the others.
The global relevance of a model is the (normalized) maximum of the tending values of all its inferences, of type Prediction or Goal, that are still alive at the current time. If none of a model's inferences are alive, its relevance is computed so as to give it a chance to execute, albeit with a minimal relative priority. Finally, the priority of a chaining job is the product of the relevance of the model and the tending value of the input.
Prediction and goal monitoring jobs enjoy the same priority as, respectively, forward
and backward chaining jobs.
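The following sketch illustrates the resulting value-driven scheduling, under our own simplifying assumption that relevance and tending value are already available as numbers in [0, 1]; job priority is their product, and worker threads always take the highest-priority job first.

```python
import heapq

class Scheduler:
    """Toy value-driven scheduler: priority = model relevance x input tending value."""

    def __init__(self):
        self._heap = []       # max-heap emulated with negated priorities
        self._counter = 0     # tie-breaker keeping heap entries comparable

    def push(self, job, relevance, tending_value):
        priority = relevance * tending_value
        heapq.heappush(self._heap, (-priority, self._counter, job))
        self._counter += 1

    def pop(self):
        """Return the currently highest-priority job, or None if idle."""
        if not self._heap:
            return None
        _, _, job = heapq.heappop(self._heap)
        return job

s = Scheduler()
s.push("forward-chain M5 on input i", relevance=0.8, tending_value=0.9)
s.push("backward-chain M9 on goal g", relevance=0.4, tending_value=0.7)
print(s.pop())   # the forward chaining job (priority 0.72) comes first
```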
3 Models
Models constitute variable, defeasible knowledge: experimental evidence triggers their construction and deletion, as well as the continual revision [10] of their predictive performance.
already matched said LT.4 The likelihood, at any time, of an inference produced by a model from an input is a function of the likelihood of that input and the reliability of the model.
Note that the model instance i, being a prediction of M's success, is assigned a
likelihood equal to that of the prediction b.
Models form hierarchical structures, in which a model may feature an instance of another model as a premise. Pre-conditions can also be negative, to predict failures: in this case, the RT is of the form |iM0(…), with '|' indicating failure. Pre-conditions influence the computation of the likelihood of predictions (see below) but have no impact on that of goals.
A model is called conjunctive (Fig. 3a) when it specifies a causal relationship whereby an effect is not entailed by one single term, but by a context, i.e. a set of temporally correlated positive pre-conditions (* denotes an unbound value). A conjunctive model has no LT: instead, for unification, a parameter list ((X)) gathers all the variables shared by positive pre-conditions unless already present in the RT (B(Y Z)). A conjunctive model updates predictions as amounts of (value-sharing) positive pre-conditions accumulate. Over time, the likelihood of a prediction produced by a conjunctive model increases with the conjunction of the positive pre-conditions that predicted M's success, weighted by their reliability and taken relative to all the positive pre-conditions on M, and decreases with the most likely of M's predicted failures (its negative pre-conditions). Positive pre-conditions without which the effect of the model is reliably entailed are deemed irrelevant: prediction monitors will repeatedly decrease their reliability until deletion. When presented with a goal, M outputs a sub-goal (iM0 (y0 z)) targeting its own (forward) operation – this sub-goal will match the RT of its pre-conditions and trigger the production of their respective sub-goals (or negations thereof in the case of negative pre-conditions).

Fig. 3. Pre-conditions can be subjected to any others recursively instantiating the pictured hierarchical patterns. Logical operations are continuous and persistent instead of discrete and transient: ANDs are weighted and compete (as well as ORs) with NORs based on the likelihoods of pre-conditions, continually updated to reflect knowledge variations, that are both quantitative (likelihood- and reliability-wise) and qualitative (new inputs, inferences and models, deletion of underperforming models, unlikely inferences and valueless old inputs).

4. Assumptions are not essential for the present discussion and will not be detailed further.
A model is called disjunctive (Fig. 3b) when it specifies a causal relationship
whereby an effect is entailed by the occurrence of the most likely positive pre-
condition, competing with the most likely negative one. Positive pre-conditions on a
disjunctive model constitute a set of options to entail the model's success – whereas in conjunctive models they constitute a set of (weighted) requirements. The likelihood of a prediction is computed as for conjunctive models, except that its positive component is determined by the most likely positive pre-condition.
Fig. 4 shows part5 of an actual system (called S1; [10]) that observed (in real-time)
human interactions of the general form "take a [color] [shape], put it there [pointing at
some location], thank you" (and variations thereof) and learned how to satisfy its
mission – hearing/speaking "thank you", depending on the assigned role (interviewee
or interviewer). S1's seed contains (a) a drive run, (b) a model S0 and its context {S1, S2}, (c) sensors monitoring the state of hands, objects and utterances (color (col), position (pos), attachment (att), shape (is), belonging (bel), designation (point), speech (speak)) and, (d) effectors (commands move, grab, release, speak and point).
5. For clarity, timings and variants of learned knowledge (e.g. variations in wording, shapes and colors) are omitted. Faulty models are also omitted (see section 5).
effectors – they were learned by observing the results of a few randomly generated commands ("motor babbling"). A conjunctive model specifies how a state (its RT) comes to happen when a context (white areas), i.e. a temporal correlation of pre-conditions, is observed. For example, M12 predicts that an object X will move when an actor B has taken it, followed by an actor A asking B to put it at a designated location. A disjunctive model subjects the occurrence of its RT state to the observation of one pre-condition among a set of options. For example, M5 predicts that an object X will be attached to a hand H (RT of M5) in two cases: either S1 grabs the object as per model M6, or an actor (A) asks another one (B) to take the object, as per model M11. The disjunctive model M14 specifies how hearing "thank you" is entailed by an actor A asking an actor B to pick an object and drop it to a designated location (chain iM15–iM13–iM12–iM11). When a model features a LT, it specifies the transformation of one state (its LT) into another (its RT). Such transformations can also be controlled by pre-conditions, as for disjunctive models. For example, M9 predicts the transition of the state "an actor holds an object X" to "X no longer held" when S1 releases the object (M10), or when the actor is asked to drop the object somewhere (M13). In this case, the LT has to be matched for the model to sub-goal and, conversely, both the LT and one pre-condition must be observed for the model to predict. Models Ci are conjunctive models without an RT. They represent abstractions of sub-contexts that have been reliably identified among larger ones (those controlling conjunctive models). Occurrences of sub-contexts are encoded as model instances (iCi) and are not subjected to any negative pre-conditions.
the early deadlines of the goals in a branch (say g4's for g0's branch), a request for
commitment is sent upward (5c) to the first simulated goal in the branch (g1).
Requests are granted depending on the predicted impact of a goal candidate (g1) on
actual goals (g5): if g1 entails no failure of g5, then commit to g1; otherwise (there is a
conflict between g1 and g5), if g5 is of less importance6 than g0, then commit to g1 and
all its sub-goals in the branch7; otherwise do nothing – assuming the same knowledge,
the system will commit to g6 later, following the same procedure. Commitment to g1
is declined in case g0 is redundant with a more important actual goal (targeting the
same state).
Commitment is defeasible, i.e. continually revised as new knowledge (inputs,
inferences and models) impact both goals' importance and tending value: after
commitment, a goal monitor keeps accumulating predictions to anticipate further
conflicts and redundancies that could invalidate its decision, in which case the goal
(and its sub-goals) will become simulated again. When the system commits to a goal
(g1) conflicting with another one (g5), it keeps simulating g5 instead of deleting it, in
the hope of witnessing its unexpected success, triggering the acquisition of new
(possibly better) models. A system may also acquire better knowledge before g5's
deadline and uncover situations where it can be achieved without conflict – the
system may then commit to some of g5's sub-goals (e.g. g6) without having to re-
compute the simulation branches. For the same reasons, goals deemed redundant with
more important ones are also kept in the simulated state.
Acknowledgments. This work has been partly supported by the EU-funded projects
HUMANOBS (FP7-STREP-231453) and Nascence (FP7-ICT-317662), grants from SNF
(#200020-156682) and Rannis, Iceland (#093020012).
References
1. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated
theory of the mind. Psychological Review 111, 1036–1060 (2004)
2. Bellas, F., Duro, R.J., Faiña, A., Souto, D.: Multilevel darwinist brain (MDB): Artificial
evolution in a cognitive architecture for real robots. IEEE Transactions on Autonomous
Mental Development 2(4), 340–354 (2010)
3. Boddy, M., Dean, T.L.: Deliberation scheduling for problem solving in time-constrained
environments. Artificial Intelligence 67(2), 245–285 (1994)
4. Cassimatis, N., Bignoli, P., Bugajska, M., Dugas, S., Kurup, U., Murugesan, A., Bello, P.:
An architecture for adaptive algorithmic hybrids. IEEE Transactions on Systems, Man, and
Cybernetics, Part B 40(3), 903–914 (2010)
5. Horvitz, E., Rutledge, G.: Time-dependent utility and action under uncertainty. In: Proc.
7th Conference on Uncertainty in Artificial Intelligence, pp. 151−158. Morgan Kaufmann
Publishers Inc. (1991)
6. Laird, J.E.: The Soar cognitive architecture. MIT Press (2012)
7. Langley, P., Choi, D., Rogers, S.: Interleaving learning, problem-solving, and execution in
the ICARUS architecture. Technical report, Computational Learning Laboratory, CSLI,
Stanford University (2005)
8. Madl, T., Baars, B.J., Franklin, S.: The timing of the cognitive cycle. PloS One 6(4),
e14803 (2011)
9. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D.,
Riedmiller, M.: Playing atari with deep reinforcement learning. arXiv:1312.5602 (2013)
10. Nivel, E., Thórisson, K.R., Steunebrink, B.R., Dindo, H., Pezzulo, G., Rodríguez, M.,
Hernández, C., Ognibene, D., Schmidhuber, J., Sanz, R., Helgason, H.P., Chella, Antonio:
Bounded Seed-AGI. In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS,
vol. 8598, pp. 85–96. Springer, Heidelberg (2014)
11. Nivel, E., Thórisson, K.R., Steunebrink, B.R., Dindo, H., Pezzulo, G., Rodriguez, M.,
Hernandez, C., Ognibene, D., Schmidhuber, J., Sanz, R., Helgason, H.P., Chella, A.,
Jonsson, G.K.: Autonomous acquisition of natural language. In: Proc. IADIS International
Conference on Intelligent Systems & Agents 2014, pp. 58−66 (2014)
12. Shapiro, S.C., Ismail, H.O.: Anchoring in a grounded layered architecture with integrated
reasoning. Robotics and Autonomous Systems 43(2-3), 97–108 (2003)
13. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI approxima-
tion. Journal of Artificial Intelligence Research 40(1), 95–142 (2011)
14. Wang, P.: Rigid Flexibility: The Logic of Intelligence. Springer (2006)
Ultimate Intelligence Part I: Physical
Completeness and Objectivity of Induction
Eray Özkural(B)
“If you wish to make an apple pie from scratch, you must first invent the
universe.” – Carl Sagan
1 Introduction
Ray Solomonoff has discovered algorithmic probability and introduced the
universal induction method which is the foundation of mathematical Artificial
Intelligence (AI) theory [14]. Although the theory of Solomonoff induction is
somewhat independent of physics, we interpret it physically and try to refine
the understanding of the theory by thought experiments given constraints of
physical law. First, we argue that its completeness is compatible with contem-
porary physical theory, for which we give arguments from modern physics that
show Solomonoff induction to converge for all possible physical prediction prob-
lems. Second, we define a physical message complexity measure based on initial
machine volume, and argue that it has the advantage of objectivity and the
typical disadvantages of using low-level reference machines. However, we show
that setting the reference machine to the universe does have benefits, poten-
tially eliminating some constants from algorithmic information theory (AIT)
and refuting certain well-known theoretical objections to algorithmic probabil-
ity. We also introduce a physical version of algorithmic probability based on
volume and propose six more variants of physical message complexity.
2 Background
Let us recall Solomonoff’s universal distribution. Let U be a universal computer
which runs programs with a prefix-free encoding like LISP. The algorithmic
We also give the basic definition of Algorithmic Information Theory (AIT), where
the algorithmic entropy, or complexity of a bit string x ∈ {0, 1}+ is defined
as $H_U(x) = \min(\{|\pi| \mid U(\pi) = x\})$. Solomonoff's universal sequence induction method works on bit strings x drawn from a stochastic source μ. Equation 1 is a semi-measure, but that is easily overcome as we can normalize it. We merely normalize sequence probabilities, $P'_U(x0) = P'_U(x)\, P_U(x0)/(P_U(x0) + P_U(x1))$, eliminating irrelevant programs and ensuring that the probabilities sum to 1, from which point on $P'_U(x0 \mid x) = P'_U(x0)/P'_U(x)$ yields an accurate prediction.
The error bound for this method is the best known for any such induction
method. The total expected squared error between PU (x) and μ is less than
−1/2 ln PU (μ) according to the convergence theorem proven in [13], and it is
roughly HU (μ) ln 2 [15].
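The normalization above is easy to mirror in code. The sketch below assumes an unnormalized semi-measure P over finite binary strings is supplied (here a made-up one, standing in for a program-enumeration approximation) and computes the normalized sequence probabilities and next-bit prediction; all names are our own.

```python
def normalized(P, x, cache=None):
    """P'(x): normalized sequence probability, with P'(empty string) = 1."""
    if cache is None:
        cache = {}
    if x == "":
        return 1.0
    if x in cache:
        return cache[x]
    prefix = x[:-1]
    siblings = P(prefix + "0") + P(prefix + "1")
    value = normalized(P, prefix, cache) * P(x) / siblings
    cache[x] = value
    return value

def predict_next_bit(P, x):
    """P'(x0 | x) = P'(x0) / P'(x)."""
    return normalized(P, x + "0") / normalized(P, x)

# Example with a made-up semi-measure favouring strings of alternating bits:
P = lambda s: 2.0 ** (-len(s) - sum(a == b for a, b in zip(s, s[1:])))
print(round(predict_next_bit(P, "0101"), 3))   # 0.667: predicts the alternation continues
```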
$$H_U(\mu) \leq k, \ \exists k \in \mathbb{Z} \qquad (2)$$
which entails that the pdf μ(x) can be perfectly simulated on a computer, while
x are (truly) stochastic. This condition is formalized likewise in [5].
In the present article, we support the above philosophical solution to the choice
of the reference machine with basic observations. Let us define physical message
complexity:
$$C_V(x) = \min\{V(M) \mid M \to x\} \qquad (5)$$
where x ∈ D+ is any d-ary message written in an alphabet D, M is any phys-
ical machine (finite mechanism) that emits the message x (denoted M → x),
and V (M ) is the volume of machine M . M is supposed to contain all physical
computers that can emit message x.
Equation 5 is too abstract and it would have to be connected to physical law
to be useful. However, it allows us to reason about the constraints we wish to put
on physical complexity. M could be any possible physical computer that can emit
a message. For this definition to be useful, the concept of emission would have
to be determined. Imagine for now that the device emits photons that can be
detected by a sensor, interpreting the presence of a photon with frequency fi as
di ∈ D. It might be hard for us to build the minimal device that can do this. How-
ever, let us assume that such a device can exist and be simulated. It is likely that
this minimal hardware would occupy quite a large volume compared to the out-
put it emits. With every added unit of message complexity, the minimal device
would have to get larger. We may consider additional complications. For instance,
we may demand that these machines do not receive any physical input, i.e., sup-
ply their own energy, which we call a self-contained mechanism. We note that
resource bounds can also be naturally added into this picture.
When we use CV (x) instead of HU (x), we do not only eliminate the need
for a reference machine, but we also eliminate many constraints and constants
in AIT. First of all, there is not the same worry of a self-delimiting program,
because every physical machine that can be constructed will either emit a
message or not in isolation, although its meaning slightly changes and will
be considered in the following. Secondly, we expect all the basic theorems of
AIT to hold, while the arbitrary constants that correspond to glue code to
be eliminated or minimized. Recall that the constants in AIT usually corre-
spond to such elementary operations as function composition and so forth. Let us consider the sub-additivity of information, which represents a good example: $H_U(x, y) = H_U(x) + H_U(y|x) + O(1)$. When we consider $C_V(x, y)$, however, the sub-additivity of information becomes exactly $C_V(x, y) = C_V(x) + C_V(y|x)$, since there does not need to be a gap between a machine emitting a photon
and another sensing one. In the consideration of an underlying physical theory
of computing (like quantum computing), the relations will further change, and
become ever clearer.
5 Discussion
The argument from practical finiteness of the universe was mentioned briefly by
Solomonoff in [12]. Let us note, however, that the abstract theory of algorithmic
probability implies an infinite probabilistic universe, in which every program
may be generated, and each bit of each program is equiprobable. In such an
abstract universe, a Boltzmann Brain, with considerably more entropy than our
humble universe is even possible, although it has a vanishingly small probabil-
ity. In a finite observable universe with finite resources, however, we obtain a
slightly different picture, for instance any Boltzmann Brain is improbable, and
a Boltzmann Brain with a much greater entropy than our universe would be
impossible (0 probability). Obviously, in a sequence of universes with increasing
volume of observable universe, the limit would be much like pure algorithmic
probability. However, for our definition of physical message complexity, a proper
physical framework is much more appropriate, and such considerations quickly
veer into the territory of metaphysics (since they truly consider universes with
physical law unlike our own). Thus firmly footed in contemporary physics, we
gain a better understanding of the limits of ultimate intelligence.
References
1. Bennett, C.H.: How to define complexity in physics, and why. Complexity, Entropy,
and the Physics of Information VIII, 137–148 (1980)
2. Chaitin, G.J.: Algorithmic Information Theory. Cambridge University Press (2004)
3. Deutsch, D.: Quantum theory, the church-turing principle and the universal quan-
tum computer. Proceedings of the Royal Society of London. A. Mathematical and
Physical Sciences 400(1818), 97–117 (1985)
4. Frampton, P.H., Hsu, S.D.H., Kephart, T.W., Reeb, D.: What is the entropy of
the universe? Classical and Quantum Gravity 26(14), 145005 (2009)
5. Hutter, M.: Convergence and loss bounds for Bayesian sequence prediction. IEEE
Transactions on Information Theory 49(8), 2061–2067 (2003)
6. Lloyd, S.: Ultimate physical limits to computation. Nature 406, 1047–1054 (2000)
7. Lloyd, S.: Universal quantum simulators. Science 273(5278), 1073–1078 (1996)
8. Margolus, N., Levitin, L.B.: The maximum speed of dynamical evolution. Physica
D Nonlinear Phenomena 120, September 1998
9. Miszczak, J.A.: Models of quantum computation and quantum programming lan-
guages. Bull. Pol. Acad. Sci.-Tech. Sci. 59(3) (2011)
10. Özkural, E.: A compromise between reductionism and non-reductionism. In: Worldviews, Science and Us: Philosophy and Complexity. World Scientific (2007)
11. Raatikainen, P.: On interpreting chaitin’s incompleteness theorem. Journal of
Philosophical Logic 27 (1998)
12. Solomonoff, R.J.: Inductive inference research status spring 1967. Tech. Rep. RTB
154, Rockford Research, Inc. (1967)
13. Solomonoff, R.J.: Complexity-based induction systems: Comparisons and conver-
gence theorems. IEEE Trans. on Information Theory IT 24(4), 422–432 (1978)
14. Solomonoff, R.J.: The discovery of algorithmic probability. Journal of Computer
and System Sciences 55(1), 73–88 (1997)
15. Solomonoff, R.J.: Three kinds of probabilistic induction: Universal distributions
and convergence theorems. The Computer Journal 51(5), 566–570 (2008)
16. Solomonoff, R.J.: Algorithmic probability: theory and applications. In: Dehmer,
M., Emmert-Streib, F. (eds.) Information Theory and Statistical Learning,
pp. 1–23. Springer Science+Business Media, N.Y. (2009)
17. Sunehag, P., Hutter, M.: Intelligence as inference or forcing occam on the world.
In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS, vol. 8598,
pp. 186–195. Springer, Heidelberg (2014)
18. Tubbs, A.D., Wolfe, A.M.: Evidence for large-scale uniformity of physical laws.
ApJ 236, L105–L108 (1980)
19. Wood, I., Sunehag, P., Hutter, M.: (non-)equivalence of universal priors. In:
Dowe, D.L. (ed.) Solomonoff Festschrift. LNCS, vol. 7070, pp. 417–425. Springer,
Heidelberg (2013)
20. Zurek, W.H.: Algorithmic randomness, physical entropy, measurements, and the
demon of choice. In: Hey, A.J.G. (ed.) Feynman and computation: exploring the
limits of computers. Perseus Books (1998)
Towards Emotion in Sigma: From Appraisal to Attention
1 Introduction
Sigma [1] is a cognitive architecture/system that is based on combining what has been
learned from over three decades worth of independent work in cognitive architectures
[2] and graphical models [3]. Its development is being guided by a trio of desiderata:
(1) grand unification (expanding beyond strictly cognitive processing to all of the
capabilities required for intelligent behavior in complex real worlds); (2) functional
elegance (deriving the full range of necessary capabilities from the interactions
among a small general set of mechanisms); and (3) sufficient efficiency (executing at a
speed sufficient for anticipated applications). We have recently begun exploring the
incorporation of emotion into Sigma, driven by: the theoretical desideratum of grand
unification; the practical goal of building virtual humans for applications in education,
training, counseling, entertainment, etc.; and the hypothesis that emotion is critical for
general intelligences to survive and thrive in complex physical and social worlds.
A major focus of this effort concerns what aspects of emotion are properly
architectural – that is, fixed parts of the mind – versus enabled primarily by learned
knowledge and skills. A large fragment of emotion is non-voluntary and immutable,
providing hard-to-ignore input to cognition and behavior from what could be called
the wisdom of evolution. It also makes direct contact with bodily processes, to the
extent such exist, to yield the heat in emotion. Thus, significant fractions of it must be
grounded architecturally even with knowledge clearly being critical at higher levels.
Driven by functional elegance, there is also a major emphasis here on reusing as
much as possible the capabilities provided by the existing architecture, rather
than simply building a separate emotion module. One obvious example is leveraging
Sigma’s hybrid (discrete + continuous) mixed (symbolic + probabilistic) nature to
support both the low-level subsymbolic aspects of emotion and the high-level symbol-
ic aspects. Another such example is the seamless mapping of Sigma’s tri-level cogni-
tive control [4] – as inherited from Soar [5] and comprising reactive, deliberative and
reflective levels – onto tri-level theories of emotion [6], suggesting a more unified tri-
level model of emotocognitive processing.
A less obvious example is the essential role that Sigma’s gradient-descent learning
mechanism [7] has turned out to play in appraisal [8]. Appraisal is typically
considered the initial stage of emotional processing, capturing emotionally and
behaviorally relevant assessments of situations in terms of a relatively small set of
variables, such as relevance, desirability, likelihood, expectedness, causal attribution,
controllability and changeability in the EMA theory [9]. These ground appraisals, or
combinations thereof, may then lead to higher-order appraisals, transient emotional
states, and a variety of important impacts on thought and behavior.
Still, extensions to Sigma’s architecture are clearly necessary to fully support emo-
tional processing. Prior to this work, Sigma had no emotions. Yet, the immutable and
mandatory nature of emotions implies they must be deeply rooted in the architecture.
Central to this effort is understanding the architectural extensions necessary to (1)
enable the ground appraisals that initiate emotional processing, and (2) yield the ap-
propriate emotional modulations of thought and behavior.
This article provides an initial report on work towards emotion in Sigma, focused on
architectural variants of desirability and expectedness, along with their initial impacts on
attention. Key to both appraisals is a new architectural mechanism for comparing distri-
butions, with desirability based on comparing the distributions over the current state and
the goal, and expectedness based on comparing the distributions over a fragment of
memory before and after learning. Attention then leverages these appraisals to focus
processing at multiple levels of control. This is the first architectural model of low-level
attention that stretches all of the way from appraisal to its impact on thought. It also
demonstrates a complementary impact on higher-level attention.
There is considerable recent work on emotion in cognitive architectures – e.g., in
Soar [10], PsychSim [11], FAtiMA [12], EmoCog [13], MicroPsi [14], ACT-R [15],
BICA [16], and CLARION [17] – but Sigma’s unique aspects shed new light on
how this can be done. Section 2 provides the basics of Sigma needed for this work.
Sections 3 and 4 cover expectedness and desirability. Attention is covered in
Section 5, with a wrap up in Section 6.
2 Sigma
3 Expectedness
Expectedness concerns whether an event is predicted by past knowledge. Its inverse
maps naturally, as unexpectedness, onto the notion of surprise that underlies the
bottom-up aspects of today’s leading models of visual attention. In other words, atten-
tion is drawn to what is surprising or unexpected; e.g., the Bayesian Theory of
Surprise compares the prior distribution over the visual field – i.e., the model that has
previously been learned for it – with the posterior distribution derived via Bayesian
belief updating of the prior given the image [21]. The size of the difference correlates
with how poorly past knowledge predicts the image. This comparison is computed by
the Kullback-Leibler (KL) divergence, with M the current model and D the new data:
P(M | D)
S(D, M ) = KL(P(M | D), P(M )) = P(M | D)log P(M )
dM. (1)
M
The computation of surprise in Sigma tracks this approach, but differs in several details. Distribution updating is mediated by Sigma's gradient-descent learning mechanism – as applied at FFNs – with the functions before and after learning compared before the prior is replaced by the posterior as the node's function. Also, rather than basing the comparison on the KL divergence, it is based on the Hellinger distance.
Fig. 3. Visual field before and after change in bottom left cell, plus the resulting surprise map.
Each cell has a (Boolean) distribution over colors, but just the argmaxes are shown.
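For concreteness, the following sketch computes surprise as the Hellinger distance between a discrete prior and posterior over a single cell's color distribution (cf. Fig. 3); the discrete form of the distance and the example values are our own, not taken from the paper.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same support."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))   # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))

# Distribution over colors for one cell of the visual field, before and after
# the bottom-left cell changes:
prior     = [0.90, 0.05, 0.05]   # mostly "blue"
posterior = [0.05, 0.90, 0.05]   # now mostly "red"
print(round(hellinger(prior, posterior), 3))   # ~0.725: high surprise
print(round(hellinger(prior, prior), 3))       # 0.0: nothing unexpected
```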
Surprise has also been explored in more complex pre-existing tasks, such as Simul-
taneous Localization and Mapping (SLAM) [7]. In SLAM surprise is computed over
the learned map, a fragment of mental imagery [23] rather than direct perception, with
local input focusing surprise on the current and previous locations in the map. In all,
the work to date has moved Sigma from where it had no measure of surprise to where
it is computable over any memorial predicate, whether perceptual or cognitive.
4 Desirability
Desirability concerns whether or not an event facilitates or thwarts what is wanted. In
Sigma it is modeled as a relationship between the current state and the goal. The for-
mer is in working memory; however, until recently, Sigma did not have goals that the
architecture could comprehend. Although Sigma, like Soar, has deep roots in search
and problem solving, neither natively embodied declarative goals that would enable
automated comparisons. Driven by the needs of emotion, a goal function can now be
specified for each predicate in Sigma, leading to an automatically created goal predi-
cate whose WM represents the goal function; e.g., a pattern of tiles to be reached in
the Eight Puzzle can be stored in the WM of the board*goal predicate. Thus, in-
vestigating appraisal has led to the resolution of a decades-long issue in problem-
solving architectures. In principle, this shouldn’t be too surprising – if emotions exist
for functional reasons, they ought to support gains in other system capabilities.
Given a predicate’s state and goal, desirability amounts to how similar/different the
state is to/from the goal. Although similarity in Sigma was first implemented as the
dot product of the state and goal functions, once surprise was implemented it became
clear that the Hellinger distance could directly yield a difference measure here, while
the Bhattacharyya coefficient, a key subpart of the Hellinger distance, could replace
the dot product in computing similarity:
$$\mathrm{Difference}(S, G) = \mathrm{HD}(S, G) = \sqrt{1 - \int \sqrt{s(x)\, g(x)}\, dx} \,. \qquad (3)$$
Fig. 4. Eight Puzzle state and goal configurations, plus the resulting desirability maps. The first
two show argmaxes over (Boolean) distributions. No goal has been set for the center cell.
Fig. 5. Visual field state and goal (argmaxes), plus the resulting desirability maps
5 Attention
The primary reactive cost in Sigma is message processing at nodes in the factor
graph; i.e., computing message products and summarizing out unneeded dimensions
from them. Many optimizations have already been introduced into Sigma to reduce
the number of messages passed [26] and the cost per message [27]. Simulated paral-
lelism has also been explored [26]. Yet, attention may support further non-
correctness-preserving optimizations that still yield good enough answers.
A range of attentional approaches have been considered that reduce the number of
messages sent and/or the cost of processing individual messages, with one form of the
latter chosen for initial experiments. The basic idea is to use an attention map for each
predicate in guiding abstraction of messages out of FFNs. The intent is to yield small-
er messages that are cheaper to process yet still maintain the information critical for
effective performance. The approach is analogous to attention-based image compres-
sion [28], but here it reduces inner-loop costs within a cognitive architecture.
The attention map for a predicate is automatically computed from its surprise map
and/or its progress/difference map. When there is a learned function for a predicate, a
surprise map exists and provides the bottom-up input to attention. This makes sense
conceptually – what is expected is not informative, and has little utility unless rele-
vant to goals (making it a top-down factor) – and has a strong grounding in human
cognition [21]. When a predicate has a goal, progress and difference maps exist, and
one of them is reused as the top-down input to the attention map. Again this makes
sense conceptually, as top-down input is goal/task related, but there is some subtlety
required in determining which of the two desirability maps to use.
In problem solving, the focus should be on those parts of the state that differ from
the goal – i.e., the difference map – as this is where problem-solving resources are
most needed. However, in visual search, what matters are the regions that match the
goal – i.e., the progress map – as they correspond to what is being sought. One way of
dealing with this conundrum is to invert the sense of the goal in visual search so that it
would seek differences from not yellow rather than similarities to yellow. An alterna-
tive is to identify a fundamental distinction between the two problem classes that
would enable difference to be used for the first and progress for the second.
A variant of the second approach has been implemented, based on closed-world predicates – as seen in the more stable, all-or-none, states found in problem solving – versus open-world predicates – as seen in perception and other forms of more transient distributional information. The attention map for a predicate is therefore a combination of surprise and difference for closed-world predicates, and surprise and progress for open-world predicates. If either map in the pair doesn't exist, the attention map is simply the one that does exist. If neither exists, there is no attention map. When both maps exist, they are combined via an approximation to probabilistic or that enables both to contribute while their combination remains ≤ 1:
$$P(A \vee B) = P(A) + P(B) - P(A \wedge B) \approx P(A) + P(B) - P(A)P(B). \qquad (5)$$

Fig. 6. Normalized attention map for visual search
Fig. 6 shows the attention map for visual search after the change in Fig. 3(b), based on
the surprise map in Fig. 3(c) and the progress map in Fig. 5(c). Bottom-up attention
boosts the single changed region, while top-down boosts the two yellow regions.
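The combination in Eq. (5) is straightforward to apply element-wise over the two maps; the sketch below does so on a toy 2×2 grid of our own devising.

```python
# Combine bottom-up (surprise) and top-down (progress or difference) maps into
# an attention map with the approximate probabilistic-or of Eq. (5).

def combine(bottom_up, top_down):
    """Element-wise P(A) + P(B) - P(A)P(B); the result stays in [0, 1]."""
    if bottom_up is None:
        return top_down
    if top_down is None:
        return bottom_up
    return [[a + b - a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(bottom_up, top_down)]

surprise = [[0.0, 0.0], [0.9, 0.0]]   # one changed cell
progress = [[0.8, 0.0], [0.0, 0.8]]   # two cells matching the goal color
print(combine(surprise, progress))    # [[0.8, 0.0], [0.9, 0.8]]
```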
Given such an attention map, message abstraction out of FFNs then leverages the
piecewise-linear nature of Sigma’s functions via an existing mechanism that minimiz-
es the number of regions in functions by eliminating slices, and thus region bounda-
ries, when the difference between the functions in each pair of regions spanning a
slice is below a threshold. In particular, at an FFN the attention map for the predicate
is first scaled and then exponentiated to increase the contrast between large and small
values (the scale is set so that the maximum value is 1 after exponentiation). This
exponentiated attention map is then multiplied times the factor function, and slices in
the original function are removed if the differences are below threshold in this mod-
ified version. In contrast to normal slice removal, where the functions across the slice
are similar enough for either to be used for the new expanded region, here the func-
tions contributing to the new region are averaged. Fig. 7 shows the resulting message
in the visual-search task. Only 4 regions are removed here, but many more can be
removed for larger images; for example, with a 200×200 image the reduction is from
160,000 regions to 12. Significant cost savings can ac-
crue as well, with a factor of ~3 seen with large images.
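A much-simplified, one-dimensional sketch of this abstraction step is given below: the attention map is contrast-enhanced by scaling and exponentiation, the function values are weighted by it, and neighbouring regions whose weighted values differ by less than a threshold are merged by averaging. Piecewise-constant cells stand in for Sigma's piecewise-linear regions, and all parameter choices are our own.

```python
def abstract(values, attention, threshold=0.05, exponent=10.0):
    """Merge adjacent regions whose attention-weighted values are nearly equal."""
    scale = 1.0 / max(attention)                       # maximum becomes 1 after exponentiation
    weights = [(a * scale) ** exponent for a in attention]
    regions = [[values[0]]]                            # groups of original values
    for v, w, w_prev in zip(values[1:], weights[1:], weights):
        if abs(v * w - regions[-1][-1] * w_prev) < threshold:
            regions[-1].append(v)                      # merge: drop the slice boundary
        else:
            regions.append([v])
    return [sum(r) / len(r) for r in regions]          # average within merged regions

values    = [0.2, 0.25, 0.9, 0.2, 0.22]
attention = [0.1, 0.1, 1.0, 0.1, 0.1]
print(abstract(values, attention))   # 5 regions reduce to 3; low-attention cells collapse
```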
In addition to visual search, reactive attention has also
been explored in SLAM. We were able to verify that a
correct map could still be learned, and that the messages
from the FFNs are smaller, but so far these reductions
have not been sufficient for significant cost savings in
this task.
Fig. 7. Abstracted outgoing message with two mixed blue-red cells
Moving up the emotocognitive hierarchy to the deliberative level, it should be clear that a huge amount is already known about attention at this level, just mostly not under this name. Decision-making, planning, and problem solving are all concerned with deciding what to do next, which is the essence of deliberative attention. However, with the notable
exception of [29], tying this to appraisals is rare. To date in Sigma, desirability – and,
in particular, progress – has been explored as an automatic evaluation
function for (reflective) hill climbing in the Eight Puzzle. When all of the map’s
dimensions are summarized out via integration, the result is a single number in [0, 1]
specifying the fraction of the tiles that are in their desired locations. The result here is
an evaluation function that enables successful solution of many Eight Puzzle problems without the task-specific control knowledge previously added by hand.
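For concreteness, the evaluation described above reduces to the following computation (a sketch in Python under our own state encoding, not Sigma's graphical representation): summarizing the progress map over its dimensions yields the fraction of tiles already in their goal positions, which a hill climber can then maximize.

```python
# Progress as an Eight Puzzle evaluation function: the fraction of the eight
# tiles (the blank is ignored) that sit in their goal positions. The flat-tuple
# state encoding is our own choice.
def progress(state, goal):
    tiles = [t for t in goal if t != 0]
    in_place = sum(1 for t in tiles if state.index(t) == goal.index(t))
    return in_place / len(tiles)

state = (1, 2, 3, 4, 0, 6, 7, 5, 8)    # 0 marks the blank
goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
print(progress(state, goal))           # 0.75: six of eight tiles are in place
```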
Further attentional extensions within easy reach include: bottom-up inputs to decisions [29], progress as a reward function in reinforcement learning [30], difference as a guide in means-ends analysis (as in GPS [31]), and reflective attention.
6 Wrap Up
This work contributes novel architectural models of the expectedness and desirability
appraisal variables, along with an initial investigation of their architectural implications
for computational attention, both reactive (in aid of reducing message computation)
and deliberative (in aid of guiding decisions). The approach to reactive attention
particularly breaks new ground, while also contributing an extension of existing ideas
about perceptual attention to explain low-level cognitive attention.
These results leverage many of Sigma’s existing capabilities – including its (1) hy-
brid mixed function representation, (2) predicate factor graphs (particularly including
working memories, perceptual buffers, and factor functions), (3) gradient-descent
learning mechanism, (4) ability to remove unnecessary slices from functions, and (5)
reflective problem solving. Added to the architecture were (1) a mechanism for com-
paring two distributions, (2) an architectural representation of declarative goals, (3)
predicates for appraisal variables, and (4) a mechanism for abstracting graph messag-
es based on an attention map. Rather than forming a distinct emotion module, these
largely just amount to more reusable architectural fragments.
Still, this work just scratches the surface of all that is needed to implement
emotion fully within Sigma. More appraisal variables are clearly needed, such as
controllability – with its close ties to decision-making – and social appraisals, with
their potential grounding in recent work on Theory of Mind in Sigma [4]. It also
makes sense to explore aggregation of appraisals across predicates. Much more is
also needed concerning the impact of appraisals on thought and behavior. Here we
began exploring the impact on attention. We have also begun investigating the impact
of the approach on drives and moods, based on further leveraging of distribution
comparisons and learning. Beyond this are also the broad topic of coping and the
larger question of the relationship of emotions to embodiment. Sigma has recently
been connected to a virtual human body [32], but this is still just a beginning.
Acknowledgments. This effort has been sponsored by the U.S. Army. Statements and opinions
expressed do not necessarily reflect the position or the policy of the United States Government,
and no official endorsement should be inferred. We would also like to thank Abram Demski,
Himanshu Joshi and Sarah Kenny for helpful comments.
References
1. Rosenbloom, P.S.: The Sigma cognitive architecture and system. AISB Quarterly 136, 4–13
(2013)
2. Langley, P., Laird, J.E., Rogers, S.: Cognitive architectures: Research issues and challenges.
Cognitive Systems Research 10, 141–160 (2009)
3. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques.
MIT Press, Cambridge (2009)
4. Pynadath, D.V., Rosenbloom, P.S., Marsella, S.C., Li, L.: Modeling two-player games in
the Sigma graphical cognitive architecture. In: Kühnberger, K.-U., Rudolph, S., Wang, P.
(eds.) AGI 2013. LNCS, vol. 7999, pp. 98–108. Springer, Heidelberg (2013)
5. Laird, J.E.: The Soar Cognitive Architecture. MIT Press, Cambridge, MA (2012)
6. Ortony, A., Norman, D.A., Revelle, W.: Affect and Proto-affect in effective functioning.
In: Fellous, J.M., Arbib, M.A. (eds.) Who Needs Emotions? The Brain Meets the Machine.
Oxford University Press, New York (2005)
7. Rosenbloom, P.S., Demski, A., Han, T., Ustun, V.: Learning via gradient descent in
Sigma. In: Proceedings of the 12th International Conference on Cognitive Modeling (2013)
8. Moors, A., Ellsworth, P.C., Scherer, K.R., Frijda, N.H.: Appraisal theories of emotion:
State of the art and future development. Emotion Review 5, 119–124 (2013)
9. Marsella, S., Gratch, J.: EMA: A Process Model of Appraisal Dynamics. Journal of Cognitive
Systems Research 10, 70–90 (2009)
10. Marinier, R., Laird, J., Lewis, R.: A Computational Unification of Cognitive Behavior and
Emotion. Journal of Cognitive Systems Research 10, 48–69 (2009)
11. Si, M., Marsella, S., Pynadath, D.: Modeling appraisal in Theory of Mind Reasoning.
Journal of Autonomous Agents and Multi-Agent Systems 20, 14–31 (2010)
12. Dias, J., Mascarenhas, S., Paiva, A.: FAtiMA modular: towards an agent architecture with
a generic appraisal framework. In: Proceedings of the International Workshop on
Standards for Emotion Modeling (2011)
13. Lin, J., Spraragen, M., Blyte, J., Zyda, M.: EmoCog: computational integration of emotion
and cognitive architecture. In: Proceedings of the Twenty-Fourth International Florida
Artificial Intelligence Research Society Conference (2011)
14. Bach, J.: A framework for emergent emotions, based on motivation and cognitive modulators.
International Journal of Synthetic Emotions 3, 43–63 (2012)
15. Dancy, C.L.: ACT-RΦ: A cognitive architecture with physiology and affect. Biologically
Inspired Cognitive Architectures 6, 40–45 (2013)
16. Samsonovich, A.V.: Emotional biologically inspired cognitive architecture. Biologically
Inspired Cognitive Architectures 6, 109–125 (2013)
17. Wilson, N.R., Sun, R.: Coping with bullying: a computational emotion-theoretic account.
In: Proceedings of the 36th Annual Conference of the Cognitive Science Society (2014)
18. Kschischang, F.R., Frey, B.J., Loeliger, H.: Factor Graphs and the Sum-Product
Algorithm. IEEE Transactions on Information Theory 47, 498–519 (2001)
19. Rosenbloom, P.S.: Bridging dichotomies in cognitive architectures for virtual humans. In:
Proceedings of the AAAI Fall Symposium on Advances in Cognitive System (2011)
20. Joshi, H., Rosenbloom, P.S., Ustun, V.: Isolated word recognition in the Sigma cognitive
architecture. Biologically Inspired Cognitive Architectures 10, 1–9 (2014)
21. Itti, L., Baldi, P.F.: Bayesian surprise attracts human attention. In: Advances in Neural
Information Processing Systems, vol. 19 (2006)
22. Reisenzein, R.: Emotions as metarepresentational states of mind: Naturalizing the belief–
desire theory of emotion. Cognitive Systems Research 10, 6–20 (2009)
23. Rosenbloom, P.S.: Extending mental imagery in Sigma. In: Bach, J., Goertzel, B., Iklé, M.
(eds.) AGI 2012. LNCS, vol. 7716, pp. 272–281. Springer, Heidelberg (2012)
24. Frintrop, S., Rome, E., Christensen, H.I.: Computational visual attention systems and their
cognitive foundation: a survey. ACM Transactions on Applied Perception 7 (2010)
25. Itti, L., Borji, A.: State-of-the-art in visual attention modeling. IEEE Transactions on
Pattern Analysis and Machine Intelligence 35, 185–207 (2013)
26. Rosenbloom, P.S.: Towards a 50 msec cognitive cycle in a graphical architecture. In:
Proceedings of the 11th International Conference on Cognitive Modeling (2012)
27. Rosenbloom, P.S., Demski, A., Ustun, V.: Efficient message computation in Sigma’s
graphical architecture. Biologically Inspired Cognitive Architectures 11, 1–9 (2015)
28. Itti, L.: Automatic foveation for video compression using a neurobiological model of
visual attention. IEEE Transactions on Image Processing 13, 1304–1318 (2004)
29. Marinier, R.P.: A Computational Unification of Cognitive Control, Emotion, and Learning.
Ph.D Thesis, University of Michigan (2008)
30. Marinier, R.P., Laird, J.E.: Emotion-driven reinforcement learning. In: Proceedings of the
30th Annual Meeting of the Cognitive Science Society (2008)
31. Newell, A., Shaw, J.C., Simon, H.A.: Report on a general problem-solving program.
In: Proceedings of the International Conference on Information Processing (1959)
32. Ustun, V., Rosenbloom, P.S.: Towards adaptive, interactive virtual humans in Sigma. In: Pro-
ceedings of the Fifteenth International Conference on Intelligent Virtual Agents (2015). In press
Inferring Human Values for Safe AGI Design
1 Introduction
¹ We use goals, rewards, utilities, and values interchangeably in this work.
because of their long list of assumptions. For instance, in most IRL methods
the environment is usually assumed to be stationary, fully observable, and some-
times known; the policy of the agent is assumed to be stationary and optimal
or near-optimal; the reward function is assumed to be stationary as well; and
the Markov property is assumed. Such assumptions are reasonable for limited
motor control tasks such as grasping and manipulation; however, if our goal is
to learn high-level human values, they become unrealistic. For instance, assum-
ing that humans have optimal policies discards the possibility of superintelligent
machines and ignores the entire cognitive biases literature. In this work, we pro-
pose a general framework for inferring the reward mechanisms of arbitrary agents
that relaxes all the aforementioned assumptions. Through this work, we intend not only to offer a potential solution to the problem of inferring human values (i.e., the so-called Value Learning Problem [7]), but also to stimulate AI researchers to investigate the theoretical limits of IRL.
where a1:n := a1 a2 . . . an, o1:n := o1 o2 . . . on, and pR(o1:n) = r1 r2 . . . rn. It should be noted that Σ_{pR} m(pR || a1:n, o1:n) need not equal 1, and the true probability measure
Fig. 1. The interaction between the agent, the environment, and the reward mechanism
can be obtained via normalization. We also assume that the agent cannot access
the reward mechanism directly, but can only sample it. If the agent has access
to the reward mechanism, pA (pR (o1:n ), o1:n ) in (1) should be replaced with
pA (pR (o1:n ), pR , o1:n ).
Equation 1 provides a simple way to estimate reward mechanisms of arbitrary agents with very few assumptions. We do not assume the Markov property, fully observable and stationary environments, optimal and stationary policies, or
stationary rewards. However, this degree of generality comes with high computa-
tional costs. Due to the infinite loop over the programs and the existence of non-
halting programs, this solution is incomputable. Nevertheless, one can obtain
approximations of (1) or use different complexity measures (such as Schmidhu-
ber’s Speed Prior [6]) in order to obtain computable solutions.
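To make the idea tangible, the sketch below (Python) scores a finite set of candidate reward programs by a simplicity prior 2^(-length) times a stand-in agent model (here a softmax-rational choice rule) and normalizes the scores. The candidate set, the softmax agent model, and all names are our own illustrative assumptions; they replace the incomputable enumeration over all programs in (1).

```python
import math

# Illustrative, computable stand-in for reward inference: weight each candidate
# reward program by 2^(-description length) and by how well a softmax-rational
# agent with that reward explains the observed choices, then normalize.
def infer_reward(candidates, history, beta=1.0):
    # candidates: list of (description_length, reward_fn) pairs
    # history: list of (chosen_action, {action: observation}) pairs
    scores = []
    for length, reward in candidates:
        loglik = 0.0
        for chosen, outcomes in history:
            utils = {a: reward(o) for a, o in outcomes.items()}
            z = sum(math.exp(beta * u) for u in utils.values())
            loglik += beta * utils[chosen] - math.log(z)
        scores.append(2.0 ** (-length) * math.exp(loglik))
    total = sum(scores)
    return [s / total for s in scores]

candidates = [(3, lambda o: o), (5, lambda o: -o)]          # "likes o" vs "dislikes o"
history = [("a", {"a": 1.0, "b": 0.0}), ("a", {"a": 2.0, "b": 0.5})]
print(infer_reward(candidates, history))                    # mass concentrates on the first
```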
It should also be noted that even though we assumed deterministic agents
and reward mechanisms and fully-observable action-observation histories, these
assumptions can be relaxed and a framework that assumes probabilistic agent
and reward functions and noisy action-observation histories can be developed.
3 Discussion
In principle, if we can capture the actions and observations of a human with high accuracy, we might be able to estimate that person's values. This is a potential solution for
the Value Learning Problem [7]. For example, we can infer the values of some
individuals who are ‘good’ members of the society and possess ‘desirable’ values.
Then we can preprocess the inferred values and give a mixture of them to an
AGI system as its reward mechanism. The preprocessing stage would involve
weeding out states/activities that are valuable for biological agents but not for robots, such as eating.² How to achieve this is an open problem.
² This should be done such that the robot will not value consuming food but will value providing humans with food.
Dewey [1] suggests an AGI architecture that replaces the rewards in AIXI
with a utility function as well. The proposed agent can either be provided with a
hand-crafted utility function or a set of candidate, weighted utility functions. If
the latter is the case, the agent can improve its utility function by adjusting the
weights. However, it is not specified how the agent should or can do the adjust-
ments. Furthermore, the proposed agent improves its utility function through
interacting with the environment, whereas we suggest that human values should
be estimated and processed first and then be provided to an AGI system.
Acknowledgments. I would like to thank Erhan Oztop for helpful discussions and
comments and the anonymous reviewers for their suggestions.
References
1. Dewey, D.: Learning what to value. In: Schmidhuber, J., Thórisson, K.R., Looks,
M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 309–314. Springer, Heidelberg (2011)
2. Hibbard, B.: Avoiding unintended AI behaviors. In: Bach, J., Goertzel, B., Iklé, M.
(eds.) AGI 2012. LNCS, vol. 7716, pp. 107–116. Springer, Heidelberg (2012)
3. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algo-
rithmic Probability. Springer, Berlin (2005)
4. Muehlhauser, L., Helm, L.: The singularity and machine ethics. In: Eden, A.H.,
Moor, J.H., Sraker, J.H., Steinhart, E. (eds.) Singularity Hypotheses, pp. 101–126.
Springer, Heidelberg (2012). The Frontiers Collection
5. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceed-
ings of the Seventeenth International Conference on Machine Learning, ICML 2000,
pp. 663–670. Morgan Kaufmann Publishers Inc., San Francisco (2000)
6. Schmidhuber, J.: The speed prior: a new simplicity measure yielding near-optimal
computable predictions. In: Kivinen, J., Sloan, R.H. (eds.) COLT 2002. LNCS
(LNAI), vol. 2375, p. 216. Springer, Heidelberg (2002)
7. Soares, N.: The value learning problem. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA (2015)
8. Solomonoff, R.: A formal theory of inductive inference. part i. Information and
Control 7(1), 1–22 (1964)
9. Yudkowsky, E.: Complex value systems in friendly AI. In: Schmidhuber, J.,
Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS, vol. 6830, pp. 388–393. Springer,
Heidelberg (2011)
Two Attempts to Formalize Counterpossible
Reasoning in Deterministic Settings
1 Introduction
What does it mean to “make good decisions”? To formalize the question, it
is necessary to precisely define a process that takes a problem description and
identifies the best available decision (with respect to some set of preferences¹).
Such a process could not be run, of course; but it would demonstrate a full
understanding of the question.
The difficulty of this question is easiest to illustrate in a deterministic setting.
Consider a deterministic decision procedure embedded within a deterministic
environment (e.g., an algorithm operating in a virtual world). There is exactly
one action that the decision procedure is going to select. What, then, “would
happen” if the decision procedure selected a different action instead? At a glance,
this question seems ill-defined, and yet, this is the problem faced by a decision
procedure embedded within an environment.
Philosophers have studied candidate procedures for quite some time, under
the name of decision theory. The investigation of what is now called decision
theory stretches back to Pascal and Bernoulli; more recently decision theory has
been studied by Lehmann [7], Lewis [9], Jeffrey [6], Pearl [12] and many others.
Unfortunately, the standard answers from the literature do not allow for the
description of an idealized decision procedure, as discussed in Section 2. Section 3
introduces the notion of “counterpossibles” (logically impossible counterfactuals)
¹ For simplicity, assume von Neumann-Morgenstern rational preferences [13], that is, preferences describable by some utility function. The problems discussed in this paper arise regardless of how preferences are encoded.
and motivates the need for a decision theory using them. It goes on to discuss
two attempts to formalize such a decision theory, one using graphical models
and another using proof search. Section 4 concludes.
2 Counterfactual Reasoning
The modern academic standard decision theory is known as “causal decision
theory,” or CDT. It is used under the guise of “potential outcomes” in statistics,
economics and game theory, and it is used implicitly by many modern narrow
AI systems under the guise of “decision networks.”
Pearl’s calculus of interventions on causal graphs [12] can be used to formalize
CDT. This requires that the environment be represented by a causal graph in
which the agent’s action is represented by a single node. This formalization of
CDT prescribes evaluating what “would happen” if the agent took the action a
by identifying the agent’s action node, cutting the connections between it and
its causal ancestors, and setting the output value of that node to be a. This is
known as a causal intervention. The causal implications of setting the action
node to a may then be evaluated by propagating this change through the causal
graph in order to determine the amount of utility expected from the execution of
action a. The resulting modified graph is a “causal counterfactual” constructed
from the environment.
Unfortunately, causal counterfactual reasoning is unsatisfactory, for two rea-
sons. First, CDT is underspecified: it is not obvious how to construct a causal
graph in which the agent’s action is an atomic node. While the environment
can be assumed to have causal structure, a sufficiently accurate description of
the problem would represent the agent as arising from a collection of transis-
tors (or neurons, or sub-atomic particles, etc.). While it seems possible in many
cases to draw a boundary around some part of the model which demarcates “the
agent’s action,” this process may become quite difficult in situations where the
line between “agent” and “environment” begins to blur, such as scenarios where
the agent distributes itself across multiple machines.
Secondly, CDT prescribes low-scoring actions on a broad class of decision
problems where high scores are possible, known as Newcomblike problems [11].
For a simple example of this, consider a one-shot Prisoner’s Dilemma played by
two identical deterministic agents. Each agent knows that the other is identical.
Agents must choose whether to cooperate (C) or defect (D) without prior coor-
dination or communication. If both agents cooperate, they both achieve utility
2. If both defect, they both achieve utility 1. If one cooperates and the other defects, then the defector achieves 3 utility while the cooperator achieves 0.²
² This scenario (like other Newcomblike scenarios) is a multi-agent scenario. Why use decision theory rather than game theory to evaluate such scenarios? The goal is to define a procedure which reliably identifies the best available action; the label of “decision theory” is secondary. The desired procedure must identify the best action in all settings, even when there is no clear demarcation between “agent” and “environment.” Game theory informs, but does not define, this area of research.
The actions of the two agents will be identical by assumption, but neither
agent’s action causally impacts the other’s: in a causal model of the situation,
the action nodes are causally separated, as in Figure 1. When determining the
best action available to the left agent, a causal intervention changes the left node
without affecting the right one, assuming there is some fixed probability p that
the right agent will cooperate independent of the left agent. No matter what
value p holds, CDT reasons that the left agent gets utility 2p if it cooperates
and 2p + 1 if it defects, and therefore prescribes defection [8].
Fig. 1. The causal graph for a one-shot Prisoner’s Dilemma. A represents the agent’s action, O represents the opponent’s action, and U represents the agent’s utility.
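The causal-intervention calculation just described can be written out directly (a small Python sketch using the payoffs above): holding the opponent's cooperation probability p fixed, defection yields 2p + 1 against 2p for cooperation, so CDT prescribes defection for every p.

```python
# Expected utilities under a causal intervention in the one-shot Prisoner's
# Dilemma: the opponent cooperates with some fixed probability p that the
# intervention does not change.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def cdt_value(action, p):
    return p * PAYOFF[(action, "C")] + (1 - p) * PAYOFF[(action, "D")]

for p in (0.0, 0.5, 1.0):
    print(p, cdt_value("C", p), cdt_value("D", p))   # 2p versus 2p + 1
```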
Indeed, many decision theorists hold that it is in fact rational for an agent to
defect against a perfect copy of itself in a one-shot Prisoner’s Dilemma, as after
all, no matter what the opponent does, the agent does better by defecting [5,9].
Others object to this view, claiming that since the agents are identical, both
actions must match, and mutual cooperation is preferred to mutual defection,
so cooperation is the best available action [1]. Our view is that, in the moment,
it is better to cooperate with yourself than defect against yourself, and so CDT
does not reliably identify the best action available to an agent.
CDT assumes it can hold the action of one opponent constant while freely
changing the action of the other, because the actions are causally separated.
However, the actions of the two agents are logically connected; it is impossible for
one agent to cooperate while the other defects. Causal counterfactual reasoning
neglects non-causal logical constraints.
It is a common misconception that Newcomblike scenarios only arise when
some other actor is a perfect predictor (perhaps by being an identical copy).
This is not the case: while Newcomblike scenarios are most vividly exemplified
by situations involving perfect predictors, they can also arise when other actors
have only partial ability to predict the agent [10]. For example, consider a situ-
ation in which an artificial agent is interacting with its programmers, who have
intimate knowledge of the agent’s inner workings. The agent could well find itself
embroiled in a Prisoner’s Dilemma with its programmers. Let us assume that
the agent knows the programmers will be able to predict whether or not it will
cooperate with 90% accuracy. In this case, even though the programmers are
imperfect predictors, the agent is in a Newcomblike scenario.
In any case, the goal is to formalize what is meant when asking that agents
take “the best available action.” Causal decision theory often identifies the best available action, but, as the Prisoner’s Dilemma above shows, it does not do so reliably.
3 Counterpossibles
Consider the sort of reasoning that a human might use, faced with a Prisoner’s
Dilemma in which the opponent’s action is guaranteed to match our own:
The opponent will certainly take the same action that I take. Thus, there
is no way for me to exploit the opponent, and no way for the opponent
to exploit me. Either we both cooperate and I get $2, or we both defect
and I get $1. I prefer the former, so I cooperate.
Contrast this with the hypothetical reasoning of a reasoner who, instead, reasons
according to causal counterfactuals:
There is some probability p that the opponent defects. (Perhaps I can
estimate p, perhaps not.) Consider cooperating. In this case, I get $2
if the opponent cooperates and $0 otherwise, for a total of $2p. Now
consider defecting. In this case I get $3 if the opponent cooperates and
$1 otherwise, for a total of $2p + 1. Defection is better no matter what
value p takes on, so I defect.
Identifying the best action requires respecting the fact that identical algorithms
produce identical outputs. It is not the physical output of the agent’s hardware
which must be modified to construct a counterfactual, it is the logical output of
the agent’s decision algorithm. This insight, discovered independently by Dai [4]
and Yudkowsky [14], is one of the main insights behind “updateless decision
theory” (UDT).
UDT identifies the best action by evaluating a world-model which represents
not only causal relationships in the world, but also the logical effects of algo-
rithms upon the world. In a symmetric Prisoner’s Dilemma, a reasoner following
the prescriptions of UDT might reason as follows:
The physical actions of both myself and my opponent are determined
by the same algorithm. Therefore, whatever action this very decision
algorithm selects will be executed by both of us. If this decision algorithm
selects “cooperate” then we’ll both cooperate and get a payoff of 2. If
instead this decision algorithm selects “defect” then we’ll both defect and
get a payoff of 1. Therefore, this decision algorithm selects “cooperate.”
Using reasoning of this form, a selfish agent acting according to the prescriptions
of UDT cooperates with an identical agent on a symmetric one-shot Prisoner’s
Dilemma, and achieves the higher payoff.³
³ The agent does not care about the utility of its opponent. Each agent is maximizing its own personal utility. Both players understand that the payoff must be symmetric, and cooperate out of a selfish desire to achieve the higher symmetric payoff.
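The contrast with the causal calculation earlier can be made explicit in a few lines (again a sketch, not a formal specification of UDT): because identical agents produce identical outputs, only the diagonal outcomes are logically possible, and comparing them selects cooperation.

```python
# UDT-style evaluation for the symmetric Prisoner's Dilemma: evaluate each
# possible output of the shared algorithm on the assumption that both players
# produce it, and return the best.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}

def udt_choice(actions=("C", "D")):
    return max(actions, key=lambda a: PAYOFF[(a, a)])

print(udt_choice())   # "C": mutual cooperation (2) beats mutual defection (1)
```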
Fig. 2. The logical graph for a symmetric Prisoner’s Dilemma where both the agent’s action A and the opponent’s action O are determined by the algorithm A()
Given a probabilistic graphical model of the world representing both logical and
causal connections, and given that one of the nodes in the graph corresponds to
the agent’s decision algorithm, and given some method of propagating updates
through the graph, UDT can be specified in a manner very similar to CDT. To
identify the best action available to an agent, iterate over all available actions a ∈
A, change the value of the agent’s algorithm node in the graph to a, propagate
the update, record the resulting expected utility, and return the action a leading
to the highest expected utility. There are two obstacles to formalizing UDT in
this way.
⁴ Some versions of counterpossibles are quite intuitive; for instance, we could imagine how the cryptographic infrastructure of the Internet would fail if we found that P = NP, and it seems as if that counterfactual would still be valid even once we proved that P ≠ NP. And yet by the Principle of Explosion, literally any consequence can be deduced from a falsehood, and thus no counterfactual could be “more valid” than any other in a purely formal sense.
The first obstacle is that UDT (like CDT) is underspecified, pending a formal
description of how to construct such a graph from a description of the environ-
ment (or, eventually, from percepts). However, constructing a graph suitable
for UDT is significantly more difficult than constructing a graph suitable for
CDT. While both require decreasing the resolution of the world model until the
agent’s action (in CDT’s case) or algorithm (in UDT’s case) is represented by a
single node rather than a collection of parts, the graph for UDT further requires
some ability to identify and separate “algorithms” from the physical processes
that implement them. How is UDT supposed to recognize that the agent and its
opponent implement the same algorithm? Will this recognition still work if the
opponent’s algorithm is written in a foreign programming language, or otherwise
obfuscated in some way?
Fig. 3. The desired logical graph for the one-shot Prisoner’s Dilemma where agent A acts according to A(), and the opponent either mirrors A() or does the opposite, according to the random variable X
Even given some reliable means of identifying copies of an agent’s decision algo-
rithm in the environment, this may not be enough to specify a satisfactory
graph-based version of UDT. To illustrate, consider UDT identifying the best
action available to an agent playing a Prisoner’s Dilemma against an opponent
that does exactly the same thing as the agent 80% of the time, and takes the
opposite action otherwise. It seems UDT should reason according to a graph as
in Figure 3, in which the opponent’s action is modeled as dependent both upon
the agent’s algorithm and upon some source X of randomness. However, gener-
ating logical graphs as in Figure 3 is a more difficult task than simply detecting
all perfect copies of an algorithm in an environment.
Secondly, a graphical model capable of formalizing UDT must provide some
way of propagating “logical updates” through the graph, and it is not at all
clear how these logical updates could be defined. Whenever one algorithm’s
“logical node” in the graph is changed, how does this affect the logical nodes
of other algorithms? If the agent’s algorithm selects the action a, then clearly
the algorithm “do what the agent does 80% of the time and nothing otherwise”
is affected. But what about other algorithms which correlate with the agent’s
algorithm, despite not referencing it directly? What about the algorithms of
other agents which base their decisions on an imperfect model of how the agent
will behave? In order to understand how logical updates propagate through a
logical graph, we desire a better notion of how “changing” one logical fact can
“affect” another logical fact.
Given some method of reasoning about the effects of A() = a on any other algo-
rithm, a graphical formalization of UDT is unnecessary: the environment itself is
an algorithm which contains the agent, and which describes how to compute the
agent’s expected utility! Therefore, a formal understanding of “logical updating”
could be leveraged to analyze the effects of A() = a upon the environment; to
evaluate the action a, UDT need only compute the expected utility available in
the environment as modified by the assumption A() = a.
This realization leads to the idea of “proof-based UDT,” which evaluates
actions by searching for formal proofs, using some mathematical theory such as
Peano Arithmetic (PA), of how much utility is attained in the world-model if
A() selects the action a. As a bonus, this generic search for formal proofs obvi-
ates the need to identify the agent in the environment: given an environment
which embeds the agent and a description of the agent’s algorithm, no matter
how the agent is embedded in the environment, a formal proof of the outcome
will implicitly identify the agent and describe the implications of that algorithm
outputting a. While that proof does the hard work of propagating counterpos-
sibles, the high-level UDT algorithm simply searches all proofs, with no need to
formally locate the agent. This allows for an incredibly simple specification of
updateless decision theory, given below.
First, a note on syntax: Square quotes ( · ) denote sentences encoded as
objects that a proof searcher can look for. This may be done via e.g., a Gödel
encoding. Overlines within quotes denote “dequotes,” allowing the reference of
meta-level variables. That is, if at some point in the algorithm a := 3 and o := 10,
then the string A() = a → E() = o is an abbreviation of A() = 3 → E() =
10. The arrow → denotes logical implication.
The algorithm is defined in terms of a finite set A of actions available to the
agent and a finite sorted list O of outcomes that could be achieved (ordered from
best to worst). The proof-based UDT algorithm takes a description E() of the
environment and A() of the agent’s algorithm. E() computes an outcome, A()
computes an action. It is assumed (but not necessary) that changing the output
of A() would change the output of E().
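The algorithm itself is not reproduced in this excerpt; the following Python sketch renders the specification implied by the description above, with `provable` standing in for a proof search over encoded sentences in a theory such as PA (an assumed oracle here, not a real theorem prover): outcomes are scanned from best to worst, and the first action a for which the encoded sentence A() = a → E() = o is provable is returned.

```python
# Sketch of proof-based UDT as described in the text; `provable`, the string
# encoding of sentences, and the fallback choice are our own stand-ins.
def proof_based_udt(actions, outcomes_best_to_worst, provable):
    for o in outcomes_best_to_worst:          # best outcome first
        for a in actions:
            if provable(f"A() = {a} -> E() = {o}"):
                return a                      # act on the best provable implication
    return actions[0]                         # fallback if no implication is found
```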
⁵ One must be careful with this sort of reasoning, for if PA could prove that A() = D then it could also prove A() = C → E() = 3 by the principle of explosion. However, in this case, that sort of “spurious proof” is avoided for technical reasons discussed by Benson-Tilsen [2].
truly leads to the highest outcome, or there is no action a such that PA can prove
A() = a, and thus the only proofs found will be genuine implications. Even so,
the apparent deficits of UDT at analyzing other algorithms are troubling, and
it is not obvious that reasoning about the logical implications of A() = a is the
right way to formalize counterpossible reasoning.
A better understanding of counterpossible reasoning may well be necessary in
order to formalize UDT in a stochastic setting, where it maximizes expected util-
ity instead of searching for proofs of a certain outcome. Such an algorithm would
evaluate actions conditioned on the logical fact A() = a, rather than searching
for logical implications. How does one deal with the case where A() = a, so that
A() = a is a zero-probability event? In order to reason about expected utility
conditioned on A() = a, it seems necessary to develop a more detailed under-
standing of counterpossible reasoning. If one deterministic algorithm violates the
laws of logic in order to output something other than what it outputs, then how
does this affect other algorithms? Which laws of logic, precisely, are violated,
and how does this violation affect other logical statements?
It is not clear that these questions are meaningful, nor even that a satisfactory
general method of reasoning about counterpossibles actually exists. It is plau-
sible that a better understanding of reasoning under logical uncertainty would
shed some light on these issues, but a satisfactory theory of reasoning under
logical uncertainty does not yet exist.⁶ Regardless, it seems that some deeper
understanding of counterpossibles is necessary in order to give a satisfactory
formalization of updateless decision theory.
4 Conclusion
The goal of answering all these questions is not to identify practical algorithms,
directly. Rather, the goal is to ensure that the problem of decision-making is well
understood: without a formal description of what is meant by “good decision,” it
is very difficult to justify high confidence in a practical heuristic that is intended
to make good decisions.
It currently looks like specifying an idealized decision theory requires for-
malizing some method for evaluating counterpossibles, but this problem is a
difficult one, and counterpossible reasoning is an open philosophical problem.
While these problems have remained open for some time, our examination in
the light of decision-theory, with a focus on concrete algorithms, has led to some
new ideas. We are optimistic that further decision theory research could lead
to significant progress toward understanding the problem of idealized decision-
making.
⁶ A logically uncertain reasoner can know both the laws of logic and the source code of a program without knowing what the program outputs.
References
1. Bar-Hillel, M., Margalit, A.: Newcomb’s paradox revisited. British Journal for the
Philosophy of Science 23(4), 295–304 (1972). http://www.jstor.org/stable/686730
2. Benson-Tilsen, T.: UDT with known search order. Tech. Rep. 2014–4, Machine
Intelligence Research Institute (2014).
http://intelligence.org/files/UDTSearchOrder.pdf
3. Cohen, D.: On what cannot be. In: Dunn, J., Gupta, A. (eds.) Truth or Conse-
quences, pp. 123–132. Kluwer (1990)
4. Dai, W.: Towards a new decision theory. Less Wrong (2009). http://lesswrong.com/lw/15m/towards_a_new_decision_theory/
5. Gibbard, A., Harper, W.L.: Counterfactuals and two kinds of expected utility. In:
Hooker, C.A., Leach, J.J., McClennen, E.F. (eds.) Foundations and Applications
of Decision Theory, The Western Ontario Series in Philosophy of Science, vol. 13a.
D. Reidel (1978)
6. Jeffrey, R.C.: The Logic of Decision, 2 edn. Chicago University Press (1983)
7. Lehmann, E.L.: Some principles of the theory of testing hypotheses. Annals of
Mathematical Statistics 21(1), 1–26 (1950)
8. Lewis, D.: Prisoners’ dilemma is a Newcomb problem. Philosophy & Public Affairs
8(3), 235–240 (1979). http://www.jstor.org/stable/2265034
9. Lewis, D.: Causal decision theory. Australasian Journal of Philosophy 59(1), 5–30
(1981)
10. Lewis, D.: Why ain’cha rich? Noûs 15(3), 377–380 (1981).
http://www.jstor.org/stable/2215439
11. Nozick, R.: Newcomb’s problem and two principles of choice. In: Rescher, N. (ed.)
Essays in Honor of Carl G. Hempel, pp. 114–146. No. 24 in Synthese Library, D.
Reidel (1969)
12. Pearl, J.: Causality, 1 edn. Cambridge University Press (2000)
13. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior, 1
edn. Princeton University Press (1944)
14. Yudkowsky, E.: Timeless decision theory. Tech. rep., The Singularity Institute, San
Francisco, CA (2010). http://intelligence.org/files/TDT.pdf
Bounded Cognitive Resources
and Arbitrary Domains
Abstract. When Alice in Wonderland fell down the rabbit hole, she
entered a world that was completely new to her. She gradually explored
that world by observing, learning, and reasoning. This paper presents
a simple system Alice in Wonderland that operates analogously. We
model Alice’s Wonderland via a general notion of domain and Alice her-
self with a computational model including an evolving belief set along
with mechanisms for observing, learning, and reasoning. The system
operates autonomously, learning from arbitrary streams of facts from
symbolic domains such as English grammar, propositional logic, and
simple arithmetic. The main conclusion of the paper is that bounded
cognitive resources can be exploited systematically in artificial general
intelligence for constructing general systems that tackle the combinato-
rial explosion problem and operate in arbitrary symbolic domains.
1 Introduction
environments and survive: e.g., when an ocean wave hits the shore and so forms
a new ecosystem in a rock pool. Also crustaceans are capable of learning [18].
Artificial systems have not reached the same level of flexibility. No robot can come to a new environment – say, a private home – and do the laundry, wash the dishes, clean up, and make coffee. No robot can go to high school and learn natural languages, mathematics, and logic so as to match an average school child.
One strategy for making artificial systems more flexible is to simulate human
cognition. Such an approach has been taken by Soar [8], ACT-R [1], NARS [17],
MicroPsi [2], OpenCog [4], and Sigma [11]. Turing proposed building artificial
systems that simulate children’s cognitive development [15]. Piaget writes that
children adapt to new information in one of two ways: by assimilation, in which
new information fits into existing knowledge structures; and accommodation, in
which new information causes new knowledge structures to form or old ones to
be modified [9].
This paper presents the system Alice in Wonderland, which is able to
operate autonomously across arbitrary symbolic and crisp domains. As the name
suggests, we take the Alice in Wonderland story as inspiration, modeling Won-
derland via a general notion of domain and Alice herself with a computational
model including an evolving belief set along with mechanisms for observing,
learning, and reasoning. The system functions with or without human interven-
tion, developing intelligence on the basis of random streams of facts taken from
arbitrary domains. The computational complexity of the system is restricted by
using a simple cognitive model with bounded cognitive resources.
The Alice in Wonderland system builds on theory borrowed from devel-
opmental psychology [16], along with bounded rationality [13], belief revision
[6], and inductive program synthesis [7,12]. Popper [10, p.261] provides a key
inspiration:
The growth of our knowledge is the result of a process closely resembling
what Darwin called ’natural selection’; that is, the natural selection of
hypotheses: our knowledge consists, at every moment, of those hypothe-
ses which have shown their (comparative) fitness by surviving so far in
their struggle for existence, a competitive struggle which eliminates those
hypotheses which are unfit.
The present paper improves on our previous work [14], since the system can
learn from arbitrary streams of observations and not only when being spoon-fed
with carefully selected examples. Sections 2–6 describe how Alice observes, rep-
resents knowledge, reasons, learns, and answers questions, respectively. Section 7
presents results and Section 8 offers some conclusions.
In the present context domains model Wonderland; streams model Alice’s obser-
vations of Wonderland.
Example 7. The rules x*0 ⇒ 0 and x∨⊤ ⇒ ⊤ are pure, whereas 0 ⇒ x*0 and ⊤ ⇒ x∨⊤ are not.
If c is a context and t is a term, then c(t) is the result of replacing the unique occurrence of □ in c by t. If c = □, then c(t) = t.
Fig. 1. These computations can be interpreted as arithmetic computations with rules that preserve equality
Fig. 2. These computations can be interpreted as logic computations with rules that preserve or increase logical strength: i.e., goal-driven proofs. To prove (p → q)∨p, it is sufficient to prove ⊤; to prove black(Hugin), it is sufficient to prove raven(Hugin).
Fig. 4. This computation can be interpreted as a Haskell computation with rules that preserve equality. The derivation shows how the function rev reverses the list [6,7].
Proof. All theories are finite by definition. Given the purity condition on rules,
only finitely many bounded T -computations beginning with any given closed
term t are possible. The bounded T -computations starting with t form a finitely
branching tree where each branch has a maximum length. Hence, the tree is
finite and the relation ⇒*_T is decidable.
If card(Tn) = AliceLTM, then no new rule can be added to Tn+1 until some old
rule has been removed. Preference orders are used for determining which rules
should be added to or removed from Tn . Update mechanisms are invoked as
described in Table 1. Table 2 gives an example of a learning process.
Table 1. Update mechanism invoked for an observed fact Fn:
                Fn = (t ⇒ t′)        Fn = (t ⇏ t′)
t ⇒*_Tn t′      Endogenic update     Exogenic update
t ⇏*_Tn t′      Exogenic update      Endogenic update
7 Results
The Alice in Wonderland system consists of around 5,000 lines of Haskell
code, modeling Wonderland as unknown domain D and Alice’s belief set at time
n as theory Tn . Alice starts with theory T0 , which is empty by default. At any
time n, the system can be in learning or inquiry mode, as determined by the
human operator:
Learning Mode. Alice receives fact Fn ∈ D. Alice learns from Fn and updates
her theory to Tn+1 .
Inquiry Mode. Alice receives an open or closed problem, Pn . Alice outputs a
solution to Pn or reports failure and makes Tn+1 = Tn .
In learning mode, Fn could come from any source: e.g. sensors, text file, or
human entry. For purposes of illustration, predefined streams can be chosen
from a dropdown menu. In inquiry mode, Pn is entered by the human operator.
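A minimal control-loop sketch of the learning mode is given below (in Python for illustration; the actual system is roughly 5,000 lines of Haskell, and the update, preference, and solving functions here are trivial stubs of our own, not the paper's mechanisms).

```python
# Minimal, self-contained sketch: theories are sets of rule strings; the stubs
# below only illustrate the shape of the loop, not the real update mechanisms.
def table1_update(theory, fact):
    return theory | {fact}                         # stub: always add the observed fact

def least_preferred(theory):
    return max(theory, key=len)                    # stub preference: longest rule first

def solve(theory, problem):
    return problem if problem in theory else None  # stub: answer only known facts

def learning_step(theory, fact, ltm_capacity=5):
    theory = table1_update(theory, fact)           # endogenic or exogenic, per Table 1
    while len(theory) > ltm_capacity:              # bounded long-term memory:
        theory = theory - {least_preferred(theory)}    # evict a least-preferred rule
    return theory

theory = set()
for fact in ["0*x = 0", "x*0 = 0", "8*7 = 56"]:
    theory = learning_step(theory, fact)
print(theory, solve(theory, "8*7 = 56"))
```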
Fig. 5. Screenshot of Alice in Wonderland. The system has been running in learn-
ing mode, processing 599 arithmetic facts in the course of approximately 20 minutes
(assuming a standard laptop). From left, the first and second panels represent Alice’s
beliefs in the form of closed and open rules, respectively. Alice’s current theory consists
of 164 closed rules, including 8*7 = 56, and 9 open rules, including 0*x = 0. The third
panel shows the fact stream, the fourth panel solutions to problems entered by the
human operator.
8 Conclusions
We have described the system Alice in Wonderland, which performs autonomous learning and problem solving in arbitrary symbolic domains. A key component
is a simple cognitive model that reduces the computational complexity from
undecidable to finite. In this way, we tackle the combinatorial explosion problem
that arises in e.g. inductive logic programming, automatic theorem proving, and
grammar learning. Our results show that the system is able to learn multiple
domains from random streams of facts and also challenge human problem-solving
in some cases. Thus bounded cognitive resources were exploited for constructing
a general system that tackles the combinatorial explosion problem and operates
in arbitrary symbolic domains.
References
1. Anderson, J.R., Lebiere, C.: The atomic components of thought. Lawrence Erl-
baum, Mahwah, N.J. (1998)
2. Bach, J.: MicroPsi 2: the next generation of the MicroPsi framework. In: Bach,
J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 11–20. Springer,
Heidelberg (2012)
3. Clarke, D., Whitney, H., Sutton, G., Robert, D.: Detection and learning of floral
electric fields by bumblebees. Science 340(6128), 66–69 (2013)
4. Goertzel, B., Pennachin, C., Geisweiller, N.: The OpenCog framework. In: Engi-
neering General Intelligence, Part 2, pp. 3–29. Springer (2014)
5. Gould, J.L., Gould, C.G., et al.: The honey bee. Scientific American Library (1988)
6. Hansson, S.O., Fermé, E.L., Cantwell, J., Falappa, M.A.: Credibility Limited Revi-
sion. The Journal of Symbolic Logic 66(04), 1581–1596 (2001)
7. Kitzelmann, E.: Inductive Programming: A Survey of Program Synthesis Tech-
niques. In: Schmid, U., Kitzelmann, E., Plasmeijer, R. (eds.) AAIP 2009. LNCS,
vol. 5812, pp. 50–73. Springer, Heidelberg (2010)
8. Laird, J.E., Newell, A., Rosenbloom, P.S.: Soar: An Architecture for General Intel-
ligence. Artificial Intelligence 33(3), 1–64 (1987)
9. Piaget, J.: La construction du réel chez l’enfant. Delachaux & Niestlé (1937)
10. Popper, K.R.: Objective knowledge: An evolutionary approach. Clarendon Press,
Oxford (1972)
11. Rosenbloom, P.S.: The Sigma Cognitive Architecture and System. AISB Quarterly
136, 4–13 (2013)
12. Schmid, U., Kitzelmann, E.: Inductive rule learning on the knowledge level. Cog-
nitive Systems Research 12(3), 237–248 (2011)
13. Simon, H.A.: Models of Bounded Rationality: Empirically Grounded Economic
Reason, vol. 3. MIT press (1982)
14. Strannegård, C., Nizamani, A.R., Persson, U.: A General system for learning and
reasoning in symbolic domains. In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI
2014. LNCS, vol. 8598, pp. 174–185. Springer, Heidelberg (2014)
15. Turing, A.: Computing machinery and intelligence. Mind 59(236), 433–460 (1950)
16. Von Glasersfeld, E.: Radical Constructivism: A Way of Knowing and Learning.
Studies in Mathematics Education Series: 6. ERIC (1995)
17. Wang, P.: From NARS to a thinking machine. In: Proceedings of the 2007 Confer-
ence on Artificial General Intelligence. pp. 75–93. IOS Press, Amsterdam (2007)
18. Wiese, K.: The Crustacean Nervous System. Springer (2002)
Using Localization and Factorization to Reduce
the Complexity of Reinforcement Learning
1 Introduction
We presented bounds that are linear in the number of laws instead of the number
of environments. All deterministic environment classes are trivially generated by
sets of laws that equal the environments but some can also be generated by
exponentially fewer laws than there are environments.
We here expand the formal analysis of optimistic agents with hypothesis
classes based on laws, from the deterministic to the stochastic case and we further
consider fruitful combinations of those two basic cases.
Outline. Section 2 provides background on general reinforcement learning
agents. Section 3 introduces the concept of environments generated by laws and
extends previous concepts and results from the deterministic to the stochastic
case as well as to the mixed setting. Section 4 concludes.
2 Background
We define the a priori environment ξ by letting ξ(·) = Σ_ν w_ν ν(·), and the AIXI agent is defined by following the policy
π* := arg max_π V_ξ^π,    (1)
which is its general form. Sometimes AIXI refers to the case of a certain universal
class and a Solomonoff style prior [2]. The above agent, and only agents of that
form, satisfies the strict rationality axioms presented first in [7] while the slightly
looser version we presented in [9] enables optimism. The optimist chooses its next
action based on
π◦ := arg max_π max_{ξ∈Ξ} V_ξ^π    (2)
for a set of environments (beliefs) Ξ which we in the rest of the article will
assume to be finite, though results can be extended further [11]. We will rely on
an agent framework presented in [11].
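As a toy rendering of (2) over finite sets (a sketch under our own assumptions, not the framework of [11]), the optimist simply picks the policy whose best-case value over the class Ξ is largest:

```python
# Optimistic policy selection over finite sets of policies and environments.
# value(pi, xi) stands for the value V_xi^pi of policy pi in environment xi.
def optimistic_policy(policies, environments, value):
    return max(policies, key=lambda pi: max(value(pi, xi) for xi in environments))

# Tiny usage example with made-up values.
values = {("stay", "env1"): 0.2, ("stay", "env2"): 0.3,
          ("explore", "env1"): 0.1, ("explore", "env2"): 0.9}
print(optimistic_policy(["stay", "explore"], ["env1", "env2"],
                        lambda pi, xi: values[(pi, xi)]))   # "explore"
```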
and
∀h ∀a ∀j ∈ {1, ..., m}:   Σ_{o∈O_⊥ : o_j = ⊥} τ(h, a, o) ∈ {0, 1},
i.e. the marginal probability of the “no prediction” symbol ⊥ always equals zero or one. We will use the notation τ(o|h, a) := τ(h, a, o).
If
Σ_{o∈O_⊥ : o_j = ⊥} τ(o|h, a) = 1,
we say that τ does not make a prediction for j given h, a and write τ(h, a)_j = ⊥. Otherwise, i.e. when
Σ_{o∈O_⊥ : o_j = ⊥} τ(o|h, a) = 0,
we say that τ does make a prediction for j given h, a and write τ(h, a)_j ≠ ⊥.
Proof. Any ν ∈ Ξ(T ) is such that ν(·) ≥ cμ(·) where c is the smallest constant
such that all the laws in T are dominant with that constant. For each law τ ∈ T
pick an environment ν ∈ Ξ(T ) such that τ is a restriction of ν, i.e. ν predicts
according to τ whenever τ predicts something. We use the notation ντ for the
environment chosen for τ . The Blackwell-Dubins Theorem says that ντ merges
with μ almost surely under the policy followed (but not necessarily off that
policy) and therefore τ merges with μ, i.e. with the restriction of μ to what τ
makes predictions for, under the followed policy. Given ε > 0, let T be such that
and applying this to νht proves that |Vμπ̃ (ht ) − Vμ∗ (ht )| < ε ∀t ≥ T by Lemma 1
in [9]. Since there is, almost surely, such a T for every ε > 0 the claim is proved.
The analysis closely follows the structure learning case in [1], where it relies on a more general theorem for predictions based on k possible algorithms. The main difference is that they could do this per feature, which we cannot, since we are in a much more general setting where a law sometimes makes a prediction for a feature and sometimes not. One can have at most mk² disagreements (actually slightly fewer), where k is the number of laws. It is possible that this square dependence can be improved to linear, but it is already an exponential improvement for many cases compared to a linear dependence on the number of environments. There can only be errors when there is sufficient disagreement. The above argument works under a coherence assumption and for γ = 0, while for γ > 0 there are horizon effects that add extra technical difficulty to proving optimal bounds avoiding losing a factor 1/(1 − γ). [5] shows how such complications can be dealt with.
Having a Background Environment. The earlier deterministic results
demanded that the set of laws in the class is rich enough to combine into com-
plete environments and in particular to the true one. This might require such a
large class of laws that the linear dependence on the number of laws in the error
bound, though much better than depending on the number of environments,
still is large. The problem is simplified if the agent has access to a background
environment, which is here something that given previous history and the next
features predicted by laws, assigns probabilities for the rest of the feature vector.
A further purpose for this section is to prepare for classes with a mix of deter-
ministic laws and stochastic laws. In this case the stochastic laws learn what we
in this section call a background environment. Computer games provide a simple
example where it is typically clear that we have a background and then objects.
If the agent has already learnt a model of the background, then what remains
is only the subproblem of finding laws related to how objects behave and affect
the environment. As an alternative, we might not be able to deterministically predict the objects, but we can learn a cruder probabilistic model for them; this model serves as a background that completes the deterministic world model the agent learns for the rest.
Example 3 (Semi-deterministic environment). Consider a binary vector of length
m where some elements are fixed and some fluctuate randomly with probabil-
ity 1/2. Consider the background environment where all coefficients are Bernoulli
processes with probability 1/2 and consider the 2m laws that each always makes
a deterministic prediction for one coefficient and it is fixed. The laws that make a
prediction for a fluctuating coefficient will quickly get excluded and then the agent
will have learnt the environment.
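The exclusion dynamics of Example 3 are easy to simulate (a Python sketch; the environment size, the fixed features, and all names are invented for illustration): of the 2m always-predict-one-value laws, only those about the truly fixed features survive contact with the observation stream.

```python
import random

# Simulate Example 3: m binary features, some fixed, the rest Bernoulli(1/2).
# Laws are pairs (feature, predicted value); a law is excluded once contradicted.
def surviving_laws(m=8, fixed={0: 1, 3: 0}, steps=50, seed=0):
    rng = random.Random(seed)
    laws = {(j, v) for j in range(m) for v in (0, 1)}
    for _ in range(steps):
        obs = [fixed.get(j, rng.randint(0, 1)) for j in range(m)]
        laws = {(j, v) for (j, v) in laws if obs[j] == v}
    return laws

print(surviving_laws())   # e.g. {(0, 1), (3, 0)}: only laws about the fixed features remain
```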
Definition 7 (Predicted and not predicted features). Given a set of deterministic laws T, let
q1(h, a, T) := {j ∈ {1, ..., m} | τ(h, a)_j = ⊥ for all τ ∈ T}
be the features T cannot predict and q2(h, a, T) := {1, ..., m} \ q1(h, a, T) the predicted features.
Since we are now working with sets of laws that are not complete, subsets
can also not be complete, but they can be maximal in the sense that they predict
all that any law in the full set predicts.
If
∀h, a ∀j ∈ q2(h, a, T) ∀τ, τ̃ ∈ T̃ :   τ̃(h, a)_j ∈ {⊥, τ(h, a)_j},
we say that T̃ is coherent.
and
ν(x | h, a, x|_{q2(h,a,T)} = ν(h, a)|_{q2(h,a,T)}) = P(x | x|_{q2(h,a,T)} = ν(h, a)|_{q2(h,a,T)}).
The last expression above says that the features not predicted by laws (denoted by q1) are predicted by P, where we condition on the predicted features (denoted by q2).
The resulting error bound theorem has almost identical formulation as the
previous case (Theorem 1) and is true for exactly the same reasons. However,
the class M̄ contains stochasticity but of the predefined form.
4 Conclusions
We have further developed the theory of optimistic agents with hypothesis classes
defined by combining laws. Previous results were restricted to the deterministic
setting while stochastic environments are necessary for any hope of real appli-
cation. We here remedied this by introducing and studying stochastic laws and
environments generated by such.
References
1. Diuk, C., Li, L., Leffer, B.R.: The adaptive k-meteorologists problem and its appli-
cation to structure learning and feature selection in reinforcement learning. In:
Danyluk, A.P., Bottou, L., Littman, M.L. (eds.) ICML. ACM International Con-
ference Proceeding Series, vol. 382 (2009)
2. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorith-
mic Probability. Springer, Berlin (2005)
3. Lattimore, T.: Theory of General Reinforcement Learning. Ph.D. thesis, Australian
National University (2014)
4. Lattimore, T., Hutter, M.: PAC bounds for discounted MDPs. In: Bshouty,
N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568,
pp. 320–334. Springer, Heidelberg (2012)
5. Lattimore, T., Hutter, M., Sunehag, P.: The sample-complexity of general rein-
forcement learning. Journal of Machine Learning Research, W&CP: ICML 28(3),
28–36 (2013)
6. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pren-
tice Hall, Englewood Cliffs (2010)
7. Sunehag, P., Hutter, M.: Axioms for rational reinforcement learning. In: Kivinen,
J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) ALT 2011. LNCS, vol. 6925,
pp. 338–352. Springer, Heidelberg (2011)
8. Sunehag, P., Hutter, M.: Optimistic agents are asymptotically optimal. In:
Thielscher, M., Zhang, D. (eds.) AI 2012. LNCS, vol. 7691, pp. 15–26. Springer,
Heidelberg (2012)
9. Sunehag, P., Hutter, M.: Optimistic AIXI. In: Bach, J., Goertzel, B., Iklé, M. (eds.)
AGI 2012. LNCS, vol. 7716, pp. 312–321. Springer, Heidelberg (2012)
10. Sunehag, P., Hutter, M.: Learning agents with evolving hypothesis classes. In:
Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI 2013. LNCS, vol. 7999, pp.
150–159. Springer, Heidelberg (2013)
11. Sunehag, P., Hutter, M.: A dual process theory of optimistic cognition. In: Annual
Conference of the Cognitive Science Society, CogSci 2014 (2014)
12. Sunehag, P., Hutter, M.: Rationality, Optimism and Guarantees in General
Reinforcement Learning. Journal of Machine Learning Reserch (to appear, 2015)
13. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI
approximation. Journal of Artificial Intelligence Research 40(1), 95–142 (2011)
14. Willems, F., Shtarkov, Y., Tjalkens, T.: The context tree weighting method: Basic
properties. IEEE Transactions on Information Theory 41, 653–664 (1995)
Towards Flexible Task Environments
for Comprehensive Evaluation of Artificial
Intelligent Systems and Automatic Learners
1 Introduction
Although many AI evaluation frameworks exist [13], none address all of the
above concerns. In Sect. 3 we attempt to collect in one place the full set of
requirements that such a comprehensive framework should address, and present
some preliminary ideas on how this could be realized in Sect. 4.
2 Related Work
In a comprehensive and recent survey, Hernández-Orallo argued that the assess-
ment of general “real” intelligence – as opposed to specialized performance –
should be oriented towards testing a range of cognitive abilities that enable
a system to perform in a range of tasks [11]. One way to accomplish this is
to procedurally generate task-environments that require a suite of abilities, and
appropriately sample and weight them. Hernández-Orallo takes this approach,
but focuses on discrete and deterministic task-environments [10,12]. Legg &
Veness’s Algorithmic IQ approach posits a similar framework for measuring uni-
versal AI with respect to some reference machine which interprets a description
language to run the environment [14]. The choice of this description language
remains a major issue and deeply affects the kinds of environments that are
more likely to be generated. The BF programming language used in their work
closely resembles the operations of a Turing machine, but cannot easily gener-
ate complex structured environments and is opaque to analysis. A wide range
of description languages has been proposed for coordination and planning tasks
(e.g. TÆMS [7] and PDDL [17]), but these tend to focus on static, observable domains
and specify things in terms of agent actions and task hierarchies which can then
drive the development of AI systems specialized to the specified task.
Games have long been considered a possible testbed for the evaluation of
intelligence [20]. In the General Game Playing competition, AI systems play pre-
viously unseen games after being provided with the rules in the very analyzable
Game Description Language, but the games must be finite and synchronous [16].
More recently, there has been a lot of interest in the automatic play of Atari-era
video games. An extensible, user-friendly description language for such games
has been proposed that relies heavily on opaque built-in functions and should
in the future be amenable to procedural generation [8,19]. Much work has been
done on procedural generation in specific video games, but more general work is
still in its infancy [21]. Lim & Harrell were able to automatically generate vari-
ants for video games written using the PuzzleScript description language, but
the simulation-based approach they used for the evaluation of candidate rulesets
is not feasible for very difficult games since it requires an agent that is intelligent
enough to perform the task [15].
Some research has tried to relate problem structure to heuristic search algo-
rithm performance, including efforts to use a wide variety of problem types to
increase the generality of algorithms [2,5]. Some of this work, notably that on
hyperheuristics [6], has focused on algorithms that try to learn general search
strategies rather than performing well on only a few specific problem types. Under-
standing the impact of problem characteristics on learning has been key in these
efforts, but so far only search and optimization domains have been addressed.
Similar work has been done in the field of generating random Markov Deci-
sion Problems (MDPs) [1,3], focusing on a rather limited domain of potential
task-environments. Our own Merlin tool [9] supports various methods for the
procedural generation of discrete and continuous multi-objective MDPs, but does
not adequately address the full set of requirements below.
base operators); few atomic units – as opposed to multiple types – mean greater transparency, since superstructures can be inspected more easily than larger black boxes can, facilitating comparison between task-environments. This also lays the foundation for a smooth, incremental increase in complexity, as each addition or change can be as small as the smallest blocks. Sect. 4.1 gives an example of what this might look like. On top of this, methods for modification (Sect. 4.2), analysis (Sect. 4.3), construction (Sect. 4.4), and execution can be developed.
Fig. 1a shows a description of an extremely simple task where the agent must
reach a goal position in a 2-dimensional space. We describe a task-environment
by a set of (time-) dependent variables with causal relations. The Initialization
section provides the initial values for the variables. In our case these are goal
and agent position. The Dynamics section defines how variables change over
time by allowing us to refer to the past variable values using time arguments
and the reserved variables t (for the current time) and dt (the size of the (arbitrarily) smallest atomic time step). Unlike the other languages discussed in Sect. 2, we allow the specification of arbitrary expressions.
1. Initialization:
2.   gx = 3   // goal x
3.   gy = 3   // goal y
4.   ax = 4   // agent x
5.   ay = 10  // agent y
6. Dynamics:
7.   dx(t) = 0  // step x
8.   dy(t) = 0  // step y
9.   ax(t) = ax(t-dt) + dx(t)
10.  ay(t) = ay(t-dt) + dy(t)
11.  at(t) = ax(t) == gx(t) && ay(t) == gy(t)
12.  reward(t) = 10 if at(t) else -1
13. Terminals:
14.  reward(t) > 0
15. Rewards:
16.  reward(t)
17. Observations:
18.  ax(t), ay(t), gx(t), gy(t)
19. Controls:
20.  dx(t) = [-1, 0, 1]
21.  dy(t) = [-1, 0, 1]
Fig. 1. Example description (a) and extracted graph (b) for a task where the agent
must reach a goal location, including causal connections with latency
to variables in the current time step. The arithmetic and comparison operations
make up the (extensible) set of base operators which are not further defined.
While the Initialization and Dynamics sections mostly describe the environ-
ment, the Terminals and Rewards sections can be said to describe the task in
terms of environment variables. They consist of zero or more lines which each
specify an expression that evaluates to a terminal (Boolean) that ends the task
or a reward (a number). Like everything else, rewards can depend on time and
other variables, which allows tasks to have e.g. time pressure, deadlines, start
times, and complex preconditions and interactions – perhaps even modulating
other dependencies such as time.
Finally, the sections for Observations and Controls describe how the agent
interacts with the environment. Observations consist of a set of sensations that
occur simultaneously and are described on their own line with a comma-separated
list of expressions. Controls are described as assignments of a collection of accept-
able values to environment variables whose value is overwritten when specified.
Non-deterministic responses of an agent’s body can be modeled by making the
causal connections following the controls more complex.
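To illustrate how such a description could be executed, the following is a minimal Python sketch of the Fig. 1a task with the expressions hard-coded rather than parsed; it is not the authors' implementation (nor the Merlin tool), and all names are ours.

class GoalTask:
    """Hand-coded interpreter for the Fig. 1a description: reach the goal in 2-D."""
    def __init__(self):
        self.gx, self.gy = 3, 3      # Initialization: goal position
        self.ax, self.ay = 4, 10     # Initialization: agent position

    def step(self, dx, dy):
        """Apply one control (Controls section) and evaluate the Dynamics section."""
        assert dx in (-1, 0, 1) and dy in (-1, 0, 1)
        self.ax += dx                                          # line 9
        self.ay += dy                                          # line 10
        at_goal = self.ax == self.gx and self.ay == self.gy    # line 11
        reward = 10 if at_goal else -1                         # line 12 / Rewards
        terminal = reward > 0                                  # Terminals
        observation = (self.ax, self.ay, self.gx, self.gy)     # Observations
        return observation, reward, terminal

# A trivial agent that always steps toward the goal:
env, done, total = GoalTask(), False, 0
while not done:
    dx = (env.gx > env.ax) - (env.gx < env.ax)
    dy = (env.gy > env.ay) - (env.gy < env.ay)
    _, r, done = env.step(dx, dy)
    total += r
print(total)   # 4: six -1 steps plus the final +10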
Making the space continuous in this way requires relatively significant changes.
It is much easier to go from this continuous representation to one that appears
discrete to the agent by discretizing controls and sensations (e.g. by rounding to
the nearest integer). In the new lines 7 and 8 we have started making the envi-
ronment (more) continuous in time as well. dt would ideally be infinitesimal
for continuous environments, but a small value will have to suffice in practice.
To make the task more dynamic and periodic we can have the goal move
a little. We replace the initialization of gx with gx(t) = 4+3*sin(t) and move
it to the Dynamics section. The environment can easily be made stochastic by
the use of random number generators that are provided as base operations.
We can further decrease observability by adding delays into the causal
chain and changing refresh rates. For example, to let observations of ax and ay
occur with a delay of one time step and allow observation of the goal position
only at time steps 1, 3, 5, . . .:
17. ax(t-dt), ay(t-dt)
18. gx, gy @ [1:2:]
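One possible way to realize these two observation lines is sketched below in Python, where we read "@ [1:2:]" as a slice-like schedule starting at time step 1 with period 2; both this reading and the history-dictionary representation are our assumptions.

def observe(history, t):
    """history[t] maps variable names to their values at time t."""
    obs = []
    if t - 1 in history:                     # line 17: agent position, delayed by one step
        obs += [('ax', history[t - 1]['ax']), ('ay', history[t - 1]['ay'])]
    if t >= 1 and (t - 1) % 2 == 0:          # line 18: goal visible at t = 1, 3, 5, ...
        obs += [('gx', history[t]['gx']), ('gy', history[t]['gy'])]
    return obs

history = {0: dict(ax=4, ay=10, gx=3, gy=3), 1: dict(ax=3, ay=9, gx=3, gy=3)}
print(observe(history, 1))   # [('ax', 4), ('ay', 10), ('gx', 3), ('gy', 3)]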
4.3 Analysis
References
1. Archibald, T.W., McKinnon, K.I.M., Thomas, L.C.: On the generation of Markov
decision processes. J. Oper. Res. Soc. 46, 354–361 (1995)
2. Asta, S., Özcan, E., Parkes, A.J.: Batched mode hyper-heuristics. In: Nicosia, G.,
Pardalos, P. (eds.) LION 7. LNCS, vol. 7997, pp. 404–409. Springer, Heidelberg
(2013)
3. Bhatnagar, S., Sutton, R.S., Ghavamzadeh, M., Lee, M.: Natural actor-critic algo-
rithms. Automatica 45(11), 2471–2482 (2009)
4. Bieger, J., Thórisson, K.R., Garrett, D.: Raising AI: tutoring matters. In: Goertzel,
B., Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS, vol. 8598, pp. 1–10. Springer,
Heidelberg (2014)
5. Bischl, B., Mersmann, O., Trautmann, H., Preuß, M.: Algorithm selection based
on exploratory landscape analysis and cost-sensitive learning. In: Proceedings of
the 14th Annual Conference on Genetic and Evolutionary Computation, GECCO
2012, pp. 313–320. ACM, New York (2012)
6. Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Qu,
R.: Hyper-heuristics: A survey of the state of the art. J. Oper. Res. Soc. 64(12),
1695–1724 (2013)
7. Decker, K.: TAEMS: A framework for environment centered analysis & design of
coordination mechanisms. In: O’Hare, G.M.P., Jennings, N.R. (eds.) Foundations
of Distributed Artificial Intelligence, pp. 429–448. Wiley Inter-Science (1996)
8. Ebner, M., Levine, J., Lucas, S.M., Schaul, T., Thompson, T., Togelius, J.: Towards
a video game description language. In: Lucas, S.M., Mateas, M., Preuss, M.,
Spronck, P., Togelius, J. (eds.) Artificial and Computational Intelligence in Games.
Dagstuhl Follow-Ups, vol. 6, pp. 85–100. Schloss Dagstuhl (2013)
9. Garrett, D., Bieger, J., Thórisson, K.R.: Tunable and generic problem instance gen-
eration for multi-objective reinforcement learning. In: ADPRL 2014. IEEE (2014)
10. Hernández-Orallo, J.: A (hopefully) non-biased universal environment class for
measuring intelligence of biological and artificial systems. In: Baum, E., Hutter,
M., Kitzelmann, E. (eds.) AGI 2010, pp. 182–183. Atlantis Press (2010)
11. Hernández-Orallo, J.: AI Evaluation: past, present and future (2014).
arXiv:1408.6908
12. Hernández-Orallo, J., Dowe, D.L.: Measuring universal intelligence: Towards an
anytime intelligence test. Artif. Intell. 174(18), 1508–1539 (2010)
13. Legg, S., Hutter, M.: Tests of Machine Intelligence [cs] (December 2007).
arXiv:0712.3825
14. Legg, S., Veness, J.: An approximation of the universal intelligence measure. In:
Dowe, D.L. (ed.) Solomonoff Festschrift. LNCS(LNAI), vol. 7070, pp. 236–249.
Springer, Heidelberg (2013)
15. Lim, C.U., Harrell, D.F.: An approach to general videogame evaluation and auto-
matic generation using a description language. In: CIG 2014. IEEE (2014)
16. Love, N., Hinrichs, T., Haley, D., Schkufza, E., Genesereth, M.: General game
playing: Game description language specification. Tech. Rep. LG-2006-01, Stanford
Logic Group (2008)
17. McDermott, D., Ghallab, M., Howe, A., Knoblock, C., Ram, A., Veloso, M., Weld,
D., Wilkins, D.: PDDL-The Planning Domain Definition Language. Tech. Rep.
TR-98-003, Yale Center for Computational Vision and Control (1998). http://
www.cs.yale.edu/homes/dvm/
18. Rohrer, B.: Accelerating progress in Artificial General Intelligence: Choosing a
benchmark for natural world interaction. J. Art. Gen. Int. 2(1), 1–28 (2010)
19. Schaul, T.: A video game description language for model-based or interactive learn-
ing. In: CIG 2013, pp. 1–8. IEEE (2013)
20. Schaul, T., Togelius, J., Schmidhuber, J.: Measuring intelligence through games
(2011). arXiv preprint arXiv:1109.1314
21. Togelius, J., Champandard, A.J., Lanzi, P.L., Mateas, M., Paiva, A., Preuss, M.,
Stanley, K.O.: Procedural content generation: Goals, challenges and actionable
steps. In: Lucas, S.M., Mateas, M., Preuss, M., Spronck, P., Togelius, J. (eds.)
Artificial and Computational Intelligence in Games. Dagstuhl Follow-Ups, vol. 6,
pp. 61–75. Schloss Dagstuhl (2013)
22. Turing, A.M.: Computing machinery and intelligence. Mind 59(236), 433–460
(1950)
Assumptions of Decision-Making Models in AGI
1 Formalizing Decision-Making
An AGI system needs to make decisions from time to time. To achieve its goals,
the system must execute certain operations, which are chosen from all possi-
ble operations, according to the system’s beliefs on the relations between the
operations and the goals, as well as their applicability to the current situation.
On this topic, the dominant normative model is decision theory [3,12].
According to this model, "decision making" means to choose one action from a
finite set of actions that are applicable at the current state. Each action leads to
some consequent states according to a probability distribution, and each conse-
quent state is associated with a utility value. The rational choice is the action
that has the maximum expected utility (MEU).
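For concreteness, the MEU rule can be written down in a few lines of Python; the actions, outcome probabilities and utilities below are invented purely for illustration.

# Each action leads to consequent states with given probabilities; states carry utilities.
ACTIONS = {
    'go_left':  [(0.8, 'safe'), (0.2, 'pit')],
    'go_right': [(0.5, 'gold'), (0.5, 'safe')],
}
UTILITY = {'safe': 0.0, 'pit': -10.0, 'gold': 5.0}

def expected_utility(action):
    return sum(p * UTILITY[state] for p, state in ACTIONS[action])

best = max(ACTIONS, key=expected_utility)
print(best, expected_utility(best))   # go_right 2.5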
When the decision extends from single actions to action sequences, it is often
formalized as a Markov decision process (MDP), where the utility function is
replaced by a reward value at each state, and the optimal policy, as a collection
of decisions, is the one that achieves the maximum expected total reward (usually
with a discount for future rewards) in the process. In AI, the best-known app-
roach toward solving this problem is reinforcement learning [4,16], which uses
various algorithms to approach the optimal policy.
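Likewise, the MDP formulation can be illustrated by a few lines of value iteration on a toy two-state problem; the transition model, rewards and discount factor are invented for illustration only.

# transitions[s][a] = list of (probability, next_state); rewards are attached to states.
transitions = {
    's0': {'stay': [(1.0, 's0')], 'move': [(0.9, 's1'), (0.1, 's0')]},
    's1': {'stay': [(1.0, 's1')], 'move': [(1.0, 's0')]},
}
reward = {'s0': 0.0, 's1': 1.0}
gamma = 0.9                                   # discount for future rewards

V = {s: 0.0 for s in transitions}
for _ in range(100):                          # repeated Bellman backups
    V = {s: reward[s] + gamma * max(sum(p * V[s2] for p, s2 in outcomes)
                                    for outcomes in transitions[s].values())
         for s in transitions}

policy = {s: max(transitions[s],
                 key=lambda a, s=s: sum(p * V[s2] for p, s2 in transitions[s][a]))
          for s in transitions}
print(policy)                                 # {'s0': 'move', 's1': 'stay'}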
Decision theory and reinforcement learning have been widely considered as
setting the theoretical foundation of AI research [11], and the recent progress in
deep learning [9] is increasing the popularity of these models. In the current AGI
research, an influential model in this tradition is AIXI [2], in which reinforcement
learning is combined with Solomonoff induction [15] to provide the probability
values according to algorithmic complexity of the hypotheses used in prediction.
Decision-theoretic models of this kind are built on the following assumptions:

The assumption on task: The task of "decision making" is to select the best
action from all applicable actions at each state of the process.
The assumption on belief: The selection is based on the system’s beliefs
about the actions, represented as probability distributions among their con-
sequent states.
The assumption on desire: The selection is guided by the system’s desires
measured by a (utility or reward) value function defined on states, and the
best action is the one with the maximum expectation.
The assumption on budget: The system can afford the computational
resources demanded by the selection algorithm.
There are many situations where the above assumptions can be reasonably
accepted, and the corresponding models have been successfully applied [9,11].
However, there are reasons to argue that artificial general intelligence (AGI) is
not such a field, and there are non-trivial issues on each of the four assumptions.
Issues on Task: For a general-purpose system, it is unrealistic to assume that
at any state all the applicable actions are explicitly listed. Actually, in human
decision making the evaluation-choice step is often far less significant than diag-
nosis or design [8]. Though in principle it is reasonable to assume the system’s
actions are recursively composed of a set of basic operations, decisions are often
not made at the level of basic operations, but at the level of composed
actions, where there are usually infinite possibilities. So decision making is often
not about selection, but selective composition.
Issues on Belief: For a given action, the system’s beliefs about its possible
consequences are not necessarily specified as a probability distribution among
following states. Actions often have unanticipated consequences, and even the
beliefs about the anticipated consequences usually do not fully specify a “state”
of the environment or the system itself. Furthermore, the system’s beliefs about
the consequences may be implicitly inconsistent, and so do not correspond to a
probability distribution.
Issues on Desire: Since an AGI system typically has multiple goals with con-
flicting demands, usually no uniform value function can evaluate all actions with
respect to all goals within limited time. Furthermore, the goals in an AGI system
change over time, and it is unrealistic to expect such a function to be defined
on all future states. How desirable a situation is should be taken as part of the
problem to be solved, rather than as a given.
Issues on Budget: An AGI is often expected to handle unanticipated prob-
lems in real time with various time requirements. In such a situation, even if
it into memory, the system may also use it to revise or update the pre-
vious beliefs on statement S, as well as to derive new conclusions using
various inference rules (including deduction, induction, abduction, analogy,
etc.). Each rule uses a truth-value function to calculate the truth-value of the
conclusion according to the evidence provided by the premises. For example,
the deduction rule can take P ⟨f1, c1⟩ and P ⇒ Q ⟨f2, c2⟩ to derive Q ⟨f, c⟩, where ⟨f, c⟩ is calculated from ⟨f1, c1⟩ and ⟨f2, c2⟩ by the truth-value function for deduction.¹ There is also a revision rule that merges distinct bodies
of evidence on the same statement to produce more confident judgments.
Question. A question has the form of “S?”, and represents a request for the
system to find the truth-value of S according to its current beliefs. A question
may contain variables to be instantiated. Besides looking in the memory for
a matching belief, the system may also use the inference rules backwards
to generate derived questions, whose answers will lead to answers of the
original question. For example, from question Q? and belief P ⇒ Q ⟨f, c⟩, a new question P? can be proposed by the deduction rule. When there are
multiple candidate answers, a choice rule is used to find the best answer
among them, based on truth-value, simplicity, and so on.
Goal. A goal has the form of “S!”. Similar to logic programming [5], in NARS
certain concepts are given a procedural interpretation, so a goal is taken
as a statement to be achieved, and an operation as a statement that can
be achieved by an executable routine. The processing of a goal also includes
backward inference guided by beliefs that generates derived goals. For exam-
ple, from goal Q! and belief P ⇒ Q ⟨f, c⟩, a new goal P! can be proposed
by the deduction rule. If the content of a goal corresponds to an executable
operation, the associated routine is invoked to directly realize the goal, like
what a Prolog built-in predicate does.
Under the restriction of the available knowledge and resources, no task can be accomplished perfectly. Instead, the system attempts to accomplish each task as well as its available knowledge and resources allow. In NARS,
decision making is most directly related to the processing of goals, though the
other inference activities are also relevant.²
In Narsese, an operation is expressed by an operator (which identifies the
associated routine) with an argument list (which includes both input and out-
put arguments). The belief about the execution condition and consequence of an
operation is typically represented as “(condition, operation) ⇒ consequence”,
which is logically equivalent to "condition ⇒ (operation ⇒ consequence)".³
This belief can be used in different ways. In an idealized situation (where the
¹ Since P and Q can be events with an occurrence time, the same rules can be used for temporal reasoning, which is described in more detail in [21].
² Different types of inference tasks may work together. For example, from important judgments of low confidence, questions can be derived, and from certain questions, goals can be derived, which if pursued give rise to curious and exploratory behaviors.
³ Like other beliefs, there is a truth-value attached, which is omitted here to simplify the discussion.
uncertainty of the belief and the existence of other beliefs and tasks are ignored),
if “condition” is true, the execution of “operation” will make “consequence” true
by forward inference; when “consequence!” is a goal, backward inference will
generate “condition!” as a derived goal. When the latter goal is satisfied (either
confirmed by a belief or achieved recursively by other operations), “operation!”
becomes another derived goal, which is directly achieved by invoking the asso-
ciated routine. Here the process looks similar to logic programming, though the
situation is more complicated, especially in backward inference.
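The forward and backward use of such beliefs can be sketched as follows in Python. This is a drastic simplification that ignores truth-values, temporal order and NARS's actual control mechanism; the beliefs, facts and function names are our own toy examples.

# Beliefs of the form (condition, operation) => consequence, with truth-values omitted.
BELIEFS = [
    ('door_open', 'walk_through', 'in_room'),
    ('at_door',   'push_door',    'door_open'),
]
FACTS = {'at_door'}                  # conditions already confirmed by the system's beliefs

def achieve(goal, executed):
    """Backward inference: from goal! derive condition!, then operation!."""
    if goal in FACTS:
        return True
    for condition, operation, consequence in BELIEFS:
        if consequence == goal and achieve(condition, executed):
            executed.append(operation)   # operation! is realized by invoking its routine
            FACTS.add(consequence)       # idealized: execution makes the consequence true
            return True
    return False

plan = []
achieve('in_room', plan)
print(plan)                          # ['push_door', 'walk_through']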
Since NARS is an open system working in real time, new tasks can arrive while the system is still working on other goals, and there is no guarantee that all the co-existing goals are consistent with each other in what they demand the system to do. Even if all the innate and given goals are consistent, the derived ones may not be, since each usually comes as a means to achieve a certain goal in isolation, without considering its impact on the other goals. Even among goals that are
consistent with each other in content, they still compete for resources, especially
processing time. In NARS, to fully process a goal means to take all relevant
beliefs into consideration. Since the system’s capability is finite and the goals
all should be accomplished as soon as possible, it is usually impossible to fully
process all of them. Consequently, it becomes necessary to have preferences among goals to indicate their different significance to the system.
Instead of defining a separate measurement for preference, NARS takes the
“desire as belief” approach [10]. The desire-value of statement S is taken as the
truth-value of statement S ⇒ D, where D is a virtual statement representing
the “desired state” where all the system’s goals are satisfied. D is “virtual” in
the sense that its content is not explicitly spelled out, nor is it actually stored
in the system’s memory. It is only used in the conceptual design to turn the
processing of the desire-values into that of the related truth-values. While every
judgment has an assigned truth-value, every goal has an assigned desire-value.
Like a truth-value, the intuitive meaning of a desire-value can also be explained
using idealized situations. S has desire-value ⟨w+/w, w/(w + 1)⟩ if the system
believes that if S is realized, w+ of its w consequences are “good”, while the rest
of them are “bad”, with respect to the system’s goals. In this way, the system can
calculate the desire-value of a statement according to the desire-value of another
statement and the belief that linked them, using the truth-value functions of the
inference rules. For example, the desire-value of statement S1 , d1 , is interpreted
as the truth-value of statement S1 ⇒ D, so can be used with the truth-value
of belief S2 ⇒ S1 , t1 , by the deduction function to calculate d2 , the truth-value
of S2 ⇒ D, which is the desire-value of statement S2 . In this process the exact
content of D is irrelevant, as far as it is the same in its two usages. Even without
going into the details of the above calculation, it is easy to see that d2 depends
on both d1 and t1 . S2 is highly desired only when S1 is highly desired and the
implication relation S2 ⇒ S1 is strongly supported by available evidence.
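A small numerical sketch of this propagation in Python, under the assumption that the deduction truth-value function is the usual NAL one (f = f1·f2, c = f1·f2·c1·c2); the concrete numbers are invented.

def deduction(t1, t2):
    (f1, c1), (f2, c2) = t1, t2
    return f1 * f2, f1 * f2 * c1 * c2

d1 = (0.9, 0.9)         # desire-value of S1, i.e. truth-value of S1 => D
t1 = (0.8, 0.9)         # truth-value of the belief S2 => S1
d2 = deduction(t1, d1)  # desire-value of S2, i.e. truth-value of S2 => D
print(d2)               # roughly (0.72, 0.58): S2 is desired, but less strongly than S1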
Similarly, the revision rule can be used to merge conflicting desire-values. For
example, after a high desire-value of S2 is established by a goal S1 , another goal
S3 is taken into consideration, but the system believes that it can be realized
only when S2 is not realized. By deduction again, S2 will get another desire-value d2′ whose frequency value is low. Now the revision rule can combine d2 and d2′ into d2′′, as the desire-value of S2 when both goals S1 and S3 are taken into account. In this case, whether S2 will still be treated as a goal depends on the total evidence – if the frequency factor in d2′′ is too low, it will not be pursued by the system, despite the positive evidence from S1. In this way, "decision
making” in NARS can be discussed in two senses:
given goal, the system may pursue zero, one, or multiple derived goals, and some
of the alternatives may be discovered or constructed in the process. Unlike in
the traditional models, in this approach there is no demand for an exhaustive
list of mutually exclusive actions to be available in advance for each decision.
The traditional decision-making process can still be carried out in NARS as
a special case. If all possible actions are listed, and only one of them can be
selected, then the evidence favoring one action will be taken as evidence against
the other actions. Consequently, the best action selected by the traditional model
will also be the one selected by the choice rule of NARS, and its selection will
block the others under the mutual-exclusion restriction.
In all the models the selection of action is based on the system’s relevant beliefs
about its preconditions and its effects. In the traditional models, these two
aspects are embedded in the states of the environment, while in NARS they
are expressed by statements. In general, a statement only partially specifies a
state.
Based on the assumption of insufficient knowledge, in NARS even if a belief
(condition, operation) ⇒ consequence has a relatively high frequency and confi-
dence, condition does not necessarily specify the operation's full preconditions,
nor consequence its full effects. This approach is taken, not because the “state-
based” approach is bad, but because it is unrealistic. Even POMDP (partially
observable Markov decision process) models are too idealized in this respect,
where states still need to be estimated from observations, since the Markov prop-
erty is defined only in a state-based representation. There have been attempts in
reinforcement learning research to change the "flat" state space into a hierarchical
one. However, the current approaches all assume static abstractions, and how to
get dynamic abstractions is still acknowledged as an open problem [1]. For a gen-
eral purpose system, it is crucial to move between different levels of abstractions,
as well as to generate them at run time. A statement-based description satisfies
such a need. An AGI should be able to work in a non-stationary environment,
where the states of the environment never accurately repeat. In such a situation,
though it still makes sense to talk about "the state of the environment", to use such states to specify an operation is not possible, because future states are usually different from past ones. A statement, on the other hand, only captures certain aspects of states, so can be repeatedly observed in experience. If a classifier is used
to merge similar states, then it actually turns the model into “statement-based”,
since here one “state” may correspond to different situations.
Another difference between NARS and the traditional models is that the
truth-value in NARS is not probability. This topic has been discussed in previous
publications [18,20], so only the major arguments are summarized:
4 Conclusion

The traditional decision-making models discussed in this paper share the following assumptions:

– "Decision making" means to select the best action from all applicable actions.
– Beliefs on actions are expressed as probabilistic transitions among states.
– Desires are measured by a value function defined on states.
– The system can afford the resources demanded by the involved algorithms.
Though the traditional models can still be extended and revised, they cannot
drop all these fundamental assumptions without becoming fundamentally differ-
ent models.⁴ They should not be taken as idealized models to be approximated,
since these assumptions change the nature of the problem of decision making.
⁴ There is no space in this paper to discuss approaches where some of them are rejected.
References
1. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learn-
ing. Discrete Event Dynamic Systems 13, 41–77 (2003)
2. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algo-
rithmic Probability. Springer, Berlin (2005)
3. Jeffrey, R.C.: The Logic of Decision. McGraw-Hill, New York (1965)
4. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey.
Journal of Artificial Intelligence Research 4, 237–285 (1996)
5. Kowalski, R.: Logic for Problem Solving. North Holland, New York (1979)
6. Legg, S., Hutter, M.: Universal intelligence: a definition of machine intelligence.
Minds & Machines 17(4), 391–444 (2007)
7. Medin, D.L., Ross, B.H.: Cognitive Psychology. Harcourt Brace Jovanovich, Fort
Worth (1992)
8. Mintzberg, H., Raisinghani, D., Théorêt, A.: The structure of ‘unstructured’ deci-
sion processes. Administrative Science Quarterly 21, 246–275 (1976)
9. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G.,
Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C.,
Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis,
D.: Human-level control through deep reinforcement learning. Nature 518(7540),
529–533 (2015)
10. Price, H.: Defending desire-as-belief. Mind 98, 119–127 (1989)
11. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pren-
tice Hall, Upper Saddle River (2010)
12. Savage, L.J.: The Foundations of Statistics. Wiley, New York (1954)
13. Simon, H.A.: Models of Man: Social and Rational. John Wiley, New York (1957)
14. Simon, H.A.: Motivational and emotional controls of cognition. Psychological
Review 74, 29–39 (1967)
15. Solomonoff, R.J.: A formal theory of inductive inference. Part I and II. Information
and Control 7(1-2), 1–22, 224–254 (1964)
16. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press,
Cambridge (1998)
17. Wang, P.: Non-Axiomatic Reasoning System: Exploring the Essence of Intelligence.
Ph.D. thesis, Indiana University (1995)
18. Wang, P.: Rigid Flexibility: The Logic of Intelligence. Springer, Dordrecht (2006)
⁵ For source code and working examples, visit https://github.com/opennars/opennars.
19. Wang, P.: Motivation management in AGI systems. In: Bach, J., Goertzel, B., Iklé,
M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 352–361. Springer, Heidelberg (2012)
20. Wang, P.: Non-Axiomatic Logic: A Model of Intelligent Reasoning. World Scien-
tific, Singapore (2013)
21. Wang, P., Hammer, P.: Issues in temporal and causal inference. In: Proceedings of
the Eighth Conference on Artificial General Intelligence (2015)
22. Weirich, P.: Realistic Decision Theory. Oxford University Press, New York (2004)
Issues in Temporal and Causal Inference
The formal definitions of the symbols used above are given in [32], and here
they only need to be intuitively understood. Also, for the current discussion, it
is enough to see the memory of NARS as a collection of interrelated concepts.
In this way, NARS uniformly represents all empirical knowledge as sentences
in a formal language, while still keeping the differences among types of knowledge.
This design is very different from the tradition of cognitive architectures, where
the common practice is to distinguish “semantic/declarative memory”, “episodic
memory”, and “procedural memory” from each other, and to handle them in
separate modules, each with its storage structure and processing mechanism
[6,14,17]. There have been other attempts to unify these memory modules, such
as in a graphical model [25], while NARS does it in a logical model that has some
similarity with logic programming [13], even though the memory of NARS can
also be roughly seen as a conceptual graph.
Since an event is just a statement whose truth-value is specified for a period,
the most straightforward representation of temporal information is to attach a
time interval to each event [1,18], or even to every statement, since accurately
speaking, every conceptual relation holds in an interval, including "forever" as a
special case. NARS does not take this approach, because in different situations
the accuracy in specifying the beginning and ending of an event varies greatly, so
to use a single unit of time by which all events are measured is probably neither
necessary nor possible for an AGI. To be natural and flexible, in NARS an event
can be seen as both a point and an interval in time, depending on the desired
granularity. This treatment is consistent with the opinion that “The unit of
composition of our perception of time is a duration” [15]. Therefore, the temporal
information of an event is specified relatively with respect to another event, using
one of the two built-in temporal relations: sequential and parallel (also known as
before-after and at-the-same-time), which correspond to the precedes and overlap
predicates in the Russell-Kamp construction [15].
As a reasoning system, NARS runs by repeating a working cycle, and in
each cycle the system carries out a step of inference, as well as some simple
input/output activities. Just like a biological system uses a certain rhythmic event
as a “biological clock”, NARS uses its working cycles as an internal clock, since
each working cycle roughly takes a short constant amount of time. Using this
internal clock, NARS can express the durations of certain events. For example,
it can represent something like “Event A is observed, then, after 5 cycles, event
B is observed”, where the “5 cycles” is an event measurable by the system.
Besides current events, the system can make judgments about past and future
events, too. In NARS every sentence has a time-stamp indicating when the judg-
ment is created (either from input or from inference); if the sentence is about
an event, there is also a time-stamp about the estimated occurrence time. All
the time-stamps are in terms of the system’s internal clock, and each takes
an integer as value, which can be either positive or negative. This treatment
has some similarity with “step-logic” [5], though in NARS a time-stamp is not
explicitly expressed as part of a statement. Unlike some cognitive architectures
[7,17], NARS does not attempt to simulate the response time of the human
brain. The system uses its (subjective) working cycle as the unit of time, not
the (objective) time provided by the clock of the host computer, so as to achieve
platform-independence in testing. For example, if a certain inference process
takes 10 steps in one computer, so does it in a different computer, even when
the two systems have different running speeds.
The internal clock and built-in temporal relations are preferred for their
simplicity and flexibility, but they are not used to represent all types of temporal
information. NARS can use an external clock by specifying an event as occurring
at the same moment as a time indicated by the clock. Since such a clock is an
optional tool, the system can use different clocks in different situations for various
demands of accuracy and granularity in time measurement.
In summary, in NARS temporal information is represented at three levels:
Term. A term (either atomic or compound) can represent a temporal concept
(such as “New Year’s Day”) or relation (such as “after a while”). Such a term
is handled just like the other terms, though its meaning contains acquired
temporal information.
Statement. A temporal statement can be formed using a built-in temporal
relation combined with certain logical connectors. For example, if A, B, and
C are events, then the Narsese statement “(A, B) /⇒ C” represents “If A is
followed by B, then C will occur after them”.
Sentence. A temporal sentence uses a time-stamp to indicate the estimated
occurrence time of the event, with respect to the internal clock of the system.
Since the internal clock is “private” to the system, when a temporal sentence
needs to be expressed in Narsese for communication purposes, its time-stamp is
converted into a “tense”, which has three possible values: “past”, “present”, and
“future”, with respect to the “current moment” when the message is created.
Symmetrically, when an input judgment has a tense attached, it is converted
into a time-stamp, according to the current time.
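The conversion between internal time-stamps and tenses can be sketched in a few lines of Python; representing time-stamps as integers of working cycles follows the text, while the function names and the offset used for input tenses are our assumptions.

def to_tense(occurrence_cycle, current_cycle):
    """Turn an internal time-stamp into a tense for Narsese communication."""
    if occurrence_cycle < current_cycle:
        return 'past'
    if occurrence_cycle > current_cycle:
        return 'future'
    return 'present'

def to_stamp(tense, current_cycle, offset=1):
    """Turn an input tense back into an internal time-stamp."""
    return {'past': current_cycle - offset,
            'present': current_cycle,
            'future': current_cycle + offset}[tense]

print(to_tense(95, 100), to_stamp('future', 100))   # past 101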
It is important to see that an AGI system like NARS should not directly carry
out inference on tense, because the system works in real time and the "current moment" changes constantly [5,12]. On this aspect, NARS is fundamentally
different from many traditional temporal logic systems [22,29], which treat the
tense of a statement as one of its intrinsic properties, as if the reasoning system
itself is outside the flow of time.
In summary, many different techniques have been proposed in AI to represent
temporal information, each of which is effective under different assumptions [2].
NARS uses three approaches, and integrates them to satisfy the need of AGI.
the frequency value is the proportion of positive evidence among all currently available evidence, and the confidence value is the proportion of the currently available evidence among all the evidence that will be available after new evidence of a unit amount arrives. Their relation with probability is explained in [31].
Based on this semantics, each inference rule in NARS has a truth-value func-
tion calculating the truth-value of the conclusion according to the evidence pro-
vided by the premises. Without going into the details of the inference rules
(covered in [32] and other publications on NARS), for the current discussion it
is sufficient to know that as far as the confidence of the conclusion is concerned,
there are three types of inference rules:
Strong Inference. For example, from premises "{Tweety} → bird ⟨1.00, 0.90⟩" ("Tweety is a bird") and "($x → bird) ⇒ ($x → [yellow]) ⟨1.00, 0.90⟩" ("Birds are yellow", where $x can be substituted by another term), the deduction rule derives the conclusion "{Tweety} → [yellow] ⟨1.00, 0.81⟩" ("Tweety is yellow"). Such a rule is "strong" because the confidence of its conclusion can approach 1. If the truth-values are dropped and all the statements are taken to be "true", the rule is still valid in its binary form.
Weak Inference. For example, from premises "{Tweety} → bird ⟨1.00, 0.90⟩" and "{Tweety} → [yellow] ⟨1.00, 0.90⟩", the induction rule derives "($x → bird) ⇒ ($x → [yellow]) ⟨1.00, 0.45⟩"; similarly, from "($x → bird) ⇒ ($x → [yellow]) ⟨1.00, 0.90⟩" and "{Tweety} → [yellow] ⟨1.00, 0.90⟩", the abduction rule derives "{Tweety} → bird ⟨1.00, 0.45⟩". Such a rule is "weak" because the confidence of its conclusion cannot be higher than 0.5. If the truth-values are dropped and all the statements are taken to be "true", the rule becomes invalid in its binary form.
Evidence pooling. If two premises have the same statement but are supported by distinct evidence, such as "bird → [yellow] ⟨1.00, 0.50⟩" and "bird → [yellow] ⟨0.00, 0.80⟩", the revision rule derives "bird → [yellow] ⟨0.20, 0.83⟩". This is the only rule whose conclusion has a higher confidence value than both premises, since here the premises are based on distinct evidential bases, while the conclusion is based on the pooled evidence (a numerical sketch of these three calculations follows this list).
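The three calculations above can be reproduced with the Python sketch below. The formulas are our rendering of the usual NAL truth-value functions with evidential horizon k = 1 (see [32] for the authoritative definitions and premise-ordering conventions); the outputs match the examples up to rounding.

K = 1.0   # evidential horizon

def deduction(f1, c1, f2, c2):
    # Strong rule: confidence can approach 1.
    return f1 * f2, f1 * f2 * c1 * c2

def induction(f1, c1, f2, c2):
    # Weak rule: with w = f2*c1*c2 <= 1, the confidence w/(w+K) stays below 0.5.
    w = f2 * c1 * c2
    return f1, w / (w + K)

def revision(f1, c1, f2, c2):
    # Pools distinct evidence; the conclusion's confidence exceeds both premises'.
    w1, w2 = K * c1 / (1 - c1), K * c2 / (1 - c2)
    w = w1 + w2
    return (f1 * w1 + f2 * w2) / w, w / (w + K)

print(deduction(1.00, 0.90, 1.00, 0.90))   # (1.0, 0.81)
print(induction(1.00, 0.90, 1.00, 0.90))   # (1.0, 0.447...) ~ <1.00, 0.45>
print(revision(1.00, 0.50, 0.00, 0.80))    # (0.2, 0.833...) ~ <0.20, 0.83>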
There are many other inference rules in the system for other combinations of
premises with respect to various term connectors, and they will not be addressed
in this paper. In the following we only briefly describe how temporal inference
is carried out. Here the basic idea is to process temporal information and logical
information in parallel. Among other functions, this type of inference can carry
out a process that is similar to classical (Pavlovian) conditioning, by associating
a conditioned stimulus (CS) with an unconditioned stimulus (US). However what
is special in NARS is that temporal inference will also happen between neutral
stimuli. In the rare case that they get attention and also turn out to be important,
the system will find relations which a classical conditioning model would have missed.
To show how it works, assume initially the system gets to know that an
occurrence of event C is followed by an occurrence of event U . As mentioned pre-
viously, events are represented as statements with temporal information. In this
case, the occurrence time of C will be recognized by the system as before that of
U . As soon as the temporal succession between the two events is noticed by the
system, a temporal version of the induction rule will be invoked to generalize
the observation into a temporal implication “C /⇒ U ”. The truth-value of this
conclusion depends on the quality of the observations, as well as the restriction
applied by the induction rule, so that the confidence value of the conclusion will
be less than 0.5 – since it is only based on a single observation, the conclusion
is considered a “hypothesis” that differs from a “fact” in confidence.
If at a later time C occurs again, then from it and the previous hypothesis
the system derives U by deduction, with a time-stamp suggesting that it will
occur soon. Since the hypothesis has a low confidence, the prediction on U is
also tentative, though it may still be significant enough to raise the system’s
anticipation of the event, so as to make it more recognizable even when the input
signal is relatively weak or noisy. An anticipation-driven observation is “active”,
rather than “passive” (where the system simply accepts all incoming signals
without any bias), and the difference is not only in sensitivity. When expressed
as Narsese sentences, the inputs provided by a sensor normally correspond to
affirmative judgments, without any negative ones – we can directly see or hear
what is out there, but cannot directly see or hear what is not there. “Negative
observations” are actually unrealized anticipations and can only be produced by
active observations.
In the current example, if the anticipated U does not appear at the estimated
time, this unrealized anticipation and the preceding C will be taken as negative
evidence by the induction rule to generate a negative judgment “C /⇒ U ” that
has a low (near 0) frequency value. Then the revision rule can pool this one
with the previous (affirmative) one to get a new evaluation for the temporal
statement “C /⇒ U ”. In this way, the successes and failures of anticipation will
gradually lead the system to a relatively stable belief on whether, or how often,
C is followed by U. The conclusion is similar to a statistical one, though it is
revised incrementally, with no underlying probabilistic distribution assumed.
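A minimal Python sketch of this incremental bookkeeping (our own illustration, with evidential horizon k = 1 assumed): every occurrence of C adds a unit of evidence for "C /⇒ U", positive when the anticipated U appears and negative when it does not, and frequency and confidence are recomputed from the evidence counts.

K = 1.0

class TemporalBelief:
    """Evidence counter for a temporal implication such as C /=> U."""
    def __init__(self):
        self.w_plus = 0.0    # occasions where C was followed by U, as anticipated
        self.w = 0.0         # all occasions where C occurred

    def observe(self, anticipation_met):
        self.w += 1.0
        if anticipation_met:
            self.w_plus += 1.0

    def truth(self):
        return self.w_plus / self.w, self.w / (self.w + K)

belief = TemporalBelief()
for outcome in (True, True, False, True):    # U appeared after C in 3 of 4 cases
    belief.observe(outcome)
print(belief.truth())                        # (0.75, 0.8)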
If the system has an unconditioned response (UR) to the US, this “instinct”
corresponds to a temporal implication “U /⇒ ⇑R” that represents a sufficient
precondition U for the operation ⇑R to be executed, and it will have an affir-
mative truth-value, such as ⟨1.00, 0.99⟩ (confidence cannot reach 1, even for an
instinct). From this instinct and the belief on “C /⇒ U ”, the deduction rule
generates “C /⇒ ⇑R”, which gives the operation an acquired sufficient precon-
dition, though with a lower confidence than the instinct at the beginning. Now
⇑R becomes a conditioned response (CR) to the CS.
Similarly, if the system already has a strong belief on “C /⇒ U ”, and it notices
an occurrence of U , then by temporal abduction the system will guess that C
has occurred previously, though the system may fail to notice it in the input
stream, or it may be not directly observable. Similar to inductive conclusions,
such an abductive conclusion is not very confident until it is strengthened by other
evidence. As proposed by C. S. Peirce [20], a major function of abduction is to
provide explanations for observations.
Most of the existing models of classical conditioning are built in the frame-
work of dynamical systems [3,8,24,28], while in NARS it is modeled as an inference
process. Though Bayesian models [27] also treat conditioning as reasoning, there
the process only evaluates the probability of given statements, while NARS, fol-
lowing a logic, can generate new statements. Besides recognizing the preconditions
and consequences of single operations, temporal inference also allows the system
to do the same for compound operations consisting of multiple steps, which is
usually called “planning”, “scheduling”, or “skill learning” [15]. Typically, the
consequence of a preceding operation enables or triggers a following operation,
and such a compound operation as a whole will gradually be used as an individual
operation by the system. Such a process recursively forms an action hierarchy,
which allows efficient reaction and planning with different granularity. Unlike in
reinforcement learning or many other planning systems, NARS does not plan its
actions in all situations in terms of the same set of basic operations.
In an AGI system, the above restrictions do not rule out the feasibility of
predicting the future and describing the past. As shown by the previous example,
NARS can learn the regularity in its experience, and use it to predict the future.
Here the relevant knowledge is represented as temporal implication judgments
like “C /⇒ U f, c”, which is a summary of the relevant past experience, not an
accurate or approximate description of an objective “law of nature”.
The existence of objective causation is a long-lasting belief accepted by many
scientists and philosophers, but it has been challenged over the past century both
in science (especially in physics) and in philosophy. From the point of view of
cognitive science, it can be argued that all the beliefs of a system are restricted
by the cognitive capability (nature) and past experience (nurture) of the system,
and there is no ground to assume that such a belief will converge to an objective
truth. Even so, an adaptive system can form beliefs about causation. According to
Piaget [21], such beliefs originate from the observations about the consequences
of one’s own operations. For example, if event E is repeatedly observed after
the execution of operation R, NARS will form a belief "⇑R /⇒ E ⟨0.98, 0.99⟩",
which can be interpreted as “E is caused by R”. This will be the case even when
this achievement of the operation actually depends on a condition C, which is
usually (say 98% of the time) satisfied – the belief is stable and useful enough
for C to be ignored. However, when C is not usually satisfied, a belief like
“⇑R /⇒ E 0.49, 0.99” will not be as useful to the system, so in this case a more
reliable (though also more complicated) belief “(C, ⇑R) /⇒ E 0.99, 0.95” will
be favored by the system as the knowledge about how to get E. Please note that
even in such a case it is hard to say what is the “true cause” for E to happen,
since accurately speaking there may be other events involved, though for the
system’s current purpose, they do not need to be taken into consideration.
This discussion is also related to the Frame Problem [16], where the issue
is: for a given operation of the system, how to represent all of its preconditions
and consequences. The solutions proposed for this problem usually deal with it
in idealized or simplified situations, while the response to it in NARS is to give
up the attempt of getting all the information. An AGI system should depend on
operations with incomplete descriptions of preconditions and consequences, and
make decisions according to the available knowledge and resources [34].
NARS uses temporal inference to carry out prediction and explanation, which
are often considered as “causal inference”, though within the system there is no
built-in “causal relation”. The system has temporal versions of implication and
equivalence relations built into its grammar and inference rules, so a “causal
relation” can be represented in the system as their variant with domain-specific
and context-dependent additional requirements. This treatment is arguably sim-
ilar to the everyday usage of “causation”. In many fields, questions of the form
of “What is the real cause of X?”, with various X, have been under debate for
decades, even centuries. The notion of cause is interpreted very differently in
different situations – it can be deterministic or probabilistic; it may correspond
to a sufficient condition or a sufficient-and-necessary condition; it may or may
not be an intentional action; and so on. However, behind all of these versions,
the invariant components include a logical factor (from the given “causes”, the
“effects” can be derived) and a temporal factor (the “causes” happen no later
than the “effects”). NARS covers these two aspects in temporal inference, while
leaving the additional and variable aspects of causation to learning.
In this model, a causal relation and a covariant (or correlative) relation can
still be distinguished, as usually desired [4]. However here their difference is quan-
titative, not qualitative. If the judgment on “C /⇒ U ” gets its truth-value solely
by induction from a small amount of evidence, the confidence of the conclusion
will be relatively low, and we tend to consider such a relation “covariant”, but
if the conclusion can also be established by a chain of deduction, such as from
“C /⇒ M ” and “M /⇒ U ” where M is another event, then the relation between
C and U may be considered as "causal", because it has an explanation leading to
a high confidence. As far as prediction is concerned, what matters is the truth-
value of the conclusion, not how it is derived. For instance, in Pavlovian
conditioning the actual relation between CS and US is often coincidental, not
causal, though animals in such experiments cannot tell the difference.
For a given event E, NARS can be asked to find its “cause” and “effect”.
The simplest form is to ask the system to instantiate the query variable ?x when
answering questions “?x /⇒ E” and “E /⇒?x”, respectively. When there are
multiple candidate answers, a choice rule will be invoked to compare their truth-
value, simplicity, relevance, etc., to pick the best answer. Additional requirements
can be provided for the term or the statement that can be accepted as an answer.
In general, NARS does not assume that such a question has a unique correct or
final answer, but always reports the best answer it can find using the available
knowledge and resources. Therefore, though the design of NARS does not include
an innate causal relation, the system has the potential to predict, or even to
control, the occurrence of an event. This is arguably what we should expect
from an AGI.
4 Conclusions
Temporal inference plays a crucial role in AGI. An intelligent system needs the
ability to learn the preconditions and consequences of each operation, to organize
them into feasible plans or skills to reach complicated goals, and to find stable
patterns among the events in its experience. This ability enables the system to
predict the future, and to prepare sequences of operations to achieve its goals.
Classical conditioning can be seen as a concrete case of this ability.
The approach of temporal inference in NARS allows temporal information to
be expressed in several forms for different purposes. Some temporal notions are
innate, while others are acquired, and they can be at different levels of granularity
and accuracy. NARS integrates temporal inference with other inference, and
utilizes a uniform memory for declarative, episodic, and procedural knowledge.
NARS carries out many cognitive functions, like prediction, that are usually
associated with “causal inference”. However there is no fixed notion of a “causal
relation” within the system. NARS is based on the assumption that an accurate
description of the universe with objective causal relations among the events may
not be available to, or manageable by, the system, which makes NARS applicable
to situations where many other models cannot be applied. Instead of trying to
find or to approximate certain objective causal relations, what an intelligent
system should do is to behave according to the regularity and invariance that it
has summarized from its experience, and the generation, revision, and evaluation
of such knowledge is a lifelong task.
All the aspects of NARS described in this paper have been implemented
in the most recent version of the system. Currently the system, which is open
source, is under testing and tuning. As an AGI system, NARS is not designed
for any specific application, but as a testbed for a new theory about intelli-
gence. Though the current implementation already shows many interesting and
human-like properties, there are still many issues to be explored. This paper only
addresses the aspects of NARS that are directly related to temporal inference.
References
1. Allen, J.F.: Towards a general theory of action and time. Artificial Intelligence
23(2), 123–154 (1984)
2. Allen, J.F.: Time and time again: The many ways to represent time. International
Journal of Intelligent Systems 6(4), 341–356 (1991)
3. Anderson, J.J., Bracis, C., Goodwin, R.A.: Pavlovian conditioning from a forag-
ing perspective. In: Proceedings of the 32nd Annual Conference of the Cognitive
Science Society, pp. 1276–1281 (2010)
4. Cheng, P.W.: From covariation to causation: a causal power theory. Psychological
Review 104(2), 367–405 (1997)
5. Elgot-Drapkin, J., Perlis, D.: Reasoning situated in time I: Basic concepts. Journal
of Experimental & Theoretical Artificial Intelligence 2, 75–98 (1990)
6. Franklin, S.: A foundational architecture for artificial general intelligence. In:
Goertzel, B., Wang, P. (eds.) Advance of Artificial General Intelligence, pp. 36–54.
IOS Press, Amsterdam (2007)
7. Franklin, S., Strain, S., McCall, R., Baars, B.: Conceptual commitments of the
LIDA model of cognition. Journal of Artificial General Intelligence 4(2), 1–22
(2013)
8. Gallistel, C.R., Gibbon, J.: Time, rate, and conditioning. Psychological Review
107(2), 289–344 (2000)
9. Giunchiglia, E., Lee, J., Lifschitz, V., McCain, N., Turner, H.: Nonmonotonic causal
theories. Artificial Intelligence 153, 49–104 (2004)
10. Goodman, N.D., Ullman, T.D., Tenenbaum, J.B.: Learning a theory of causality.
Psychological Review (2011)
11. Halpern, J.Y., Pearl, J.: Causes and explanations: A structural-model approach.
Part I: Causes. The British Journal for the Philosophy of Science 56(4), 843 (2005)
12. Ismail, H.O., Shapiro, S.C.: Two problems with reasoning and acting in time. In:
Principles of Knowledge Representation and Reasoning: Proceedings of the Seventh
International Conference, pp. 355–365 (2000)
13. Kowalski, R.: Logic for Problem Solving. North Holland, New York (1979)
14. Laird, J.E.: The Soar Cognitive Architecture. MIT Press, Cambridge (2012)
15. van Lambalgen, M., Hamm, F.: The proper treatment of events. Blackwell Publishing,
Malden (2005)
16. McCarthy, J., Hayes, P.J.: Some philosophical problems from the standpoint of
artificial intelligence. In: Meltzer, B., Michie, D. (eds.) Machine Intelligence 4, pp.
463–502. Edinburgh University Press, Edinburgh (1969)
17. Newell, A.: Unified Theories of Cognition. Harvard University Press, Cambridge
(1990)
18. Nivel, E., Thórisson, K.R., Steunebrink, B.R., Dindo, H., Pezzulo, G., Rodriguez, M.,
Hernandez, C., Ognibene, D., Schmidhuber, J., Sanz, R., Helgason, H.P., Chella, A.,
Jonsson, G.K.: Bounded recursive self-improvement. CoRR abs/1312.6764 (2013)
19. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University
Press, Cambridge (2000)
20. Peirce, C.S.: Collected Papers of Charles Sanders Peirce, vol. 2. Harvard University
Press, Cambridge (1931)
21. Piaget, J.: The construction of reality in the child. Basic Books, New York (1954)
22. Pratt-Hartmann, I.: Temporal prepositions and their logic. Artificial Intelligence
166, 1–36 (2005)
23. Rattigan, M.J., Maier, M., Jensen, D.: Relational blocking for causal discovery. In:
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
24. Rescorla, R., Wagner, A.: A theory of Pavlovian conditioning: Variations in the
effectiveness of reinforcement and non reinforcement. In: Black, A., Prokasy, W.
(eds.) Classical Conditioning II, pp. 64–99. Appleton-Century-Crofts, New York
(1972)
25. Rosenbloom, P.S.: Rethinking cognitive architecture via graphical models.
Cognitive Systems Research 12, 198–209 (2011)
26. Shoham, Y.: Nonmonotonic reasoning and causation. Cognitive Science 14,
213–252 (1990)
27. Srivastava, N., Schrater, P.: Classical conditioning via inference over observable
situation contexts. In: Proceedings of the 36th Annual Meeting of the Cognitive
Science Society, pp. 1503–1508 (2014)
28. Sutton, R.S., Barto, A.G.: Time-derivative models of Pavlovian reinforcement. In:
Gabriel, M., Moore, J. (eds.) Learning and Computational Neuroscience: Founda-
tions of Adaptive Networks, pp. 497–537. MIT Press (1990)
29. Vila, L.: A survey on temporal reasoning in artificial intelligence. AI Communica-
tions 7(1), 4–28 (1994)
30. Wang, P.: Experience-grounded semantics: a theory for intelligent systems.
Cognitive Systems Research 6(4), 282–302 (2005)
31. Wang, P.: Rigid Flexibility: The Logic of Intelligence. Springer, Dordrecht (2006)
32. Wang, P.: Non-Axiomatic Logic: A Model of Intelligent Reasoning. World Scien-
tific, Singapore (2013)
33. Williamson, J.: Causality. In: Gabbay, D., Guenthner, F. (eds.) Handbook of Philo-
sophical Logic, vol. 14, pp. 95–126. Springer (2007)
34. Xu, Y., Wang, P.: The frame problem, the relevance problem, and a package
solution to both. Synthese (2012)
The Space of Possible Mind Designs
Roman V. Yampolskiy
Abstract. The paper attempts to describe the space of possible mind designs by
first equating all minds to software. Next it proves some properties of the mind
design space such as infinitude of minds, size and representation complexity of
minds. A survey of mind design taxonomies is followed by a proposal for a new
field of investigation devoted to the study of minds, intellectology.
1 Introduction
In 1984 Aaron Sloman published “The Structure of the Space of Possible Minds” in
which he described the task of providing an interdisciplinary description of that struc-
ture [1]. He observed that “behaving systems” clearly comprise more than one sort of
mind and suggested that virtual machines may be a good theoretical tool for analyzing
mind designs. Sloman indicated that there are many discontinuities within the space of minds, meaning it is neither a continuum nor a dichotomy between things with minds and things without minds [1]. Sloman wanted to see two levels of exploration, namely: descriptive – surveying the things different minds can do – and exploratory – looking at how different virtual machines and their properties may explain the results of the descriptive study [1]. Instead of trying to divide the universe into minds and non-minds, he hoped to see an examination of similarities and differences between systems. In this work we attempt to make another step towards this important goal. (This paper is adapted, with permission, from Dr. Yampolskiy's forthcoming book Artificial Superintelligence: A Futuristic Approach, © 2015 by CRC Press.)
What is a mind? No universally accepted definition exists. Solipsism notwithstanding, humans are said to have a mind. Higher-order animals are believed to have one as well, and perhaps so do lower-level animals, plants, or even all life forms. We believe that an artificially intelligent agent such as a robot or a program running on a computer will constitute a mind. Based on analysis of those examples we can conclude that a mind is an instantiated intelligence with a knowledge base about its environment, and while intelligence itself is not an easy term to define, a recent work of Shane Legg provides a definition that is satisfactory for our purposes [2]. Additionally, some hold a point of view known as panpsychism, attributing mind-like properties to all matter.
Without debating this possibility we will limit our analysis to those minds which can
actively interact with their environment and other minds. Consequently, we will not
devote any time to understanding what a rock is thinking.
If we accept materialism, we have to also accept that accurate software simulations
of animal and human minds are possible. Those are known as uploads [3] and they
belong to a class comprised of computer programs no different from that to which
designed or evolved artificially intelligent software agents would belong. Consequent-
ly, we can treat the space of all minds as the space of programs with the specific
property of exhibiting intelligence if properly embodied. All programs could be
represented as strings of binary numbers, implying that each mind can be represented
by a unique number. Interestingly, Nick Bostrom speculates via some thought experiments that perhaps it is possible to instantiate a fractional mind, such as 0.3 of a mind, as opposed to only whole minds [4]. The embodiment requirement is necessary
since a string is not a mind, but could be easily satisfied by assuming that a universal
Turing machine is available to run any program we are contemplating for inclusion in
the space of mind designs. An embodiment does not need to be physical as a mind
could be embodied in a virtual environment represented by an avatar [5, 6] and react
to a simulated environment, like a brain-in-a-vat or a “boxed” AI [7].
2 Infinitude of Minds
Two minds identical in terms of the initial design are typically considered to be dif-
ferent if they possess different information. For example, it is generally accepted that
identical twins have distinct minds despite exactly the same blueprints for their con-
struction. What makes them different is their individual experiences and knowledge
obtained since inception. This implies that minds cannot be cloned: different copies would, immediately after instantiation, start accumulating different experiences and would become as different as two twins.
If we accept that knowledge of a single unique fact distinguishes one mind from
another we can prove that the space of minds is infinite. Suppose we have a mind M
and it has a favorite number N. A new mind could be created by copying M and re-
placing its favorite number with a new favorite number N+1. This process could be
repeated indefinitely, giving us an infinite set of unique minds. Given that a string of binary numbers represents an integer, we can deduce that the set of mind designs is an infinite and countable set, since it is an infinite subset of the integers. It is not the same as the set of integers, since not all integers encode a mind.
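As a worked restatement of this argument (a sketch using notation not in the original; $\mathcal{M}$ denotes the set of minds): the favorite-number construction defines an injection of the natural numbers into $\mathcal{M}$, and the binary encoding defines an injection of $\mathcal{M}$ into the integers,
$$f:\ \mathbb{N} \to \mathcal{M},\quad f(k) := M \text{ with its favorite number replaced by } N+k, \qquad g:\ \mathcal{M} \to \mathbb{Z},\quad g(m) := \text{the integer whose binary expansion encodes } m.$$
Since $f$ is injective, $|\mathcal{M}| \ge \aleph_0$; since $g$ is injective, $|\mathcal{M}| \le \aleph_0$; hence $\mathcal{M}$ is countably infinite.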
Given that we have already shown that the set of minds is infinite, a largest mind does not exist. However, if we take into account our embodiment requirement, the largest physically realizable mind may in fact correspond to the design at the physical limits of computation [9].
Another interesting property of minds is that they can all be generated by a simple deterministic algorithm, a variant of Levin Search [10]: start with an integer
(for example 42), check to see if the number encodes a mind, if not, we discard the
number, otherwise we add it to the set of mind designs and proceed to examine the
next integer. Every mind will eventually appear on our list of minds after a predeter-
mined number of steps. However, checking to see if something is in fact a mind is not
a trivial procedure. Rice’s theorem [11] explicitly forbids determination of non-trivial
properties of random programs. One way to overcome this limitation is to introduce
an arbitrary time limit on the mind-or-not-mind determination function effectively
avoiding the underlying halting problem.
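A minimal Python sketch of this generate-and-test enumeration, under the stated workaround. The predicate encodes_mind is hypothetical (no such test is given in the paper); it is only a placeholder that must answer within a fixed time budget, illustrating how the time limit sidesteps the halting problem.

from typing import Callable, Iterator

def enumerate_minds(encodes_mind: Callable[[int, float], bool],
                    time_limit: float = 1.0,
                    start: int = 42) -> Iterator[int]:
    """Enumerate integers and keep those judged to encode a mind.

    encodes_mind(n, time_limit) is a hypothetical, time-bounded test: it must
    return True/False within `time_limit` seconds, answering False whenever it
    cannot decide in time (thereby avoiding the halting problem).
    """
    n = start
    while True:                      # every mind eventually appears in the output
        if encodes_mind(n, time_limit):
            yield n                  # n is added to the set of mind designs
        n += 1                       # proceed to examine the next integer

# Usage with a trivial stand-in predicate (for illustration only):
# minds = enumerate_minds(lambda n, t: n % 1000 == 0)
# first_three = [next(minds) for _ in range(3)]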
Analyzing our mind-design generation algorithm we may raise the question of
complexity measure for mind designs, not in terms of the abilities of the mind, but in
terms of complexity of design representation. Our algorithm outputs minds in order of
their increasing value, but this is not representative of the design complexity of the
respective minds. Some minds may be represented by highly compressible numbers with a short representation, such as $10^{13}$, while others may be comprised of 10,000 completely random digits, for example 735834895565117216037753562914… [12].
We suggest that Kolmogorov Complexity (KC) [13] measure could be applied to
strings representing mind designs. Consequently some minds will be rated as “ele-
gant” – having a compressed representation much shorter than the original string
while others will be “efficient” representing the most efficient representation of that
particular mind. Interesting elegant minds might be easier to discover than efficient
minds, but unfortunately KC is not generally computable.
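Since KC is not computable, a standard practical workaround (not from the paper, and only an upper bound on KC) is to use an off-the-shelf compressor as a proxy and compare the compressed length of a mind-design string with its raw length; a minimal sketch:

import os
import zlib

def compressed_ratio(design: bytes) -> float:
    """Upper-bound proxy for Kolmogorov complexity: compressed size / raw size.

    Values well below 1 suggest an 'elegant' (highly compressible) design;
    values near or above 1 suggest an incompressible, near-random design
    (zlib adds a small overhead, so random data can exceed 1).
    """
    return len(zlib.compress(design, 9)) / len(design)

print(compressed_ratio(b"01" * 5000))        # small ratio: repetitive, elegant
print(compressed_ratio(os.urandom(10000)))   # ratio near 1: random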
Each mind design corresponds to an integer and so is finite, but since the number of minds is infinite, some minds have a much greater number of states than others; this holds for any mind one picks. Consequently, since a human mind has only a finite number of possible states, there are minds which can never be fully understood by a human mind: such mind designs have a much greater number of states, making their understanding impossible, as can be demonstrated by the pigeonhole principle.
of artificially intelligent systems for which inability to predict their future behavior is
a highly undesirable property from the safety point of view [40, 41]. Consciousness
on the other hand seems to have no important impact on the behavior of the system as
can be seen from some thought experiments supposing existence of “consciousless”
intelligent agents [42]. This may change if we are successful in designing a test, per-
haps based on observer impact on quantum systems [43], to detect and measure con-
sciousness.
In order to be social, two minds need to be able to communicate which might be
difficult if the two minds don’t share a common communication protocol, common
culture or even common environment. In other words, if they have no common
grounding they don’t understand each other. We can say that two minds understand
each other if given the same set of inputs they produce similar outputs. For example,
in sequence prediction tasks [44] two minds have an understanding if their predictions
are the same regarding the future numbers of the sequence based on the same ob-
served subsequence. We can say that a mind can understand another mind’s function
if it can predict the other’s output with high accuracy. Interestingly, a perfect ability
by two minds to predict each other would imply that they are identical and that they
have no free will as defined above.
5 A Survey of Taxonomies
Yudkowsky describes the map of mind design space as follows: “In one corner, a tiny
little circle contains all humans; within a larger tiny circle containing all biological
life; and all the rest of the huge map is the space of minds-in-general. The entire map
floats in a still vaster space, the space of optimization processes. Natural selection
creates complex functional machinery without mindfulness; evolution lies inside the
space of optimization processes but outside the circle of minds” [45].
Similarly, Ivan Havel writes “… all conceivable cases of intelligence (of people,
machines, whatever) are represented by points in a certain abstract multi-dimensional
“super space” that I will call the intelligence space (shortly IS). Imagine that a specific
coordinate axis in IS is assigned to any conceivable particular ability, whether human,
machine, shared, or unknown (all axes having one common origin). If the ability is
measurable the assigned axis is endowed with a corresponding scale. Hypothetically,
we can also assign scalar axes to abilities, for which only relations like “weaker-
stronger”, “better-worse”, “less-more” etc. are meaningful; finally, abilities that may be
only present or absent may be assigned with “axes” of two (logical) values (yes-no).
Let us assume that all coordinate axes are oriented in such a way that greater distance
from the common origin always corresponds to larger extent, higher grade, or at least
to the presence of the corresponding ability. The idea is that for each individual intelli-
gence (i.e. the intelligence of a particular person, machine, network, etc.), as well as for
each generic intelligence (of some group) there exists just one representing point in IS,
whose coordinates determine the extent of involvement of particular abilities [46].” If
the universe (or multiverse) is infinite, as our current physics theories indicate, then all
possible minds in all states are instantiated somewhere [4].
Ben Goertzel proposes the following classification of Kinds of Minds, mostly
centered around the concept of embodiment [47]: Singly Embodied – control a single
6 Conclusions
Science periodically experiences a discovery of a whole new area of investigation.
For example, observations made by Galileo Galilei led to the birth of observational astronomy [55], i.e., the study of our universe; Watson and Crick’s discovery of the structure of DNA led to the birth of the field of genetics [56], which studies the universe of
blueprints for organisms; Stephen Wolfram’s work with cellular automata has resulted
in “a new kind of science” [57] which investigates the universe of computational
processes. I believe that we are about to discover yet another universe – the universe of
minds.
As our understanding of the human brain improves, thanks to numerous projects aimed at simulating or reverse engineering it, we will no doubt realize that human intelligence is just a single point in a vast universe of potential intelligent agents, which together comprise a new area of study. The new field, which I would like to term
intellectology, will study and classify design space of intelligent agents, work on es-
tablishing limits to intelligence (minimum sufficient for general intelligence and max-
imum subject to physical limits), contribute to consistent measurement of intelligence
across intelligent agents, look at recursive self-improving systems, design new intelli-
gences (making AI a sub-field of intellectology) and evaluate capacity for understand-
ing higher level intelligences by lower level ones.
References
1. Sloman, A.: The Structure of the Space of Possible Minds. In: The Mind and the Machine:
Philosophical Aspects of Artificial Intelligence. Ellis Horwood Ltd (1984)
2. Legg, S., Hutter, M.: Universal Intelligence: A Definition of Machine Intelligence. Minds
and Machines 17(4), 391–444 (2007)
3. Hanson, R.: If Uploads Come First. Extropy 6(2) (1994)
4. Bostrom, N.: Quantity of experience: brain-duplication and degrees of consciousness.
Minds and Machines 16(2), 185–200 (2006)
5. Yampolskiy, R., Gavrilova, M.: Artimetrics: Biometrics for Artificial Entities. IEEE
Robotics and Automation Magazine (RAM) 19(4), 48–58 (2012)
6. Yampolskiy, R.V., Klare, B., Jain, A.K.: Face recognition in the virtual world: Recogniz-
ing Avatar faces. In: 11th International Conference on Machine Learning and Applications
(2012)
7. Yampolskiy, R.V.: Leakproofing Singularity - Artificial Intelligence Confinement Prob-
lem. Journal of Consciousness Studies (JCS) 19(1–2), 194–214 (2012)
8. Wikipedia, Universal Turing Machine.
http://en.wikipedia.org/wiki/Universal_Turing_machine (retrieved April 14, 2011)
9. Lloyd, S.: Ultimate Physical Limits to Computation. Nature 406, 1047–1054 (2000)
10. Levin, L.: Universal Search Problems. Problems of Information Transm. 9(3), 265–266
(1973)
11. Rice, H.G.: Classes of recursively enumerable sets and their decision problems. Transac-
tions of the American Mathematical Society 74(2), 358–366 (1953)
12. Yampolskiy, R.V.: Efficiency Theory: a Unifying Theory for Information, Computation
and Intelligence. Journal of Discrete Mathematical Sciences & Cryptography 16(4–5),
259–277 (2013)
13. Kolmogorov, A.N.: Three Approaches to the Quantitative Definition of Information.
Problems Inform. Transmission 1(1), 1–7 (1965)
14. De Simone, A., et al.: Boltzmann brains and the scale-factor cutoff measure of the multiverse.
Physical Review D 82(6), 063520 (2010)
15. Kelly, K.: A Taxonomy of Minds (2007).
http://kk.org/thetechnium/archives/2007/02/a_taxonomy_of_m.php
16. Krioukov, D., et al.: Network Cosmology. Sci. Rep. (February 2012)
226 R.V. Yampolskiy
17. Bostrom, N.: Are You Living In a Computer Simulation? Philosophical Quarterly 53(211),
243–255 (2003)
18. Lanza, R.: A new theory of the universe. American Scholar 76(2), 18 (2007)
19. Miller, M.S.P.: Patterns for Cognitive Systems. In: 2012 Sixth International Conference on
Complex, Intelligent and Software Intensive Systems (CISIS) (2012)
20. Cattell, R., Parker, A.: Challenges for Brain Emulation: Why is it so Difficult? Natural In-
telligence 1(3), 17–31 (2012)
21. de Garis, H., et al.: A world survey of artificial brain projects, Part I: Large-scale brain
simulations. Neurocomputing 74(1–3), 3–29 (2010)
22. Goertzel, B., et al.: A world survey of artificial brain projects, Part II: Biologically inspired
cognitive architectures. Neurocomput. 74(1–3), 30–49 (2010)
23. Vernon, D., Metta, G., Sandini, G.: A Survey of Artificial Cognitive Systems: Implications
for the Autonomous Development of Mental Capabilities in Computational Agents. IEEE
Transactions on Evolutionary Computation 11(2), 151–180 (2007)
24. Yampolskiy, R.V., Fox, J.: Artificial General Intelligence and the Human Mental Model.
In: Singularity Hypotheses, pp. 129–145. Springer, Heidelberg (2012)
25. Yampolskiy, R.V., Ashby, L., Hassan, L.: Wisdom of Artificial Crowds—A Metaheuristic
Algorithm for Optimization. Journal of Intelligent Learning Systems and Applications
4(2), 98–107 (2012)
26. Ashby, L.H., Yampolskiy, R.V.: Genetic algorithm and Wisdom of Artificial Crowds algo-
rithm applied to Light up. In: 2011 16th International Conference on Computer Games
(CGAMES) (2011)
27. Hughes, R., Yampolskiy, R.V.: Solving Sudoku Puzzles with Wisdom of Artificial
Crowds. International Journal of Intelligent Games & Simulation 7(1), 6 (2013)
28. Port, A.C., Yampolskiy, R.V.: Using a GA and Wisdom of Artificial Crowds to solve soli-
taire battleship puzzles. In: 2012 17th International Conference on Computer Games
(CGAMES) (2012)
29. Hall, J.S.: Self-Improving AI: An Analysis. Minds and Machines 17(3), 249–259 (2007)
30. Yonck, R.: Toward a Standard Metric of Machine Intelligence. World Future Review 4(2),
61–70 (2012)
31. Herzing, D.L.: Profiling nonhuman intelligence: An exercise in developing unbiased tools
for describing other “types” of intelligence on earth. Acta Astronautica 94(2), 676–680
(2014)
32. Yampolskiy, R.V.: Construction of an NP Problem with an Exponential Lower Bound.
Arxiv preprint arXiv:1111.0305 (2011)
33. Yampolskiy, R.V.: Turing Test as a Defining Feature of AI-Completeness. In: Yang, X.-S.
(ed.) Artificial Intelligence, Evolutionary Computing and Metaheuristics. SCI, vol. 427,
pp. 3–17. Springer, Heidelberg (2013)
34. Yampolskiy, R.V.: AI-Complete, AI-Hard, or AI-Easy–Classification of Problems in AI.
In: The 23rd Midwest Artificial Intelligence and Cognitive Science Conference, Cincin-
nati, OH, USA (2012)
35. Yampolskiy, R.V.: AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an
Artificially Intelligent System. ISRN Artificial Intelligence, 271878 (2011)
36. Hales, C.: An empirical framework for objective testing for P-consciousness in an artificial
agent. Open Artificial Intelligence Journal 3, 1–15 (2009)
37. Aleksander, I., Dunmall, B.: Axioms and Tests for the Presence of Minimal Consciousness
in Agents I: Preamble. Journal of Consciousness Studies 10(4–5), 4–5 (2003)
38. Arrabales, R., Ledezma, A., Sanchis, A.: ConsScale: a plausible test for machine
consciousness? (2008)
The Space of Possible Mind Designs 227
39. Aaronson, S.: The Ghost in the Quantum Turing Machine. arXiv preprint arXiv:1306.0159
(2013)
40. Yampolskiy, R.V.: Artificial intelligence safety engineering: Why machine ethics
is a wrong approach. In: Philosophy and Theory of Artificial Intelligence, pp. 389–396.
Springer, Berlin (2013)
41. Yampolskiy, R.V.: What to Do with the Singularity Paradox? In: Philosophy and Theory
of Artificial Intelligence, pp. 397–413. Springer, Heidelberg (2013)
42. Chalmers, D.J.: The conscious mind: In search of a fundamental theory. Oxford Univ.
Press (1996)
43. Gao, S.: A quantum method to test the existence of consciousness. The Noetic Journal
3(3), 27–31 (2002)
44. Legg, S.: Is There an Elegant Universal Theory of Prediction? In: Balcázar, J.L.,
Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS (LNAI), vol. 4264, pp. 274–287.
Springer, Heidelberg (2006)
45. Yudkowsky, E.: Artificial Intelligence as a Positive and Negative Factor in Global Risk.
In: Bostrom, N., Cirkovic, M.M. (eds.) Global Catastrophic Risks, pp. 308–345. Oxford
University Press, Oxford (2008)
46. Havel, I.M.: On the Way to Intelligence Singularity. In: Kelemen, J., Romportl, J., Zackova, E.
(eds.) Beyond Artificial Intelligence. TIEI, vol. 4, pp. 3–26. Springer, Heidelberg (2013)
47. Goertzel, B.: The Hidden Pattern: A Patternist Philosophy of Mind, ch. 2: Kinds of Minds.
Brown Walker Press (2006)
48. Goertzel, B.: Mindplexes: The Potential Emergence of Multiple Levels of Focused Con-
sciousness in Communities of AI’s and Humans. Dynamical Psychology (2003).
http://www.goertzel.org/dynapsyc/2003/mindplex.htm
49. Hall, J.S.: Chapter 15: Kinds of Minds, in Beyond AI: Creating the Conscience of the Ma-
chine. Prometheus Books, Amherst (2007)
50. Roberts, P.: Mind Making: The Shared Laws of Natural and Artificial. CreateSpace (2009)
51. Kelly, K.: Inevitable Minds (2009).
http://kk.org/thetechnium/archives/2009/04/inevitable_mind.php
52. Kelly, K.: The Landscape of Possible Intelligences (2008).
http://kk.org/thetechnium/archives/2008/09/the_landscape_o.php
53. Kelly, K.: What Comes After Minds? (2008).
http://kk.org/thetechnium/archives/2008/12/what_comes_afte.php
54. Kelly, K.: The Evolutionary Mind of God (2007).
http://kk.org/thetechnium/archives/2007/02/the_evolutionar.php
55. Galilei, G.: Dialogue concerning the two chief world systems: Ptolemaic and Copernican.
University of California Press (1953)
56. Watson, J.D., Crick, F.H.: Molecular structure of nucleic acids. Nature 171(4356),
737–738 (1953)
57. Wolfram, S.: A New Kind of Science. Wolfram Media, Inc. (May 14, 2002)
Papers Presented as Posters
A Definition of Happiness for Reinforcement
Learning Agents
M. Daswani and J. Leike
1 Introduction
Desiderata. We can simply ask a human how happy they are. But artificial rein-
forcement learning agents cannot yet speak. Therefore we use our human “com-
mon sense” intuitions about happiness to come up with a definition. We arrive
at the following desired properties.
Research supported by the People for the Ethical Treatment of Reinforcement Learn-
ers http://petrl.org. See the extended technical report for omitted proofs and details
about the data analysis [4].
Both authors contributed equally.
2 Reinforcement Learning
In reinforcement learning (RL) an agent interacts with an environment in cycles:
at time step $t$ the agent chooses an action $a_t \in \mathcal{A}$ and receives an observation $o_t \in \mathcal{O}$ and a real-valued reward $r_t \in \mathbb{R}$; the cycle then repeats for time step $t+1$ [11]. The list of interactions $a_1 o_1 r_1 a_2 o_2 r_2 \ldots$ is called a history. We use $h_t$ to denote a history of length $t$, and we use the shorthand notation $h := h_{t-1}$ and $h' := h_{t-1} a_t o_t r_t$. The agent's goal is to choose actions to maximise cumulative rewards. To avoid infinite sums, we use a discount factor $\gamma$ with $0 < \gamma < 1$ and maximise the discounted sum $\sum_{t=1}^{\infty} \gamma^t r_t$. A policy is a function $\pi$ mapping every history to the action taken after seeing this history, and an environment $\mu$ is a stochastic mapping from histories to observation-reward tuples.
A policy π together with an environment μ yields a probability distribution
over histories. Given a random variable X over histories, we write the π-μ-
expectation of X conditional on the history h as Eπμ [X | h].
The (true) value function Vμπ of a policy π in environment μ maps a history
ht to the expected total future reward when interacting with environment μ and
taking actions according to the policy π:
$$V^\pi_\mu(h_t) := E^\pi_\mu\!\left[\sum_{k=t+1}^{\infty} \gamma^{k-t-1} r_k \,\middle|\, h_t\right]. \qquad (1)$$
It is important to emphasise that Eπμ denotes the objective expectation that can
be calculated only by knowing the environment μ. The optimal value function
Vμ∗ is defined as the value function of the optimal policy, Vμ∗ (h) := supπ Vμπ (h).
Typically, reinforcement learners do not know the environment and are trying
to learn it. We model this by assuming that at every time step the agent has
(explicitly or implicitly) an estimate V̂ of the value function Vμπ . Formally, a
value function estimator maps a history h to a value function estimate V̂ . Finally,
we define an agent to be a policy together with a value function estimator. If
the history is clear from context, we refer to the output of the value function
estimator as the agent’s estimated value.
If $\mu$ only depends on the last observation and action, $\mu$ is called a Markov decision process (MDP). In this case, $\mu(o_t r_t \mid h_{t-1} a_t) = \mu(o_t r_t \mid o_{t-1} a_t)$ and the observations are called states ($s_t = o_t$). In MDPs we use the Q-value function, the value of a state-action pair, defined as $Q^\pi_\mu(s_t, a_t) := E^\pi_\mu\!\left[\sum_{k=t+1}^{\infty} \gamma^{k-t-1} r_k \,\middle|\, s_t a_t\right]$.
Assuming that the environment is an MDP is very common in the RL literature,
but here we will not make this assumption.
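To make the value-estimation machinery concrete, here is a minimal Python sketch (not from the paper) of a tabular value function estimate updated by temporal-difference learning; the quantity td_error computed inside the update matches the reward-plus-discounted-value-change expression used in the Scaling paragraph below. All names are illustrative.

from collections import defaultdict

class TabularValueEstimator:
    """Maintains an estimate V-hat(h) for each (hashable) history or state."""

    def __init__(self, gamma: float = 0.9, alpha: float = 0.1):
        self.gamma = gamma                  # discount factor, 0 < gamma < 1
        self.alpha = alpha                  # learning rate
        self.value = defaultdict(float)     # V-hat, initialised to 0

    def update(self, h, r, h_next) -> float:
        """One TD(0) update for the transition h --(a, o, r)--> h_next.

        Returns the TD error r + gamma * V(h_next) - V(h), i.e. the
        'payout plus good news' quantity discussed in the text.
        """
        td_error = r + self.gamma * self.value[h_next] - self.value[h]
        self.value[h] += self.alpha * td_error
        return td_error

# Usage: v = TabularValueEstimator(); delta = v.update("s0", 1.0, "s1")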
not conform to our intuition: sometimes enjoying pleasures just fails to provide happiness, and conversely, enduring suffering does not necessarily entail unhap-
piness (see Example 3 and Example 7). In fact, it has been shown empirically
that rewards and happiness cannot be equated [8] (p-value < 0.0001).
There is also a formal problem with defining happiness in terms of reward: we can add a constant $c \in \mathbb{R}$ to every reward. No matter how the agent-environment interaction plays out, after $t$ steps the agent will have received additional cumulative rewards $C := \sum_{i=1}^{t} c = ct$. However, this did not change the structure of the reinforcement
learning problem in any way. Actions that were optimal before are still optimal
and actions that are slightly suboptimal are still slightly suboptimal to the same
degree. For the agent, no essential difference between the original reinforcement
learning problem and the new problem can be detected: in a sense the two
problems are isomorphic. If we were to define an agent’s happiness as received
reward, then an agent’s happiness would vary wildly when we add a constant to
the reward while the problem stays structurally exactly the same.
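As a worked check of this invariance (not from the paper): shifting every reward by $c$ shifts every policy's value by the same constant,
$$\sum_{t=1}^{\infty} \gamma^t (r_t + c) \;=\; \sum_{t=1}^{\infty} \gamma^t r_t \;+\; \frac{c\,\gamma}{1-\gamma},$$
so the difference in value between any two policies, and hence the optimal behaviour, is unchanged, while reward-defined happiness shifts by $c$ at every step.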
We propose the following definition of happiness.
then we call $E := E^\pi_\rho$ the agent’s subjective expectation. Note that we can always
find such a probability distribution, but this notion only really makes sense for
model-based agents (agents that learn a model of their environment). Using the
agent’s subjective expectation, we can rewrite Definition 1 as follows.
Example 3. Mary is travelling on an airplane. She knows that airplanes crash very rarely, and so she is completely at ease. Unfortunately she is flying on a budget
airline, so she has to pay for her food and drink. A flight attendant comes to her
seat and gives her a free beverage. Just as she starts drinking it, the intercom
informs everyone that the engines have failed. Mary feels some happiness from
the free drink (payout), but her expected future reward is much lower than in
the state before learning the bad news. Thus overall, Mary is unhappy.
For each of the two components, payout and good news, we distinguish the
following two sources of happiness.
– Pessimism:¹ the agent expects the environment to contain fewer rewards than it actually does.
– Luck: the outcome of rt is unusually high due to randomness.
Example 4. Suppose Mary fears flying and expected the plane to crash (pes-
simism). On hearing that the engines failed (bad luck ), Mary does not experi-
ence very much change in her future expected reward. Thus she is happy that
she (at least) got a free drink.
The following proposition states that once an agent has learned the envi-
ronment, its expected happiness is zero. In this case, underestimation cannot
contribute to happiness and thus the only source of happiness is luck, which
cancels out in expectation.
¹ Optimism is a standard term in the RL literature to denote the opposite phenomenon. However, this notion is somewhat in discord with optimism in humans.
Scaling. If we transform the rewards to $r'_t = c r_t + d$ with $c > 0$, $d \in \mathbb{R}$ for each time step $t$ without changing the value function, the value of the happiness will be completely different. However, a sensible learning algorithm should be able to adapt to the new reinforcement learning problem with the scaled rewards without too much trouble. At that point, the value function gets scaled as well, $V_{\mathrm{new}}(h) = cV(h) + d/(1-\gamma)$. In this case the happiness of the transition $h a_t o_t r'_t$ under $V_{\mathrm{new}}$ becomes
$$r'_t + \gamma V_{\mathrm{new}}(h a_t o_t r_t) - V_{\mathrm{new}}(h) = c r_t + d + \gamma c V(h a_t o_t r_t) + \frac{\gamma d}{1-\gamma} - cV(h) - \frac{d}{1-\gamma} = c\bigl(r_t + \gamma V(h a_t o_t r_t) - V(h)\bigr),$$
hence happiness gets scaled by a positive factor and thus its sign remains the
same, which would not hold if we defined happiness just in terms of rewards.
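A quick numerical sanity check of this scaling identity (a sketch, not from the paper; the numbers are arbitrary):

gamma, c, d = 0.9, 2.0, 5.0          # arbitrary discount, scale and shift
r, V_h, V_hnext = 1.0, 3.0, 4.0      # arbitrary reward and value estimates

def V_new(v):                        # value function after reward scaling
    return c * v + d / (1 - gamma)

happiness_old = r + gamma * V_hnext - V_h
happiness_new = (c * r + d) + gamma * V_new(V_hnext) - V_new(V_h)

assert abs(happiness_new - c * happiness_old) < 1e-9   # scaled by c, sign preserved
print(happiness_old, happiness_new)                     # 1.6 and 3.2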
Subjectivity. The definition (4) of happiness depends only on the current reward and
the agent’s current estimation of the value function, both of which are available
to the agent.
Intuitively, it seems that if things are constantly getting better, this should
increase happiness. However, this is not generally the case: even an agent that
obtains monotonically increasing rewards can be unhappy if it thinks that these
rewards mean even higher negative rewards in the future.
Example 7. Alice has signed up for a questionable drug trial which examines
the effects of a potentially harmful drug. This drug causes temporary pleasure
to the user every time it is used, and increased usage results in increased plea-
sure. However, the drug reduces quality of life in the long term. Alice has been
informed of the potential side-effects of the drug. She can be either part of a
placebo group or the group given the drug. Every morning Alice is given an
injection of an unknown liquid. She finds herself feeling temporary but intense
feelings of pleasure. This is evidence that she is in the non-placebo group, and
thus has a potentially reduced quality of life in the long term. Even though she
experiences pleasure (increasing rewards) it is evidence of very bad news and
thus she is unhappy.
Fig. 1. MDP of Example 8 with transitions labelled with actions $\alpha$ or $\beta$ and rewards. We use the discount factor $\gamma = 0.5$. The agent starts in $s_0$. Define $\pi_0(s_0) := \alpha$, then $V^{\pi_0}(s_0) = 0$. The optimal policy is $\pi^*(s_0) = \beta$, so $V^{\pi^*}(s_0) = 1$ and $V^{\pi^*}(s_1) = 4$.
Fig. 2. A plot of happiness for Example 8. We use the learning rate $\alpha = 0.1$. The pessimistic agent has zero happiness (and rewards), whereas the optimistic agent is initially unhappy, but once it transitions to state $s_1$ it becomes happy. The plot also shows the rewards of the optimistic agent.
However, the next transition is $s_1 \alpha s_1 2$, whose happiness is $2 + \gamma \hat{Q}_0(s_1, \alpha) - \hat{Q}_0(s_1, \alpha) = 2 - 0.5\varepsilon$. If $\hat{Q}_0$ is not updated by some learning mechanism the agent will continue to accrue this positive happiness for all future time steps. If the agent does learn, it will still be some time steps before $\hat{Q}$ converges to $Q^*$ and the positive happiness becomes zero (see Figure 2). It
is arguable whether this agent which suffered one time step of unhappiness but
potentially many time steps of happiness is overall a happier agent, but it is
some evidence that absolute pessimism does not necessarily lead to the happiest
agents.
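The dynamics described for Example 8 can be reproduced qualitatively with a short Q-learning simulation. This is a sketch only: the transition structure is read off the reconstructed Fig. 1, while greedy action selection, the tie-breaking rule, the initial Q-values, and the use of the TD error as the happiness signal are assumptions not spelled out in the excerpt.

GAMMA, ALPHA = 0.5, 0.1
# transitions[state][action] = (next_state, reward), as read off Fig. 1
T = {"s0": {"alpha": ("s0", 0.0), "beta": ("s1", -1.0)},
     "s1": {"alpha": ("s1", 2.0), "beta": ("s0", -1.0)}}

def run(q_init, steps=100):
    Q = {s: dict.fromkeys(T[s], q_init) for s in T}
    state, happiness = "s0", []
    for _ in range(steps):
        action = max(Q[state], key=Q[state].get)           # greedy policy
        nxt, r = T[state][action]
        td = r + GAMMA * max(Q[nxt].values()) - Q[state][action]
        happiness.append(td)                                # happiness of this step
        Q[state][action] += ALPHA * td                      # Q-learning update
        state = nxt
    return happiness

pessimistic = run(q_init=0.0)   # never tries beta: happiness and rewards stay 0
optimistic = run(q_init=0.5)    # briefly unhappy, then happy after reaching s1
print(round(sum(pessimistic), 2), round(sum(optimistic), 2))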
How can an agent increase their own happiness? The first source of happiness,
luck, depends entirely on the outcome of a random event that the agent has
no control over. However, the agent could modify its learning algorithm to be
systematically pessimistic about the environment. For example, when fixing the value function estimate below $r_{\min}/(1-\gamma)$ for all histories, happiness is positive at every time step (a short verification appears at the end of this section). But this agent would not actually take any sensible actions.
Just as optimism is commonly used to artificially increase exploration, pessimism
discourages exploration which leads to poor performance. As demonstrated in
Example 8, a pessimistic agent may be less happy than a more optimistic one.
Additionally, an agent that explicitly tries to maximise its own happiness is
no longer a reinforcement learner. So instead of asking how an agent can increase
its own happiness, we should fix a reinforcement learning algorithm and ask for
the environment that would make this algorithm happy.
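A one-line verification of the positivity claim above (not from the paper), using the reward-plus-value-change expression from the Scaling paragraph and a value estimate held at a constant $\hat V(h) \equiv v$ with $v < r_{\min}/(1-\gamma)$:
$$r_t + \gamma \hat V(h a_t o_t r_t) - \hat V(h) \;=\; r_t + \gamma v - v \;=\; r_t - (1-\gamma)\,v \;\ge\; r_{\min} - (1-\gamma)\,v \;>\; r_{\min} - r_{\min} \;=\; 0.$$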
6 Conclusion
An artificial superintelligence might contain subroutines that are capable of suf-
fering, a phenomenon that Bostrom calls mind crime [1, Ch. 8]. More generally,
Tomasik argues that even current reinforcement learning agents could have moral
weight [12]. If this is the case, then a general theory of happiness for reinforce-
ment learners is essential; it would enable us to derive ethical standards in the
treatment of algorithms. Our theory is very preliminary and should be thought
of as a small step in this direction. Many questions are left unanswered, and we
hope to see more research on the suffering of AI agents in the future.
Acknowledgments. We thank Marcus Hutter and Brian Tomasik for careful reading
and detailed feedback. The data from the smartphone experiment was kindly provided
by Robb Rutledge. We are also grateful to many of our friends for encouragement and
interesting discussions.
References
1. Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press
(2014)
2. Brickman, P., Campbell, D.T.: Hedonic relativism and planning the good society.
Adaptation-Level Theory, pp. 287–305 (1971)
3. Brickman, P., Coates, D., Janoff-Bulman, R.: Lottery winners and accident victims:
Is happiness relative? Journal of Personality and Social Psychology 36, 917 (1978)
4. Daswani, M., Leike, J.: A definition of happiness for reinforcement learn-
ing agents. Technical report, Australian National University (2015).
http://arxiv.org/abs/1505.04497
5. Diener, E., Lucas, R.E., Scollon, C.N.: Beyond the hedonic treadmill: Revising the
adaptation theory of well-being. American Psychologist 61, 305 (2006)
6. Jacobs, E., Broekens, J., Jonker, C.: Joy, distress, hope, and fear in reinforce-
ment learning. In: Conference on Autonomous Agents and Multiagent Systems,
pp. 1615–1616 (2014)
7. Niv, Y.: Reinforcement learning in the brain. Journal of Mathematical Psychology
53, 139–154 (2009)
8. Rutledge, R.B., Skandali, N., Dayan, P., Dolan, R.J.: A computational and neu-
ral model of momentary subjective well-being. In: Proceedings of the National
Academy of Sciences (2014)
9. Schmidhuber, J.: Formal theory of creativity, fun, and intrinsic motivation (1990–
2010). IEEE Transactions on Autonomous Mental Development 2, 230–247 (2010)
10. Sutton, R., Barto, A.: Time-derivative models of Pavlovian reinforcement. In:
Learning and Computational Neuroscience: Foundations of Adaptive Networks,
pp. 497–537. MIT Press (1990)
11. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press,
Cambridge (1998)
12. Tomasik, B.: Do artificial reinforcement-learning agents matter morally? Technical
report, Foundational Research Institute (2014). http://arxiv.org/abs/1410.8233
Expression Graphs
Unifying Factor Graphs and Sum-Product Networks
Abram Demski
1 Motivation
Factor graphs are a graphical model which generalizes Bayesian networks,
Markov networks, constraint networks, and other models [4]. New light was shed
on existing algorithms through this generalization.1
As a result, factor graphs have been treated as a unifying theory for graphical
models. It has furthermore been proposed, in particular in [2] and [11], that
factor graphs can provide a computational foundation through which we can
understand cognitive processes. The present work came out of thinking about
potential inadequacies in the Sigma cognitive architecture [11].
Factor graphs have emerged from progressive generalization of techniques
which were initially narrow AI. Because they capture a breadth of knowledge
about efficient AI algorithms, they may be useful for those AGI approaches which
This work was sponsored by the U.S. Army. Statements and opinions expressed may
not reflect the position or policy of the United States Government, and no official
endorsement should be inferred. Special thanks to Paul Rosenbloom and Łukasz Stafiniak for providing comments on a draft of this paper.
¹ The sum-product algorithm for factor graphs provided a generalization of existing
algorithms for more narrow domains, often the best algorithms for those domains
at the time. The main examples are belief propagation, constraint propagation, and
turbo codes [4]. Other algorithms such as mean-field can be stated very generally
using factor graphs as well.
seek to leverage the progress which has been made in narrow AI, rather than
striking out on an entirely new path. However, this paper will argue that factor
graphs fail to support an important class of algorithms.
Sum-product networks (SPNs) are a new type of graphical model which
represent a probability distribution through sums and products of simpler dis-
tributions [7].2 Whereas factor graphs may blow up to exponential-time exact
inference, SPN inference is guaranteed to be linear in the size of the SPN.
An SPN can compactly represent any factor graph for which exact inference is
tractable. When inference is less efficient, the corresponding SPN will be larger.
In the worst case, an SPN may be exponentially larger than the factor graph
which it represents. On the other hand, being able to represent a distribution
as a compact SPN does not imply easy inference when converted to a factor
graph. There exist SPNs which represent distributions for which standard exact
inference algorithms for factor graphs are intractable.
Probabilistic context-free grammars (PCFGs) are an important class of prob-
abilistic model in computational linguistics. In [8], the translation of PCFGs into
factor graphs (specifically, into Bayesian networks) is given. This allows general
probabilistic inference on PCFGs (supporting complicated queries which special-
case PCFG algorithms don’t handle). However, the computational complexity
becomes exponential due to the basic complexity of factor graph inference.
Sum-product networks allow a PCFG with bounded sentence length to be represented in an SPN of size cubic in the length, by directly encoding the sums and products of the inside algorithm (a basic algorithm for PCFGs). This preserves the cubic complexity of inference, while allowing the more general kinds of
queries for which [8] required exponential-time algorithms. This illustrates that
SPNs can be efficient in cases where factor graphs are not.
More recently, [12] used loopy belief propagation (an approximate algorithm
for factor graph problems) to efficiently approximate complex parsing tasks
beyond PCFGs, but did so by implementing a dynamic programming parse as
one of the factors. This amounts to using SPN-style inference as a special module
to augment factor-graph inference.
The present work explores a more unified approach, to integrate the two types
of reasoning without special-case optimization. The resulting representation is
related to the expression tree introduced in [4]. As such, the new formalism is
being referred to as the Expression Graph (EG).
2 Factor Graphs
A factor graph (FG) is a bipartite graph where one set of nodes represents the variables $x_1, x_2, \ldots, x_n \in U$, and the other set of nodes represents real-valued functions (the factors).
² Case-factor diagrams [6] are almost exactly the same as sum-product networks,
and have historical precedence. However, the formalism of sum-product networks
has become more common. Despite their similarities, the two papers [6] and [7]
use very different mathematical setups to justify the new graphical model and the
associated inference algorithm. (A reader confused by one paper may benefit from
trying the other instead.)
3 Sum-Product Networks
A sum-product network (SPN) is a directed acyclic graph with a unique root.
Terminal nodes are associated with indicator variables. Each domain variable in
U has an indicator for each of its values; these take value 1 when the variable takes on that value, and 0 when the variable takes a different value.
The root and the internal nodes are all labeled as sum nodes or product nodes.
A product node represents the product of its children. The links from a sum node
to its children are weighted, so that the sum node represents a weighted sum of
its children. Thus, the SPN represents an expression formed out of the indicator
variables via products and weighted sums. As for factor graphs, this expression
could represent a variety of things, but in order to build an intuition for the
structure we shall assume that this represents a probability distribution over U .
The scope of a node is the set of variables appearing under it. That is: the
scope of a leaf node is the variable associated with the indicator variable, and
the scope of any other node is the union of the scopes of its children.
4 Expression Graphs
In order to compactly represent all the distributions which can be represented
by SPNs or FGs, we introduce the expression graph (EG).
An expression graph is little more than an SPN with the network restrictions
lifted: a directed acyclic graph with a unique root, whose non-terminal nodes are
labeled as sums or products. The terminal nodes will hold functions rather than
indicators; this is a mild generalization for convenience. For discrete variables,
these functions would be represented as tables of values. For the continuous
case, some class of functions such as Gaussians would be chosen. These terminal
functions will be referred to as elemental functions. We will only explicitly work
with the discrete case here. Expression graphs represent complicated functions built up from the simple ones, as follows:
$$N(A) = \begin{cases} \sum_{i=1}^{n} C_i(A_i) & \text{if } N \text{ is a sum node} \\ \prod_{i=1}^{n} C_i(A_i) & \text{if } N \text{ is a product node} \end{cases}$$
Where Ci are the n children of node N , and A is the union of their arguments Ai .
From now on, we will not distinguish strongly between a node and the function
associated with the node. The root node is D, the global distribution.
The scope of a node is defined as the set of arguments of its associated function,
inheriting the definitions of complete and decomposable which were introduced
for SPNs. Unlike in the case of SPNs, we do not enforce these properties.
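A minimal sketch (not from the paper) of this recursive definition as executable code; the node layout, class names, and the use of dictionaries keyed by variable assignments are illustrative choices.

import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    kind: str                                   # "sum", "product", or "leaf"
    children: List["Node"] = field(default_factory=list)
    table: Dict[Tuple[int, ...], float] = None  # leaf only: elemental function as a table
    args: Tuple[str, ...] = ()                  # leaf only: its argument variables

    def value(self, assignment: Dict[str, int]) -> float:
        """Evaluate N(A) at a full variable assignment, per the two cases above."""
        if self.kind == "leaf":
            return self.table[tuple(assignment[v] for v in self.args)]
        vals = [c.value(assignment) for c in self.children]
        return sum(vals) if self.kind == "sum" else math.prod(vals)

# Toy example: D = F(x) + F(x) * G(x, y) over binary variables
F = Node("leaf", table={(0,): 0.2, (1,): 0.8}, args=("x",))
G = Node("leaf", table={(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}, args=("x", "y"))
D = Node("sum", children=[F, Node("product", children=[F, G])])
print(D.value({"x": 1, "y": 0}))   # 0.8 + 0.8 * 0.1 = 0.88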
³ In [7], a weaker requirement of consistency replaces decomposability. However, without full decomposability, the inference by differentiation can give wrong results. For example, the SPN representing $0.5x_1^2 + 0.5x_1$ is acceptable by their definition. Differentiation would have it that $x_1 = \text{true}$ is twice as likely as $x_1 = \text{false}$, whereas the
two are equally likely by evaluation of the network value at each instantiation. We
therefore do not consider the weaker requirement here.
while ensuring that the overarching structure is tractable. This is used as part of
an SPN structure-learning algorithm, which takes advantage of existing factor-
graph structure learning to capture certain types of variable interactions which
took too long to discover in a previous (pure SPN) structure learner.
5 Exact Inference
Here, A is the scope of the parent, which (by completeness) is also the scope of
each child. (As in the previous section, Ci will represent the children of N .)
Similarly, if we had decomposability, we could push down the sums through
product nodes:
$$\sum_{A-X} N(A) = \prod_{i=1}^{n} \sum_{A_i - X} C_i(A_i)$$
Here, A is the scope of the parent, and the Ai are the scopes of the children. By
decomposability, the Ai must be mutually exclusive, so that we can apply the
distributive rule to push the sum down through the product. This reduces the
complexity of the computation by allowing us to sum over the sets of variables
Ai separately, and then combine the results.
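As a small worked illustration of the saving (not from the paper): for a decomposable product of two children over disjoint variable sets,
$$\sum_{x}\sum_{y} F(x)\,G(y) \;=\; \Big(\sum_{x} F(x)\Big)\Big(\sum_{y} G(y)\Big),$$
which costs $O(|\mathcal{V}(x)| + |\mathcal{V}(y)|)$ additions instead of the $O(|\mathcal{V}(x)|\cdot|\mathcal{V}(y)|)$ terms of the naive double sum.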
Since we do not in general have a graph which is complete and decomposable,
we need to adjust for that. The adjustment at sum nodes is computationally easy,
augmenting the values from children with a multiplier to account for summing
out the wider scope required by the parent:
$$\sum_{A-X} N(A) \;=\; \sum_{i=1}^{n}\, \sum_{A-A_i-X}\, \sum_{A_i-X} C_i(A_i) \;=\; \sum_{i=1}^{n} \left(\prod_{y \in A-A_i-X} |\mathcal{V}(y)|\right) \sum_{A_i-X} C_i(A_i) \qquad (1)$$
Where B is the set of variables that appear in more than one of Ai . (Note that
we do not have to worry that some variables might appear in no Ai , because
the scope of the parent was defined as the union of the scopes of the children.)
What this equation says is just that we cannot push down the summation
over a particular variable if that variable is shared between several children of a
product node. This fails to satisfy the conditions for the distributive law. As a
result, we have to sum this variable out at the offending product node.
Applying these equations recursively, we can create a dynamic-programming
style algorithm which computes the desired marginal. This proceeds first in a
downward pass, in which we mark which variables we need to avoid summing
out at which nodes. Then, we pass messages up through the network. The mes-
sages are multidimensional arrays, giving a value for each combination of marked
variables.
Algorithm 1. To find $\sum_{U-X} D(U)$:
1. Mark variables X at the root.
2. For non-decomposable products, mark shared variables in the product node.
3. Propagate marks downward, marking a variable in a child whenever it is
marked in the parent and occurs in the scope of the child.
4. Set the messages $M(N)$, where $N$ is a terminal node, to be $\sum_H N(A)$, where $A$ are the arguments of the function $N$ and $H$ are any unmarked arguments.
5. Propagate up messages
$$M(N) = \begin{cases} \sum_{i=0}^{n} \pi_i \sum_{H_i} M(C_i) & \text{if } N \text{ is a sum node} \\ \sum_{H} \prod_{i=0}^{n} M(C_i) & \text{if } N \text{ is a product node} \end{cases}$$
where πi is the same multiplier between parent and child as in the upward
messages, Cj are the m other children of parent Pi , and H is the set of
variables marked in Pi and not in N .
4. For each variable $v \in U$, compute the marginal as the sum of partial derivatives for terminal nodes, and partial derivatives coming from $\pi$-adjustments involving that variable:
$$M_d(v) = \sum_{i=1}^{n} \sum_{H_i} F_i\, M_d(F_i) + \sum_{(P_i, C_j)} M_d(P_i)$$
where the $F_i$ are the terminal nodes, $H_i$ are the arguments of $F_i$ other than $v$, and $\sum_{(P_i, C_j)}$ sums over parent-child pairs $(P_i, C_j)$ such that $P_i$ has $v$ in scope and not marked but $C_j$ does not (so that $\pi$-adjustments would appear in the messages).
The intuition is that upward messages compute the total value of the cor-
responding AC, whereas downward messages compute the partial derivative of
the total value with respect to individual AC nodes. Each scalar value in the
multidimensional message corresponds to an AC node.
This computes the same quantities which we would get by compiling to an
AC and differentiating. The technique rolls together the compilation to an AC
with the inference in the AC, so that if we apply it to an EG representing a factor
6 Future Work
The concrete algorithms here have dealt with finite, fixed-size expression graphs,
but the motivation section mentioned representation of grammars, which handle
sequential information of varying size. Work is in progress applying expression
graphs to grammar learning, enabling an expressive class of grammars.
Unlike factor graphs, expression graphs and SPNs can represent structural
uncertainty within one graph, by taking a sum of multiple possible structures.
Theoretically, structure learning and weight learning can be reduced to one prob-
lem. Of course, a graph representing the structure learning problem is too large
for practical inference. In [5], infinite SPNs are defined via Dirichlet distribu-
tions, and sampling is used to make them tractable. Perhaps future work could
define similar infinite EGs to subsume structure learning into inference.
The structure-learning algorithm in [10] is also quite interesting, employ-
ing heuristics to split the data into cases or factor the data, alternatively. This
could point to two different sets of cognitive mechanisms, dealing independently
with sums and products. Sum-like mechanisms include clustering, boosting, and
bagging. These deal with complexity by making mixture models. Product-like
mechanisms deal with complexity by splitting up the variables involved into
sub-problems which may be independent or related by constraints (that is, fac-
toring!). Perhaps distinct psychological processes deal with these two options. In
future work, we hope to use this distinction in a cognitive architecture context.
7 Conclusion
It is hoped that this representation will help shed light on things from a theoreti-
cal perspective, and also perhaps aid in practical implementation in cases where
a mixture of factor-graph style and SPN-style reasoning is required. Expres-
sion graphs are a relatively simple extension: from the perspective of a factor
graph, we are merely adding the ability to take sums of distributions rather than
250 A. Demski
only products. From the perspective of SPNs, all we are doing is dropping the
constraints on network structure. This simple move nonetheless provides a rich
representation.
This formalism helps to illustrate the relationship between factor graphs
and sum-product networks, which can be somewhat confusing at first, as sum-
product networks are described in terms of indicator variables and representing
the network polynomial, concepts which may seem alien to factor graph repre-
sentations.
Expression graphs improve upon factor graphs in two respects. First, they are a more expressive representation than factor graphs, as measured by the kinds of distributions which can be represented compactly. Second, the representation
is more amenable to exact inference in some cases, where generic factor graph
inference algorithms have suboptimal complexity and must be augmented by
special-case optimization to achieve good performance.
References
1. Darwiche, A.: A differential approach to inference in Bayesian networks. Journal
of the ACM (2003)
2. Derbinsky, N., Bento, J., Yedidia, J.: Methods for integrating knowledge with the
three-weight optimization algorithm for hybrid cognitive processing. In: AAAI Fall
Symposium on Integrated Cognition (2013)
3. Drechsler, R., Becker, B.: Binary Decision Diagrams: Theory and Implementation.
Springer (1998)
4. Kschischang, F., Frey, B., Loeliger, H.: Factor graphs and the sum-product
algorithm. IEEE Transactions on Information Theory (2001)
5. Lee, S.W., Watkins, C., Zhang, B.T.: Non-parametric bayesian sum-product net-
work. In: Proc. Workshop on Learning Tractable Probabilistic Models, vol. 1 (2014)
6. McAllester, D., Collins, M., Pereira, F.: Case-factor diagrams for structured prob-
abilistic modeling. In: Proc. UAI 2004 (2004)
7. Poon, H., Domingos, P.: Sum-product networks: A new deep architecture. In: Proc.
UAI 2011 (2011)
8. Pynadath, D., Wellman, M.: Generalized queries on probabilistic context-free
grammars. IEEE Transactions on Pattern Analysis and Machine Intelligence 20,
65–77
9. Ratajczak, M., Tschiatschek, S., Pernkopf, F.: Sum-product networks for struc-
tured prediction: Context-specific deep conditional random fields. In: Proc. Work-
shop on Learning Tractable Probabilistic Models, vol. 1 (2014)
10. Rooshenas, A., Lowd, D.: Learning sum-product networks with direct and
indirect variable interactions. In: Proc. Workshop on Learning Tractable Prob-
abilistic Models, vol. 1 (2014)
11. Rosenbloom, P.: The sigma cognitive architecture and system. AISB Quarterly
(2013)
12. Smith, D.A., Eisner, J.: Dependency parsing by belief propagation. In: Proceed-
ings of the Conference on Empirical Methods in Natural Language Processing.
Association for Computational Linguistics (2008)
Toward Tractable Universal Induction Through
Recursive Program Learning
Arthur Franz
1 Introduction
What is intelligence? After compiling a large set of definitions in the literature
Legg and Hutter [8] came up with a definition of intelligence that is consistent
with most other definitions:
“Intelligence measures an agent’s ability to achieve goals in a wide range of
environments.”
Based on that definition Marcus Hutter [5] has developed a mathematical
formulation and theoretical solution to the universal AGI problem, called AIXI.
Although it is not computable, approximations may lead to tractable solutions.
AIXI is in turn essentially based on Solomonoff’s theory of universal induction
[15], that assigns the following universal prior to any sequence x:
$$M(x) := \sum_{p\,:\,U(p)=x*} 2^{-l(p)} \qquad (1)$$
short programs (Occam’s razor) this proves that compressed representations lead
to successful predictions of any computable environment. This realization makes
it especially promising to try to construct an efficient algorithm for universal
induction as a milestone, even cornerstone, of AGI.
A general but brute force approach is universal search. For example, Levin
search [10] executes all possible programs, starting with the shortest, until one of
them generates the required data sequence. Although general, it is not surprising
that the approach is computationally costly and rarely applicable in practice.
On the other side of the spectrum, there are non-general but computationally
tractable approaches. Specifically, inductive programming techniques are used to
induce programs from data [6] and there are some approaches within the context
of AGI as well [3,12,14,16]. However, the reason why the generalization of many
algorithms is impeded is the curse of dimensionality faced by all algorithms
at some point. Considering the (algorithmic) complexity and diversity of tasks
solved by today’s typical algorithms, we observe that most if not all will be
highly specific and many will be able to solve quite complex tasks (known as
“narrow AI” [7]). Algorithms from the field of data compression are no exception.
For example, the celebrated Lempel-Ziv compression algorithm (see e.g. [2])
handles stationary sequences but fails at compressing simple but non-stationary
sequences efficiently. AI algorithms undoubtedly exhibit some intelligence, but
when comparing them to humans, a striking difference comes to mind: the tasks
solvable by humans seem to be much less complex albeit very diverse, while tasks
solved by AI algorithms tend to be quite complex but narrowly defined (Fig. 1).
For this reason, we should not try to beat the curse of dimensionality merci-
lessly awaiting us at high complexities, but instead head for general algorithms
at low complexity levels and fill the task cup from the bottom up.
ill-defined for agents not disposing of such external information or the agent has
to be provided with such information extending texts to arbitrary data, which is
equivalent to the compression of arbitrary sequences as proposed here.
2.2 Formalization
Saving all the generated strings paired with their optimal programs $(x_i, p^o_i)$ with $p^o_i(x_i) = \operatorname{argmin}_p\{|p| + \log t : U(p) = x_i \text{ in } t \text{ steps}, |p| \le L\}$, we have all we need for the progress measure. The goal of universal induction is to find all such optimal programs $p^o_i$ for each of the $x_i$. If $p_i$ is the actually found program, its performance can be measured by
$$r_i(L) = \frac{|p^o_i|}{|p_i|} \in (0, 1] \qquad (3)$$
One may object that the number of programs increases exponentially with
their length such that an enumeration quickly becomes intractable. This is a
weighty argument if the task is universal search – a general procedure for inver-
sion problems. However, we suggest this procedure to play the mere role of a
benchmark for an efficient universal induction algorithm, which will use com-
pletely different methods than universal search and will be described in Section
3. Therefore, using the set of simple programs as a benchmark may be enough
to set the universal induction algorithm on the right track.
Note that only a small fraction of possible sequences can be generated this
way. After all, it is well known that only exponentially few, O(2^{n−m}), sequences
of length n can be compressed by m bits [11].
1 The Kolmogorov complexity of a string is defined as the length of the shortest
program able to generate that string on a Turing machine.
2.3 Implementation
Implementing this test does not require coding of a universal Turing machine
(TM) since computers are already universal TMs. Instead, enumerating all tran-
sition functions of an n-state machine is sufficient. The machine used here has
one bidirectional, two way infinite work tape and a unidirectional, one way infi-
nite, write only output tape. Two symbols are used, B = {0, 1}, the states taken
from Q = {0, . . . , n − 1}. The transition map is then:
Q × B → Q × {0, 1, L, R, N } × {0, 1, N } (5)
where L, R, and N denote left, right and no motion of the head, respectively.
The work tape can move in any direction while the output tape either writes 0
or 1 and moves to the right, or does not move at all (N ). No halting or accepting
states were utilized. The machine starts with both tapes filled with zeros. A finite
sequence x is considered as generated by machine T given transition function
(program) p, if it is at the left of the output head at some point: we write
T(p) = x*. The transition table enumerated all possible combinations of state
and work tape content, which amounts to |Q| · |B| = 2n. Therefore, there exist
|Q| · 5 · 3 = 15n different instructions and consequently (15n)^{2n} different machines
with n states. For n = 2, 3 this amounts to around 10^6 and 10^10 machines. All
those machines (n = 1 machines are trivial) were executed until 50 symbols
were written on the output tape or the maximum number of 400 time steps
was reached. All unique outputs were stored, amounting to 210 and 43295, for
n = 2, 3, respectively, and paired with their respective programs.
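To make the enumeration concrete, the following Python sketch mirrors the setup just described
(two-symbol machines, a work tape, a write-only right-moving output tape, at most 50 output
symbols or 400 steps). It is an illustrative reimplementation under these stated assumptions,
not the authors' implementation.

# Illustrative sketch of the enumeration described above (not the authors' code).
from itertools import product

def run(program, max_out=50, max_steps=400):
    """program maps (state, read_symbol) -> (next_state, work_action, out_action)."""
    tape, head, state, out = {}, 0, 0, []
    for _ in range(max_steps):
        next_state, work, emit = program[(state, tape.get(head, 0))]
        if work in (0, 1):        # write 0/1 on the work tape ...
            tape[head] = work
        elif work == 'L':         # ... or move its head left/right ('N': no motion)
            head -= 1
        elif work == 'R':
            head += 1
        if emit in (0, 1):        # output tape writes a symbol and moves right
            out.append(emit)
            if len(out) >= max_out:
                break
        state = next_state
    return tuple(out)

def enumerate_outputs(n):
    """All (15n)^(2n) transition tables of n-state machines, keeping unique outputs."""
    cells = [(q, b) for q in range(n) for b in (0, 1)]                          # 2n table rows
    instructions = list(product(range(n), (0, 1, 'L', 'R', 'N'), (0, 1, 'N')))  # 15n choices
    unique = {}
    for rows in product(instructions, repeat=len(cells)):
        x = run(dict(zip(cells, rows)))
        if x and x not in unique:   # store the first program found for each output
            unique[x] = rows
    return unique

# e.g. outputs = enumerate_outputs(2)   # roughly 10^6 machines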
Table 1 depicts a small sample of the outputs. It may be interjected that
sequences generated by 2 and 3 state machines are not very “interesting”. How-
ever, the present work is just the initial step. Moreover, it is interesting to note
that even the 2 state machine shows non-repetitive patterns with an ever increas-
ing number of 1’s. In the 3 state machine patterns become quickly more involved
and require “intelligence” to detect the regularities in the patterns (try the last
one!). Consistently with the reasoning in the introduction, could it be that the
threshold complexity level of human intelligence is not far off from the sequence
complexity of 3 state machines, especially when the data presentation is not
comfortably tuned according to natural image statistics?
We suggest that these patterns paired with their respective programs con-
stitute a benchmark for partial progress in artificial general intelligence. If an
efficient algorithm can compress these patterns to small programs then it can be
claimed to be moderately intelligent. Modern compression algorithms, such as
Lempel-Ziv (on which the famous Zip compression is based), fail at compressing
those sequences, since the packed file size increases with sequence length (ergo
r_i gets arbitrarily small) while the size of the TM transition table is always the
same independently of sequence length.
Having generated all strings printed by two and three state programs the task
is to build an efficient algorithm compressing those strings back into a short
representation, not necessarily the original one though, but having a similar size
in terms of entropy.
As exemplified in Fig. 2 the present algorithm induces a recursive network of
function primitives using a sequence generated by a three state Turing machine.
Four function primitives were used, generating constant, alternating, or incremental
sequences, or a single number.
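The exact definitions of these primitives are not given here; the following Python sketch is one
plausible reading of the four kinds of primitives named above, purely for illustration.

# Hypothetical illustration of the four primitive kinds named in the text.
def constant(value, length):
    return [value] * length

def alternating(a, b, length):
    return [(a, b)[i % 2] for i in range(length)]

def incremental(start, step, length):
    return [start + i * step for i in range(length)]

def single(value):
    return [value]

# A recursive network of such primitives could feed the output of one primitive
# into the parameters of another to describe more involved patterns.
print(alternating(0, 1, 6))     # [0, 1, 0, 1, 0, 1]
print(incremental(2, 3, 4))     # [2, 5, 8, 11]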
3.2 Results
Since our fixed-size Turing machine programs can create sequences of arbitrary
length, successful program induction is defined as induction of a program with
a fixed number of inputs to the function network. Further, to establish a bench-
mark, the entropy of the Turing machine programs is computed as follows. There
are (15n)^{2n} machines with n states, hence the amount of information needed to
specify a TM program with n states is
log_2((15n)^{2n}) = 2n log_2(15n) bits,
which results in a program size of around 20 and 33 bits for two and three state
TMs, respectively. Since the induced programs encode the length l of the target
sequence and the TM programs do not, the information contained in the length
has to be subtracted from the induced program entropy (the bold and underlined
numbers in Fig. 2).
All sequences generated by all two state machines could be compressed
successfully. The average induced program size is μ2 = 22 bits with a stan-
dard deviation of σ2 = 23 bits. Because of the large number of three state
sequences, 200 sequences were randomly sampled. This way, 80 ± 4% of three
state sequences could be compressed successfully, with μ3 = 27 bits and σ3 = 20
bits. However, “unsuccessful” sequences could be compressed to some extent
as well, although the resulting program size was not independent of sequence
length. With sequences of length l = 100 the entropy statistics of “unsuccessful”
sequences are μ3 = 112 bits and σ3 = 28 bits. Given an average sequence entropy
of 146 bits, this constitutes an average compression factor of 1.3.
3 Python code and string/program pairs are available upon request.
It may seem surprising that the average entropy of the induced programs is
even below the entropy of the TM programs (transition tables). However, since
not all rows of a transition table are guaranteed to be used when executing a
program, the actual shortest representation will not contain unused rows leading
to a smaller program size than 20 or 33 bits. The most important result is that
very short programs, with a size roughly around the Kolmogorov complexity,
have indeed been found for most sequences.
4 Discussion
The present approach has shown that it is possible to both sensibly define a
measure for partial progress toward AGI by measuring the complexity level up to
which all sequences can be induced and to build an algorithm actually performing
universal induction for most low complexity sequences. Our demonstrator has
been able to compress all sequences generated by two state Turing machines and
80% of the sequences generated by three state Turing machines.
The current demonstrator presents work in progress and it is already fairly
clear how to improve the algorithm such that the remaining 20% are also covered.
For example, there is no unique partition of a sequence into a set of concate-
nated primitives. The way those partitions are selected should also be guided by
compressibility considerations, e.g. partition subsets of equal length should have
a higher prior chance to be analyzed further. Currently, the partition is imple-
mented in a non-principled way, which is one of the reasons for the algorithm
to run into dead ends. Remarkably, all reasons for stagnation seem to be those
aspects of the algorithm that are not yet guided by the compression principle.
This observation leads to the conjecture that the further extension and general-
ization of the algorithm may not require any additional class of measures, but a
“mere” persistent application of the compression principle.
One may object that the function primitives are hard-coded and may there-
fore constitute an obstacle for generalizability. However, those primitives can
also be resolved into a combination of elementary operations, e.g. the incre-
mental function can be constructed by adding a fixed number to the previous
sequence element, and hence can itself be represented by a function network. Therefore, it
is all a matter of flexible application and organization of the very same function
network and thus lies within the scope of the present approach.
The hope of this approach is that it may lead us on a path finally scaling up
universal induction to practically significant levels. It would be nice to back up
this hope with a time complexity measure of the present algorithm, which is unfor-
tunately not available at present, since this is work in progress. Further, it cannot
be excluded that a narrow algorithm is also able to solve all low-complexity prob-
lems. In fact, the present algorithm is narrow as well since there are numerous
implicit assumptions about the composition of the sequence, e.g. the concate-
nation of outputs of several functions, no possibility to represent dependencies
within a sequence, or regularities between different inputs etc. Nevertheless,
since we represent general programs without specific a priori restrictions this
setup seems to be general enough to tackle such questions which will hopefully
result in a scalable system.
References
1. Hutter Prize: http://prize.hutter1.net (accessed: May 17, 2015)
2. Cover, T.M., Thomas, J.A.: Elements of information theory. John Wiley & Sons
(2012)
3. Friedlander, D., Franklin, S.: LIDA and a theory of mind. In: 2008: Proceedings of
the First AGI Conference on Artificial General Intelligence, vol. 171, p. 137. IOS
Press (2008)
4. Hernandez-Orallo, J.: Beyond the turing test. Journal of Logic, Language and
Information 9(4), 447–466 (2000)
5. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algo-
rithmic Probability, 300 pages. Springer, Berlin (2005). http://www.hutter1.net/
ai/uaibook.htm
6. Kitzelmann, E.: Inductive Programming: A Survey of Program Synthesis Tech-
niques. In: Schmid, U., Kitzelmann, E., Plasmeijer, R. (eds.) AAIP 2009. LNCS,
vol. 5812, pp. 50–73. Springer, Heidelberg (2010)
7. Kurzweil, R.: The singularity is near: When humans transcend biology. Penguin
(2005)
8. Legg, S., Hutter, M.: A collection of definitions of intelligence. In: Goertzel, B.,
Wang, P. (eds.) Advances in Artificial General Intelligence: Concepts, Architec-
tures and Algorithms. Frontiers in Artificial Intelligence and Applications, vol. 157,
pp. 17–24. IOS Press, Amsterdam (2007). http://arxiv.org/abs/0706.3639
9. Legg, S., Veness, J.: An Approximation of the Universal Intelligence Measure. In:
Dowe, D.L. (ed.) Solomonoff Festschrift. LNCS, vol. 7070, pp. 236–249. Springer,
Heidelberg (2013)
10. Levin, L.A.: Universal sequential search problems. Problemy Peredachi Informatsii
9(3), 115–116 (1973)
11. Li, M., Vitányi, P.M.: An introduction to Kolmogorov complexity and its applica-
tions. Springer (2009)
12. Looks, M., Goertzel, B.: Program representation for general intelligence. In: Proc.
of AGI, vol. 9 (2009)
13. Mahoney, M.V.: Text compression as a test for artificial intelligence. In:
AAAI/IAAI, p. 970 (1999)
14. Potapov, A., Rodionov, S.: Universal Induction with Varying Sets of Combinators.
In: Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI 2013. LNCS, vol. 7999,
pp. 88–97. Springer, Heidelberg (2013)
15. Solomonoff, R.J.: A formal theory of inductive inference. Part I. Information and
Control 7(1), 1–22 (1964)
16. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI
approximation. Journal of Artificial Intelligence Research 40(1), 95–142 (2011)
How Can Cognitive Modeling Benefit
from Ontologies? Evidence from the HCI
Domain
1 Introduction
Cognitive architectures like Soar [13] and ACT-R [2] have enabled researchers
to create sophisticated cognitive models of intelligent human behavior in lab-
oratory situations. One major drawback of cognitive modeling, especially from
the artificial general intelligence perspective, is that those models tend to be
very problem-specific. While a cognitive model of air traffic control may show
human-like intelligence in exactly that task, it is completely unable to perform
anything else, like solving a basic algebra problem. One major cause of the the-
matic narrowness of cognitive models is the restricted amount of knowledge that
those models have access to. In most cases, every single piece of information has
to be coded into the model by a researcher. This has been criticized before, as
a human cognitive architecture should be able to maintain and integrate large
amounts of knowledge [3].
The link from human error research to artificial intelligence is not an obvious one.
We think of error as “window to the mind” [15]. Understanding why and when
humans err helps identifying the building blocks of intelligent human behavior.
Of special interest are errors of trained users. Using software systems after having
received some training is characterized by rule-based behavior [17]. Goals are
reached by using stored rules and procedures that have been learned during
training or earlier encounters with similar systems. While errors are not very
frequent on this level of action control, they are nevertheless pervasive and cannot be
eliminated through training [18].
Our focus on rule-based behavior allows a straightforward definition of error:
Procedural error means that the (optimal) path to the current goal is violated
by a non-optimal action. This can either be the addition of an unnecessary or
even hindering action, which is called an intrusion. Or a necessary step can be
left out, constituting an omission.
2 Experiment
The empirical basis for our model is provided by a usability study targeting a
kitchen assistant from an ambient assisted living context. The kitchen assistant
provides basic help during the preparation of meals by proposing recipes, calcu-
lating ingredient quantities, and by presenting interactive cooking instructions.
In order to assess the three ontology-based predictions stated above, we per-
formed a reanalysis of previously published data [12]. We are concentrating on
a single screen of the kitchen assistant that allows searching for recipes based
on predefined attributes. A screenshot of the search attribute form translated
to English is given in Fig. 1. The search attributes are grouped into national-
ity (French, German, Italian, Chinese) and type-of-dish (Main Course, Pastry,
Dessert, Appetizer). We excluded three health-related search options as they
were neither well represented in the experimental design, nor in the ontology.
For the eight remaining buttons, we identified the best matching concept from
the DBpedia ontology and use the number of links to it as measure of relevance
of the concept. As can be seen in Table 2, the buttons in the nationality group
are two to three orders of magnitude more relevant than the buttons in the type-of-dish
group. Our empirical analysis therefore unfolds around the differences between
those two groups.
2.1 Method
Twenty participants recruited on and off campus (15 women, 5 men, M_age = 32.3,
SD_age = 11.9) took part in the experiment. Amongst other things, each participant
completed 34 recipe search tasks using the attribute selection screen (see Fig. 1).
One half of the tasks was done using a tablet computer, a large touch screen was
used for the other half. Instructions were given verbally by the experimenter.
All user actions were logged and videotaped for subsequent task execution time
and error analysis.
2.2 Results
We observed a total of 1607 clicks on the eight search attribute buttons under
investigation. The results for our three ontology-based predictions are as follows.
Execution Time. We excluded all clicks with substantial wait time (due to task
instruction or system response) from the analysis. The remaining 822 clicks still
differ in the necessary accuracy of the finger movement which is strongly related
to the time needed to perform the movement as formulated in Fitts’ law [9].
Individual differences in motor performance were large, and the device used also
had an effect on the click time. We therefore added subjects as random factor
with device and Fitts-slope within subject to the analysis. The click time was
analyzed using a linear mixed model [4], fixed effects were tested for significance
using the Satterthwaite approximation for degrees of freedom. Results are given
in Table 1. Besides the expected effects of Fitts’ law and device, we observed a
significant difference between the buttons for type-of-dish and nationality, with
type-of-dish needing approximately 100 ms longer.
Omissions and Intrusions. If those 100 ms are caused by lack of activation (as
predicted by the MFG), then this lack of activation should cause more omis-
sions for the type-of-dish group and more intrusions for the nationality group.
We observed 14 intrusions and 19 omissions during the handling of the search
attribute page (error rate 2.0%). Mixed logit models with subject as random fac-
tor showed no significant influence of the attribute group, but at least for omis-
sions, the effect points in the expected direction (omissions: z = 1.50, p = .133;
intrusions: z = −.05, p = .964). The omission rates for nationality and type-of-
dish are 0.8% and 1.6%, respectively.
Table 1. Linear mixed model results for the click time analysis
Factor                                Estimate    t     df     p
Fitts' Index of Difficulty in bit     173 ms/bit  4.95  22.4   < .001
Device (Tablet vs. Screen)            213 ms      4.38  24.3   < .001
Attr. Group (Dish vs. Nationality)    112 ms      2.47  611.3  .014
3 Cognitive Model
The cognitive model presented here has been created using ACT-R 6 [2]. It
has been shown to reproduce omission and intrusion errors for task-oriented vs.
device-oriented UI elements well [12]. A comparison of the model’s predictions
for the different search attribute buttons has not been done before.
Following the MFG, the model creates and memorizes a chain of subgoal
chunks when it receives task instructions through ACT-R’s auditory system. It
follows this chain of subgoals until either the goal is reached or memory gets
weak. In case of retrieval failure, the model reverts to a knowledge-in-the-world
strategy and randomly searches the UI for suitable elements. If it can retrieve
a subgoal chunk that corresponds to the currently attended UI element, this
subgoal is carried out and the cycle begins again.
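The control flow just described can be summarised by the following self-contained Python toy
(not ACT-R code; the retrieval probability and the task names are invented), which alternates
between the memory-based strategy and the knowledge-in-the-world fallback.

# Toy abstraction (plain Python, not ACT-R) of the control loop described above.
import random

def run_trial(subgoal_chain, p_retrieve=0.8, max_steps=100):
    """subgoal_chain: ordered list of UI elements to operate; values are invented."""
    executed, cursor, steps = [], 0, 0
    while cursor < len(subgoal_chain) and steps < max_steps:
        steps += 1
        if random.random() < p_retrieve:            # subgoal chunk retrieved from memory
            element = subgoal_chain[cursor]
        else:                                        # retrieval failure: knowledge in the
            element = random.choice(subgoal_chain)   # world, attend a random UI element
            if element != subgoal_chain[cursor]:     # no matching subgoal chunk retrieved
                continue                             # keep searching the UI
        executed.append(element)                     # carry out the subgoal ...
        cursor += 1                                  # ... and the cycle begins again
    return executed

print(run_trial(["Italian", "Main Course", "Search"]))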
The only declarative knowledge that is hard-coded into the model is that
some UI elements need to be toggled, while others need to be pushed. The
model interacts directly with the HTML interface of the kitchen assistant by the
means of ACT-CV [11].1
Table 2. Semantic mapping between UI and ontology. Inlink count obtained from
DBpedia 3.9 [14]. Subtitle-based word frequency (per 10^6 words) from [5]
Concept      UI label      DBpedia entry  Inlink count  per 10^6 links  Word freq.
German       Deutsch       Deutschland    113621        2474.3          10.2
Italian      Italienisch   Italien        56105         1221.8          6.2
Chinese      Chinesisch    China          10115         220.3           8.2
French       Französisch   Frankreich     79488         1731.0          17.4
Main Course  Hauptgericht  Hauptgericht   35            0.8             0.8
Appetizer    Vorspeise     Vorspeise      72            1.6             1.5
Dessert      Nachtisch     Dessert        193           4.2             6.5
Pastry       Backwaren     Gebäck         165           3.6             0.3
In order to assess how cognitive modeling can benefit from ontologies, we took
the barely knowledgeable model and added applicable pieces of information from
Wikipedia to its declarative memory. We propose semantic priming from long-
living general concepts to the short-lived subgoal chunks that are created by the
model when it pursues a goal.
How much priming can we expect, based on the information that is available
within DBpedia? We are using the inlink count as measure of the relevance of a
concept. In ACT-R, this needs to be translated into an activation value of the
chunk that represents the concept (i.e., Wikipedia article). Temporal decay of
activation is modeled in ACT-R using the power law of forgetting [2]. Salvucci
[19] has applied this law to the concepts within DBpedia, assuming that they
have been created long ago and the number of inlinks represents the number of
presentations of the corresponding chunk. The base activation B can be deter-
mined from inlink count n as follows
B = ln(2n) (1)
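For illustration (a sketch, not the authors' code), equation (1) applied to the inlink counts of
Table 2 gives activation values such as those printed below. The scaled variant is only a guessed
functional form, and the constants c and blc shown here are invented; only the existence of a
scaling factor c and of the minimum activation constant blc is taken from the text that follows.

# Sketch: base activation from DBpedia inlink counts (eq. 1), plus one guessed
# form of the down-scaled variant discussed below (c and blc values are invented).
import math

inlinks = {"German": 113621, "Italian": 56105, "Chinese": 10115, "French": 79488,
           "Main Course": 35, "Appetizer": 72, "Dessert": 193, "Pastry": 165}

def base_activation(n):
    return math.log(2 * n)                 # Salvucci's proposal, eq. (1)

def scaled_activation(n, c=0.001, blc=0.1):
    return blc + math.log(2 * c * n)       # assumed form: inlink count scaled by c

for concept, n in inlinks.items():
    print(f"{concept:12s} B = {base_activation(n):6.2f}  scaled = {scaled_activation(n):6.2f}")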
While we agree with Salvucci's rationale, deriving the activation from raw inlink
counts is a little too straightforward in our eyes. Numerically, it creates very
high activation values. And as the total number of entries varies between the
language variations of DBpedia, switching language (or ontology) would mean
changing the general activation level.2 In the special case of our model, the use
of (1) caused erratic behavior because the high amount of ontology-based activation
overrode all other activation processes (i.e., activation noise and mismatch
penalties for partial matching of chunks). We therefore introduced a small factor
c that scales the inlink count down to usable values. Together with ACT-R's
minimum activation constant blc, this results in the following equation
1 See [12] for a more detailed description. The source code of the model is available
for download at http://www.tu-berlin.de/?id=135088.
How is the semantic priming to subgoal chunks finally achieved? The declar-
ative memory module of ACT-R 6 only allows priming from buffers (“working
memory”) to declarative (“long term”) memory. We therefore introduced a hook
function that modifies the activation of every subgoal chunk whenever it enters
long term memory according to the general concept that is related to the goal
chunk.
Table 3. Correlations between the empirical data and the model predictions
Dependent Variable         r_baseline  r_priming  Δr    R²_priming  RMSE_prim.
Execution time (residual)  -.218       .684       .758  .468        78 ms
Omission rate              -.390       .640       .824  .410        .027
Intrusion rate             -.654       .511       .873  .261        .011
Fig. 2. Click time residuals after Fitts’ law regression, intrusion and omission rates of
the cognitive model with and without priming from the DBpedia concepts. Negative
slopes of the regression line mean worse than chance predictions. Positive slopes mean
better than chance predictions. Squares denote buttons of group “nationality”, triangles
denote “type of dish”.
concepts holds, then this result underlines the benefits of adding ontologies to
cognitive architectures. A closer look at Fig. 2 reveals that the correlation for
intrusions is highly dependent of two outliers, the results should therefore be
interpreted with care.
While the explanation of execution time and omissions mainly lies within the ontology, intrusions can only
be explained by the combination of cognitive model and ontology, highlighting
the synergy between both.
To our knowledge, this is the first time that Salvucci’s approach for adding
world knowledge to a cognitive architecture [19] is empirically validated. The
practical development of the model showed that the activation equation proposed
by Salvucci, while being theoretically sound, creates hurdles for the combination
of world knowledge with existing cognitive models. Therefore, we introduced a
constant scaling factor to the ontology-based activation computation. This goes
in line with the common practice in psycholinguistics to use standardized values
that are independent of the corpus in use. The factor chosen here helped to
keep the influence of the ontology on subgoal activation at par with the other
activation sources applied (i.e., activation noise and partial matching).
It is also informative to compare our approach to research on information
foraging, namely SNIF-ACT [10]. This system uses activation values that are
estimated from word frequencies in online text corpora, which would lead to
general hypotheses similar to the ones given above. But beyond this, a closer
look unveils interesting differences to the DBpedia approach. While word fre-
quency and inlink count are highly correlated (r=.73 in our case, see Table 2),
the word frequency operationalization yields much smaller differences between
the nationality vs. type-of-dish groups. Frequency based-approaches also need to
remove highly frequent, but otherwise irrelevant words beforehand (e.g., “the”,
“and”). In Wikipedia, this relevance filter is already built into the system and
no such kind of preprocessing is necessary. Empirically, we obtained inconclu-
sive results when using word frequency in a large subtitle corpus [5] instead of
Wikipedia inlink count as concept activation estimate.
While the combination of cognitive model and ontology provides some stim-
ulating results, it also has some downsides and limitations. First of all, the small
number of observed errors leads to much uncertainty regarding the computed
intrusion and omission rates. Especially in case of intrusions, the empirical basis
is rather weak. The goodness-of-fit is highly dependent on two outliers. While
one of these matches the high-level predictions given in the introduction (“Ger-
man” being more prone to intrusions), the other one points towards a conceptual
weakness of the model (“Pastry” showing many intrusions in the empirical data
although having just a few inlinks). The “Pastry” intrusions happened dur-
ing trials with the target recipes baked apples (“Bratäpfel”) and baked bananas
(“Gebackene Bananen”). One could speculate that those recipes have primed the
type-of-dish attribute that is linked to baking. This kind of semantic priming is
currently not covered by our system. We are planning to integrate more sophis-
ticated models of long-term memory [20] to allow dynamic priming between
concepts as well.
Besides the conceptual findings, our ontology-backed cognitive model also
provides benefits to applied domains. With its ability to interact with arbitrary
HTML applications, the model could be used for automatic usability evaluation
of user interfaces. Its ability to predict omissions and intrusions could be used to
spot badly labeled UI elements during early development stages.
References
1. Altmann, E.M., Trafton, J.G.: Memory for goals: An activation-based model. Cog-
nitive Science 26(1), 39–83 (2002)
2. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An
integrated theory of the mind. Psychological Review 111(4), 1036–1060 (2004)
3. Anderson, J.R., Lebiere, C.: The Newell test for a theory of cognition. Behavioral
and Brain Sciences 26(05), 587–601 (2003)
4. Bates, D., Maechler, M., Bolker, B., Walker, S.: lme4: Linear mixed-effects models
using Eigen and S4 (2013), r package version 1.0-5
5. Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A.M., Bölte, J., Böhl, A.:
The word frequency effect: A review of recent developments and implications for
the choice of frequency estimates in German. Experimental Psychology 58(5), 412
(2011)
6. Douglass, S., Ball, J., Rodgers, S.: Large declarative memories in ACT-R. Tech.
rep., Manchester, UK (2009)
7. Emond, B.: WN-LEXICAL: An ACT-R module built from the WordNet lexical
database. In: Proceedings of the Seventh International Conference on Cognitive
Modeling (2006)
8. Fellbaum, C.: Wordnet. In: Poli, R., Healy, M., Kameas, A. (eds.) Theory
and Applications of Ontology: Computer Applications, pp. 231–243. Springer,
Dordrecht (2010)
9. Fitts, P.M.: The information capacity of the human motor system in controlling
the amplitude of movement. Journal of Experimental Psychology 47(6), 381–391
(1954)
10. Fu, W.T., Pirolli, P.: SNIF-ACT: A cognitive model of user navigation on the world
wide web. Human-Computer Interaction 22, 355–412 (2007)
11. Halbrügge, M.: ACT-CV: Bridging the gap between cognitive models and
the outer world. In: Brandenburg, E., Doria, L., Gross, A., Günzlera, T.,
Smieszek, H. (eds.) Grundlagen und Anwendungen der Mensch-Maschine-
Interaktion, pp. 205–210. Universitätsverlag der TU Berlin, Berlin (2013)
12. Halbrügge, M., Quade, M., Engelbrecht, K.P.: A predictive model of human error
based on user interface development models and a cognitive architecture. In: Taat-
gen, N.A., van Vugt, M.K., Borst, J.P., Mehlhorn, K. (eds.) Proceedings of the
13th International Conference on Cognitive Modeling, pp. 238–243. University of
Groningen, Groningen (2015)
13. Laird, J.: The Soar cognitive architecture. MIT Press, Cambridge (2012)
14. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N.,
Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a
large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web
Journal (2014)
15. Norman, D.A.: Slips of the mind and an outline for a theory of action. Tech. rep.,
Center for Human Information Processing, San Diego, CA (1979)
16. Oltramari, A., Lebiere, C.: Extending Cognitive Architectures with Semantic
Resources. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011.
LNCS, vol. 6830, pp. 222–231. Springer, Heidelberg (2011)
17. Rasmussen, J.: Skills, rules, and knowledge; signals, signs, and symbols, and other
distinctions in human performance models. IEEE Transactions on Systems, Man
and Cybernetics 13, 257–266 (1983)
18. Reason, J.: Human Error. Cambridge University Press, New York (1990)
19. Salvucci, D.D.: Endowing a cognitive architecture with world knowledge. In:
Bello, P., Guarini, M., McShane, M., Scassellati, B. (eds.) Proc. CogSci 2014,
pp. 1353–1358 (2014)
20. Schultheis, H., Barkowsky, T., Bertel, S.: LTM C - an improved long-term memory
for cognitive architectures. In: Proceedings of the Seventh International Conference
on Cognitive Modeling, pp. 274–279 (2006)
C-Tests Revisited: Back and Forth
with Complexity
José Hernández-Orallo(B)
1 Introduction
A first test using algorithmic information theory (AIT) was the C-test [2,9],
where the goal was to find a continuation of a sequence of letters, as in some IQ
tasks, and in the spirit of Solomonoff’s inductive inference problems: “given an
initial segment of a sequence, predict its continuation” (as quoted in [12, p.332]).
Levin’s Kt complexity (see, e.g., [12, sec.7.5]) was used to calculate the difficulty
of a sequence of letters. The performance was measured as an aggregated value
over a range of difficulties:
I(π) := Σ_{h=1}^{H} h^e · (1/N) Σ_{i=1}^{N} Hit(π, x_{i,h})     (1)
where π is the subject, the difficulties range from h = 1 to H and there are N
sequences x_{i,h} per difficulty h. The function Hit returns 1 if π is right with the
continuation and 0 otherwise. If e = 0 we have that all difficulties have the same
weight. The N sequences per difficulty were chosen (uniformly) randomly.
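Read operationally, eq. (1) is a difficulty-weighted average of per-difficulty hit rates; the short
Python sketch below, with invented hit data, is one way to compute it.

# Sketch of the aggregation in eq. (1); the hit/miss data below are invented.
def c_test_score(hits, e=0.0):
    """hits[h] is a list of 0/1 results for the N sequences of difficulty h;
    with e = 0 all difficulties receive the same weight."""
    score = 0.0
    for h, results in hits.items():
        hit_ratio = sum(results) / len(results)    # average over the N sequences
        score += (h ** e) * hit_ratio              # difficulty weight h^e
    return score

example = {9: [1, 1, 0, 1], 12: [1, 0, 0, 1], 14: [0, 0, 1, 0]}
print(c_test_score(example))          # 0.75 + 0.5 + 0.25 = 1.5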
This contrasts with a more common evaluation in artificial intelligence based
on average-case performance according to a probability of problems or tasks:
Ψ(π) := Σ_{μ∈M} p(μ) · E[R(π, μ)]     (2)
2 Background
AI evaluation has been performed in many different ways (for a recent account
of AI evaluation, see [5]), but a common approach is based on averaging perfor-
mance on a range of tasks, as in eq. 2.
h = 9 : a, d, g, j, ... Answer: m
h = 12 : a, a, z, c, y, e, x, ... Answer: g
h = 14 : c, a, b, d, b, c, c, e, c, d, ... Answer: d
Fig. 1. Several series of different difficulties 9, 12, and 14 used in the C-test [2]
In what follows, we will focus on the approaches that are based on AIT. As
mentioned above, the first intelligence test using AIT was the so-called C-test
[2,9]. Figure 1 shows examples of sequences that appear in this test. The diffi-
culty of each sequence was calculated as Levin’s Kt, a time-weighted version of
Kolmogorov complexity K. Some preliminary experimental results showed that
human performance correlated with the absolute difficulty (h) of each exercise
and also with IQ test results for the same subjects ([2,9]). They also show a clear
inverse correlation of results with difficulty (see Figure 2). HitRatio is defined as
the inner sum of eq. 1:
HitRatio(π, h) := (1/N) Σ_{i=1}^{N} Hit(π, x_{i,h})     (3)
Fig. 2. Results obtained by humans on task of different difficulty in the C-test [2]
And now we use proposition 4 in [8] that decomposes it. First, we define partial
results for a given difficulty h as follows:
Ψ_h(π, M, p) := Σ_{μ∈M, ℏ(μ)=h} p(μ|h) · E[R(π, μ)]     (5)
where p(h) is a discrete probability function for eq. 6. Note that equations 4, 5
and 6 are generalisations, respectively, of equations 2, 3 and 1.
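Concretely, eq. (5) is just an expected result over the tasks of one difficulty, weighted by the
conditional distribution p(μ|h); the minimal Python sketch below uses invented numbers.

# Sketch of eq. (5) with invented task results and an invented p(mu | h).
def psi_h(expected_result, p_cond):
    """expected_result: {task: E[R(pi, task)]}, p_cond: {task: p(task | h)}."""
    return sum(p_cond[mu] * r for mu, r in expected_result.items())

results_h3 = {"mu1": 0.9, "mu2": 0.4, "mu3": 0.1}      # E[R(pi, mu)] for tasks of difficulty 3
p_given_h3 = {"mu1": 0.50, "mu2": 0.25, "mu3": 0.25}   # conditional task distribution
print(psi_h(results_h3, p_given_h3))                    # 0.575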
3 Difficulty Functions
Fig. 3. Three approaches to aggregate the results for a set of tasks. Top (A) shows the
classical approach of choosing a probability for the task, according to the properties
of the task. Middle (B) shows the approach where we arrange tasks by difficulty, and
the notion of difficulty is derived from the properties of the policy. Bottom (C) shows
a variation of B where we derive acceptable policies for a given difficulty and then
generate tasks for each policy. Between square brackets some choices we examine in
this paper.
Given the above, we are now ready for a few properties about difficulty functions.
We can say a few words about the cases where a truly random agent (choosing
actions at random) gives an acceptable policy for an environment. If this is the
case, we intuitively consider the environment easy. So, in terms of L, we consider
random agents to be simple, which goes well with our consideration of stochastic
agents and environments having access to some true source of randomness.
Figure 4 (left) shows the distribution of response according to L(π), but
setting ε = 0.9. We see that the simplest ε-acceptable policy has L = 12.
Proof. For every policy π, if a task μ has a difficulty ℏ^[ε](μ) > L(π), it means
that π is not ε-acceptable, because otherwise the difficulty would be L(π) and
not h. Consequently, A^[ε](π, μ) = 0 for all μ of difficulty ℏ^[ε](μ) > L(π). It is
sufficient to take h > L(π) for every π to see that the difficulty function is strongly bounded.
This is what we see in Fig. 4 (right), where L(π) = 80. With ℏ^[ε] in eq. 10,
we can ensure that the values are going to be 0 from h = 80 on.
This may not be the case for other difficulty functions. We can imagine a
situation where the curve never converges to zero. For instance, if the difficulty
function is decoupled from resources (length and/or steps) of the acceptable
policies or we do not use the notion of ε-acceptability then we cannot avoid that
a very simple policy could eventually score well in a problem with very high
difficulty. This would be counter-intuitive, as if there is a simple policy for a
difficult problem, the latter should not be considered difficult any more.
From here, we can plug it into eq. 9 for the discrete case:
Ψ_w^[ε](π, M, p_M) = Σ_{h=0}^{∞} w(h) · (1/ν(h)) Σ_{μ∈M, ℏ^[ε](μ)=h} 2^{−K(μ)} · A^[ε](π, μ)     (11)
Note that the above is going to be bounded independently of the difficulty function
if w is a probability distribution. Also notice that 1/ν(h) is on the outer sum, and
that ν(h) is lower than 1, so the normalisation term is actually greater than 1.
And if we use any of the difficulty functions in equations 10 we can choose
w(h) = 1 and Ψ_⊕^[ε](π, M, p_M) is bounded.
One consequence of the use of equation 10 is that the number of acceptable policies
per difficulty is finite. This is what happened in the C-test and that is the reason
why a uniform distribution could be used for the inner sum. We could try to
decompose the inner sum by using the policy and get the probability of the task
given the policy.
The interpretation would be as follows: for each difficulty value we aggregate
all the acceptable policies with size equal to that difficulty uniformly and for
each of these policies all the environments where each policy is acceptable with
a universal distribution. This extra complication with respect to eq. 11 can only
be justified if we generate environments and agents and we check them as we
populate P airs, as a way of constructing a test more easily.
6 Discussion
We have gone from eq. 1 taken from C-test to eq. 9. We have seen that difficulties
allow for a more detailed analysis of what happens for a given agent, depending
on whether it succeeds at easy or difficult tasks. For some difficulty functions, we
do not even need to determine the weight for each difficulty and just calculate
the area, as an aggregated performance for all difficulties, and cutting the tail
at some maximum difficulty for practical reasons.
The important thing is that now we do not need to set an a priori distribution
for all tasks p(μ), but just a conditional distribution p(μ|h). Note that if we set
a high h we have the freedom to find a simple task that has that difficulty.
Actually, the choice of p(μ|h) as a universal distribution still depends on the
reference machine and can set most of the probability mass on smaller tasks,
but as it is conditional on h, all trivial, dead or simply meaningless tasks have
usually very extreme values of h (very low or infinite). That means that there is
a range of interesting difficulties, discarding very small values of h and very large
values of h. Figure 2 is a nice example of this, where only difficulties between
1 and 8 were used, and we see also that h = 1 and h > 7 are not really very
discriminating. The bulk of the testing effort must be performed in this range.
Note that the middle (B) and bottom (C) decompositions in Figure 3 can
be done in such a way that the original pM (μ) is preserved, if w(h) is not taken
uniform but slowly decaying. But we can just start with option B or C directly.
This is the alternative in this paper, which we think has several advantages in
terms of agent evaluation, the construction of tests and AGI development, as
we can focus on those tasks of appropriate difficulty and even define adaptive
tests easily. Having said this, we have an infinite set for pM (μ|h) and pM (μ|π ),
and a universal distribution is appropriate for both, so that Occam’s razor
is still very present. This means that both B and C (using a slowly decaying
w(h)) would lead to a computable aggregated distribution pM (μ), which can be
approximated as a universal distribution, highlighting that universal intelligence
is a schema for definitions rather than a specific definition.
References
1. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environ-
ment: An evaluation platform for general agents. Journal of Artificial Intelligence
Research 47, 253–279 (2013)
2. Hernández-Orallo, J.: Beyond the Turing Test. J. Logic, Language & Information
9(4), 447–466 (2000)
3. Hernández-Orallo, J.: Computational measures of information gain and reinforce-
ment in inference processes. AI Communications 13(1), 49–50 (2000)
4. Hernández-Orallo, J.: On the computational measurement of intelligence factors.
In: Meystel, A. (ed.) Performance metrics for intelligent systems workshop, pp.
1–8. National Institute of Standards and Technology, Gaithersburg (2000)
5. Hernández-Orallo, J.: AI evaluation: past, present and future (2014). arXiv preprint
arXiv:1408.6908
6. Hernández-Orallo, J.: On environment difficulty and discriminating power.
Autonomous Agents and Multi-Agent Systems, 1–53 (2014). http://dx.doi.org/
10.1007/s10458-014-9257-1
1 Introduction
In 1948 Edward Tolman [36] reported on a series of behavioral experiments with
rats that led him to hypothesize that the animals had to make use of an inter-
nal, map-like representation of the environment. This idea, which came to be
known as the cognitive map hypothesis, was highly controversial at the time.
Accordingly, the discovery of hippocampal place cells by O’Keefe and Dostro-
vsky [25,27] in the 1970s was met with much excitement as place cells were the
first possible direct evidence for such a representation of the environment in the
brain [26]. Since then a variety of neurons that exhibit spatially correlated activ-
ity have been found in the parahippocampal-hippocampal region [11,13,15,32,35]. In
particular, the recent discovery of grid cells [11,13] in the entorhinal cortex of the rat
strengthened the idea that the involved neuronal structures constitute a kind
of metric for space [23]. Grid cells are neurons that exhibit spatially correlated
activity similar to that of place cells with the distinct difference that grid cells
possess not just one but multiple, discrete firing fields that are arranged in a
regular, hexagonal grid that spans the entire environment (Fig. 1a). Located
just one synapse upstream of the hippocampus grid cells are assumed to be an
important source of spatial information to place cells [29,33]. In particular, grid
cells are generally considered to be a central part of a path integration system
as pointed out by Burgess [5]: “There has been a surprisingly rapid and general
agreement that the computational problem to which grid cells provide a solution
is “path integration” within an allocentric reference frame.” This consensus is
reflected by the fact that all computational models of grid cells proposed so far
(except [19]) incorporate mechanisms of path integration as integral parts to
explain the hexagonal firing patterns of grid cells. Although existing computa-
tional models cover a wide range of possible mechanisms and focus on different
aspects of grid cell activity [2,4,12,23,24,38], the models share the common app-
roach of explaining grid cells and their behavior as functional components within
the cognitive map hypothesis.
Complementary to this common approach this paper presents an alternative
grid cell model that treats the observed grid cell behavior as an instance of a
more abstract, general principle by which neurons in the higher-order parts of
the cortex process information.
2 Model Description
(a) (b)
Fig. 1. Comparison of a grid cell firing pattern with a growing neural gas (GNG)
network. (a) Typical visualization of a grid cell’s firing pattern as introduced by Hafting
et al. [13]. Left: trajectory (black lines) of a rat in a circular environment with marked
locations (red dots) where the observed grid cell fired. Middle: color-coded firing rate
map of the observed grid cell ranging from dark blue (no activity) to red (maximum
activity). Right: color-coded spatial autocorrelation of the firing rate map ranging
from blue (negative correlation, -1) to red (positive correlation, +1) highlighting the
hexagonal structure of the firing pattern. Figure from Moser et al. [23]. (b) Example of a
GNG network with 25 units that was fed with inputs from a uniformly distributed, two-
dimensional, circular input space. The resulting network forms an induced Delaunay
triangulation of the input space.
(a) (b)
Fig. 2. Illustration of the proposed two-layer model. (a) The top layer is represented
by three units (red, green, blue) connected by dashed lines. The associated sets of
connected nodes in the bottom layer are illustrated by corresponding colors. (b) Top
view on the input space partition induced by the bottom layer sets of nodes.
propose to describe a group of grid cells by a two-layer model1 . The top layer
contains a set of connected units that each represent an individual grid cell. Asso-
ciated with each top layer unit is a set of connected nodes in the bottom layer
representing the set of input patterns that are recognized by the dendritic tree of
the grid cell (Fig. 2a). To this end, each node in the bottom layer possesses a pro-
totype vector that represents the center of a local input space region. Applying a
form of competitive hebbian learning within each set of bottom layer nodes (bot-
tom layer competition) arranges the nodes in a triangular pattern that covers
the entire input space. In addition, competition across the sets of bottom layer
nodes (top layer competition) arranges the different triangular patterns in such
a way that they share a common orientation and spacing. Furthermore, the top
layer competition will also spread the individual triangular patterns uniformly
across the input space (Fig. 2b).
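As a minimal, self-contained illustration of the bottom-layer competition just described (not the
authors' implementation; a fixed ring topology and arbitrary constants stand in for the learned
topology), the following Python sketch adapts prototype vectors toward uniformly distributed 2D
inputs, moving the best matching node more strongly than its neighbors.

# Minimal sketch of bottom-layer competition: the best matching node and, more
# weakly, its neighbors move toward each input. Topology and constants are arbitrary.
import random

random.seed(0)
eps_b, eps_n = 0.05, 0.005                                        # learning rates (arbitrary)
nodes = [[random.random(), random.random()] for _ in range(16)]
neighbors = {i: ((i - 1) % 16, (i + 1) % 16) for i in range(16)}  # fixed ring topology

def adapt(xi):
    s1 = min(range(len(nodes)),                                   # best matching node
             key=lambda i: (nodes[i][0] - xi[0]) ** 2 + (nodes[i][1] - xi[1]) ** 2)
    for j, eps in [(s1, eps_b)] + [(k, eps_n) for k in neighbors[s1]]:
        nodes[j][0] += eps * (xi[0] - nodes[j][0])
        nodes[j][1] += eps * (xi[1] - nodes[j][1])

for _ in range(20000):                                            # uniform 2D input space
    adapt([random.random(), random.random()])
print(nodes[:3])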
Formally, the proposed model consists of a set of units u ∈ U and a set of
connections c ∈ C located in the top layer, as well as a set of parameters θ. Each
connection c is described by a tuple:
c := (P, t) ∈ C, P ⊆ U ∧ |P | = 2, t ∈ N,
with units p ∈ P linked by the connection and the connection’s age t. Each
unit u is described by a tuple:
u := (V, E) ∈ U,
with a set of nodes v ∈ V and a set of edges e ∈ E in the bottom layer. Each node v
is described by a tuple:
v := (w, a_err, ref) ∈ V, w ∈ R^n,
with the prototype w, the accumulated error a_err, and a refractory factor ref.
Each edge e is described by a tuple:
e := (S, t) ∈ E, S ⊆ V ∧ |S| = 2, t ∈ N,
with nodes s ∈ S linked by the edge and the edge’s age t. The set of parameters θ
consists of the constants ε_b, ε_n, ε_r, τ_t, M_t, ε_b.start, ε_b.end, ε_n.start, ε_n.end, τ_b, M_b, λ, α, β, γ, and t_r; the values used in our simulations are given in Section 3.
The model is initialized with M_t fully connected top level units u each starting
with two nodes v that have random prototype vectors as well as accumulated
errors and refractory factors set to zero. An input ξ ∈ R^n at time t is processed
as follows:
s_1^{t+1}.w := s_1^t.w + ε_b^t (1 − s_1^t.ref) (ξ − s_1^t.w),
s_n^{t+1}.w := s_n^t.w + ε_n^t (1 − s_1^t.ref) (ξ − s_n^t.w),
with
ε_b^t := ε_b.start (ε_b.end / ε_b.start)^{t/t_r},    ε_n^t := ε_n.start (ε_n.end / ε_n.start)^{t/t_r}.
The existing edge between nodes j and k is removed and edges between
nodes j and v as well as nodes v and k are added. The accumulated
errors of nodes j and k are decreased and the accumulated error of the
new node v is set to the decreased accumulated error of node j.
• Finally, decrease the accumulated error of all nodes as well as their refrac-
tory factors:
Δv.a_err = −β · v.a_err,
Δv.ref = −γ · v.ref,    ∀v ∈ V.
– Identify the two units u1 and u2 whose BMUs were closest to input ξ.
– Increment the age t of all connections to u1 by one.
– If no connection between u1 and u2 exists, create one.
– Reset the age t of the connection between u1 and u2 to zero.
– Adapt the BMUs of u1 and u2 as well as their neighbors:
u_1.s_1^{t+1}.w := u_1.s_1^t.w + ε_b (ξ − u_1.s_1^t.w),
u_1.s_n^{t+1}.w := u_1.s_n^t.w + ε_b ε_r (ξ − u_1.s_n^t.w),
u_2.s_1^{t+1}.w := u_2.s_1^t.w + ε_n (ξ − u_2.s_1^t.w),
u_2.s_n^{t+1}.w := u_2.s_n^t.w + ε_n ε_r (ξ − u_2.s_n^t.w).
Fig. 3. Exemplary rate and autocorrelation maps of simulated grid cells. (a,b) Simu-
lated grid cell with 20 bottom layer nodes. (c,d) Simulated grid cell with 16 bottom
layer nodes.
The data set recorded by Sargolini et al. [31] of a rat foraging for food in a square environment is used. Figure 3
shows exemplary rate and autocorrelation maps of two top layer units with either
16 or 20 bottom layer nodes exhibiting grid like firing patterns. In this example,
the following set of parameters θ was used:
ε_b = 0.05, ε_n = 0.005, ε_r = 0.001, τ_t = 1000, M_t = 50,
ε_b.start = 0.05, ε_b.end = 0.0005, ε_n.start = 0.01, ε_n.end = 0.0001, τ_b = 300,
M_b = {16, 20}, λ = 1000, α = 0.5, β = 0.0005, γ = 0.2,
t_r = 500000.
4 Discussion
The proposed model describes a putative general principle by which neurons in
higher-order parts of the cortex process information of arbitrary input spaces:
Each neuron aspires to represent its input space as well as possible while
being in competition with its peers.
(a) (b)
Fig. 4. Comparison of strategies to identify specific subregions in input space. (a) Mul-
tiple perceptrons successively partition the input space to identify a specific subregion
(the middle triangle). (b) Top layer units from separate grid cell groups with different
spatial scales identify a specific subregion by coinciding in that region.
Killian et al. [17] report on entorhinal neurons with grid-like firing patterns in
response to saccadic eye movements.
References
1. Azizi, A.H., Schieferstein, N., Cheng, S.: The transformation from grid cells to
place cells is robust to noise in the grid pattern. Hippocampus 24(8), 912–919
(2014)
2. Barry, C., Burgess, N.: Neural mechanisms of self-location. Current Biology 24(8),
R330–R339 (2014)
3. de Berg, M., Cheong, O., van Kreveld, M., Overmars, M.: Computational Geome-
try: Algorithms and Applications. Springer (2008)
4. Burak, Y.: Spatial coding and attractor dynamics of grid cells in the entorhinal
cortex. Current Opinion in Neurobiology 25, 169–175 (2014), theoretical and com-
putational neuroscience
5. Burgess, N.: Grid cells and theta as oscillatory interference: Theory and predictions.
Hippocampus 18(12), 1157–1174 (2008)
6. Chen, T.W., Wardill, T.J., Sun, Y., Pulver, S.R., Renninger, S.L., Baohan, A.,
Schreiter, E.R., Kerr, R.A., Orger, M.B., Jayaraman, V., Looger, L.L., Svoboda,
K., Kim, D.S.: Ultrasensitive fluorescent proteins for imaging neuronal activity.
Nature 499(7458), 295–300 (2013)
7. Delaunay, B.: Sur la sphère vide. Bull. Acad. Sci. URSS 1934(6), 793–800 (1934)
8. Franzius, M., Vollgraf, R., Wiskott, L.: From grids to places. Journal of Computa-
tional Neuroscience 22(3), 297–299 (2007)
9. Fritzke, B.: Unsupervised ontogenetic networks. In: Fiesler, E., Beale, R. (eds.)
Handbook of Neural Computation. Institute of Physics Publishing and Oxford
University Press (1996)
10. Fritzke, B.: A growing neural gas network learns topologies. In: Advances in Neural
Information Processing Systems, vol. 7, pp. 625–632. MIT Press (1995)
11. Fyhn, M., Molden, S., Witter, M.P., Moser, E.I., Moser, M.B.: Spatial representa-
tion in the entorhinal cortex. Science 305(5688), 1258–1264 (2004)
12. Giocomo, L., Moser, M.B., Moser, E.: Computational models of grid cells. Neuron
71(4), 589–603 (2011)
13. Hafting, T., Fyhn, M., Molden, S., Moser, M.B., Moser, E.I.: Microstructure of a
spatial map in the entorhinal cortex. Nature 436(7052), 801–806 (2005)
14. Jia, H., Rochefort, N.L., Chen, X., Konnerth, A.: Dendritic organization of sensory
input to cortical neurons in vivo. Nature 464(7293), 1307–1312 (2010)
15. Jung, M.W., McNaughton, B.L.: Spatial selectivity of unit activity in the hip-
pocampal granular layer. Hippocampus 3(2), 165–182 (1993)
16. Kerdels, J., Peters, G.: A computational model of grid cells based on dendritic
self-organized learning. In: Proceedings of the International Conference on Neural
Computation Theory and Applications (2013)
17. Killian, N.J., Jutras, M.J., Buffalo, E.A.: A map of visual space in the primate
entorhinal cortex. Nature 491(7426), 761–764 (11 2012)
18. Kohonen, T.: Self-organized formation of topologically correct feature maps. Bio-
logical Cybernetics 43(1), 59–69 (1982)
19. Kropff, E., Treves, A.: The emergence of grid cells: Intelligent design or just adap-
tation? Hippocampus 18(12), 1256–1269 (2008)
20. Martinetz, T.M., Schulten, K.: Topology representing networks. Neural Networks
7, 507–522 (1994)
21. McNaughton, B.L., Battaglia, F.P., Jensen, O., Moser, E.I., Moser, M.B.: Path
integration and the neural basis of the ‘cognitive map’. Nat. Rev. Neurosci. 7(8),
663–678 (2006)
22. Mhatre, H., Gorchetchnikov, A., Grossberg, S.: Grid cell hexagonal patterns formed
by fast self-organized learning within entorhinal cortex (published online 2010).
Hippocampus 22(2), 320–334 (2010)
23. Moser, E.I., Moser, M.B.: A metric for space. Hippocampus 18(12), 1142–1156
(2008)
24. Moser, E.I., Moser, M.B., Roudi, Y.: Network mechanisms of grid cells. Philosoph-
ical Transactions of the Royal Society B: Biological Sciences 369(1635) (2014)
25. O’Keefe, J., Dostrovsky, J.: The hippocampus as a spatial map. preliminary evi-
dence from unit activity in the freely-moving rat. Brain Research 34(1), 171–175
(1971)
26. O’Keefe, J., Nadel, L.: The Hippocampus as a Cognitive Map. Oxford University
Press, Oxford (1978)
27. O’Keefe, J.: Place units in the hippocampus of the freely moving rat. Experimental
Neurology 51(1), 78–109 (1976)
28. Pilly, P.K., Grossberg, S.: How do spatial learning and memory occur in the brain?
coordinated learning of entorhinal grid cells and hippocampal place cells. J. Cog-
nitive Neuroscience, 1031–1054 (2012)
29. Rolls, E.T., Stringer, S.M., Elliot, T.: Entorhinal cortex grid cells can map to
hippocampal place cells by competitive learning. Network: Computation in Neural
Systems 17(4), 447–465 (2006)
30. Rosenblatt, F.: The perceptron: A probabilistic model for information storage and
organization in the brain. Psychological Review 65(6), 386–408 (1958)
31. Sargolini, F., Fyhn, M., Hafting, T., McNaughton, B.L., Witter, M.P., Moser,
M.B., Moser, E.I.: Conjunctive representation of position, direction, and velocity
in entorhinal cortex. Science 312(5774), 758–762 (2006)
32. Solstad, T., Boccara, C.N., Kropff, E., Moser, M.B., Moser, E.I.: Representation of
geometric borders in the entorhinal cortex. Science 322(5909), 1865–1868 (2008)
33. Solstad, T., Moser, E.I., Einevoll, G.T.: From grid cells to place cells: A mathe-
matical model. Hippocampus 16(12), 1026–1031 (2006)
34. Stensola, H., Stensola, T., Solstad, T., Froland, K., Moser, M.B., Moser, E.I.: The
entorhinal grid map is discretized. Nature 492(7427), 72–78 (2012)
35. Taube, J., Muller, R., Ranck, J.: Head-direction cells recorded from the postsubicu-
lum in freely moving rats. i. description and quantitative analysis. The Journal of
Neuroscience 10(2), 420–435 (1990)
36. Tolman, E.C.: Cognitive maps in rats and men. Psychological Review 55, 189–208
(1948)
37. Tóth, L.: Lagerungen in der Ebene: auf der Kugel und im Raum. Die
Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen mit
besonderer Berücksichtigung der Anwendungsgebiete. Springer (1972)
38. Welinder, P.E., Burak, Y., Fiete, I.R.: Grid cells: The position code, neural network
models of activity, and the problem of learning. Hippocampus 18(12), 1283–1300
(2008)
39. Zhang, K.: Representation of spatial orientation by the intrinsic dynamics of
the head-direction cell ensemble: a theory. The Journal of Neuroscience 16(6),
2112–2126 (1996)
Programming Languages and Artificial General
Intelligence
1 Introduction
For many years researchers have tried to create programming languages for specific
areas of research. In the history of AI there were many attempts to create a language
that would be best suited for artificial intelligence. The two main examples
are Lisp and Prolog. The first is particularly interesting because code can be
considered as data in a very natural way. The second contains a powerful inference
engine based on Horn logic as part of the language. Since that time significant
progress has been made in the theory of programming languages and many brilliant
languages like Haskell have been created. Unfortunately, many achievements
in this field are not yet widely used in either artificial intelligence or mainstream
software development. This paper is concerned with two advanced techniques: probabilistic
programming and partial evaluation. The importance of these techniques will
be briefly discussed in this paper. These ideas can be considered unconventional
and are not widely used outside of particular areas of research. Incorporating
such techniques into a programming language may have a considerable impact on
artificial general intelligence.
The next section is about core language design, the programming paradigm, and
basic features like pattern matching. The choice between a domain-specific embedded
language and a full-featured general-purpose language is also discussed.
One of the main issues that needs to be discussed is the application of probabilistic
programming to AGI. Generative models can be very useful in knowledge representation,
as well as in some other aspects of cognitive architectures. Probabilistic
programming is discussed in Section 3.
In this section we discuss the main choices and tradeoffs one faces during programming
language design. Our goal is to create a programming language with the best
capabilities for artificial general intelligence. We started from the following requirements:
1. Turing-completeness
2. General purpose
3. Ease of use for automatic program transformation, generation and search
4. Mixed-paradigm (the language must support functional and imperative
style)
5. Based on an existing language to effectively adopt user experience and legacy
code with minimum changes
6. Easily extensible syntax
7. Simplicity
her own favorite language and provides a very high level of extensibility. Nevertheless,
an embedded language obliges the user to work within the particular general-purpose
language in which the DSL is embedded. The presented language is implemented in
Haskell as a general-purpose language.
The presented language is based on Scheme with some useful extensions. The
bread and butter of modern functional programming is pattern matching. In Scheme
and Clojure this functionality is provided by a library; in our language we incorporate
some syntactic ideas from Haskell to provide pattern matching in the core language.
The symbol : is used to match a cons cell, and the underscore serves as a wildcard:
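As a first illustration (a sketch only: the function count and the clause layout are ours, not copied from the paper's implementation), counting occurrences of a given element might be written as

(define (count x lst)
  (match lst
    (() 0)                              ; empty list: nothing left to count
    ((x : rest) (+ 1 (count x rest)))   ; head equals the argument x
    ((_ : rest) (count x rest))))       ; any other head is skipped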
In this example a pattern with dynamic properties is used. The second pattern
contains the variable x, which is an argument of the function count; this means
that the pattern matches only if the first element of lst equals x. Moreover,
repeated variables are allowed (in the following case the expression evaluates to 2):
(match ’(2 3 2)
((a : b : a : ()) a)
(_ 0))
3 Probabilistic Programming
According to [3], probabilistic programming languages unify the techniques of classical
models of computation with the representation of uncertain knowledge. Although
the idea of probabilistic programming is quite old (see references in [6]), only in the
last few years have researchers in cognitive science and artificial intelligence started
to apply this approach. Many concepts in cognitive studies – such as concept learning,
causal reasoning, social cognition, and language understanding – can be modeled
using a language with probabilistic programming support [4].
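As a minimal illustration of such a generative model, a geometric distribution can be written in a Church-style PPL as follows (flip is assumed to be the built-in fair-coin primitive; the example is ours, not taken from the paper):

(define (geometric)
  ;; number of failed fair-coin flips before the first success
  (if (flip) 0 (+ 1 (geometric))))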
As usual, we extend a deterministic general-purpose language with random
choice primitives. The main obstacle to using probabilistic programming in large
projects is the efficient implementation of inference. In modern probabilistic
languages, various techniques and algorithms are used to solve this problem,
Here, d is the result of program execution. Suppose one has a specializer spec which
can be applied to two arguments, a program and its first argument, and produces a
residual program of one argument, spec(p, x0) = p', specialized for the given
argument x0. The residual program p' satisfies the equation p'(y) = p(x0, y) = d
for every y, but p' can be optimized using knowledge of the particular value x0
and may therefore run much faster.
This approach is very useful for automatic compiler construction. Suppose
we have an interpreter of a (source) language S written in a (target) language T
and defined by int(p, args) = d (for a more formal description see the book [7]). One
can apply the specializer spec to the interpreter int with respect to a program p; it is
easy to check that the result is the compilation of p from S to T.
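Spelled out, these are the first two Futamura projections [2, 7]:

\[
\mathrm{spec}(\mathrm{int},\, p)(\mathit{args}) = \mathrm{int}(p,\, \mathit{args}) = d,
\qquad
\mathrm{spec}(\mathrm{spec},\, \mathrm{int})(p) = \mathrm{spec}(\mathrm{int},\, p),
\]

that is, specializing the interpreter to a program yields a compiled program, and specializing the specializer to the interpreter yields a compiler from S to T.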
In the context of artificial general intelligence this makes a connection
between AGI and classical AI [9]. Here we need some philosophical remarks.
Almost everybody knows a very famous proposition about general intelligence
and specialization:
cycle with the game server, and the play handler makes choices and realizes the
strategy of the game.
This interaction can be seen as the classical Read–Evaluate–Print Loop of an
interactive interpreter. In this way one can apply partial evaluation principles to
artificial general intelligence: in the case of general game playing, we obtain a
specialized program that can play a certain game by partially evaluating a general
program with respect to the game rules.
Many players use advanced techniques to optimize the program for a particular
game, up to code generation and compilation [10]. We believe that this can be done
by partial evaluation. Clearly, partial evaluation cannot help much with search itself
and does not provide heuristics for search optimization; it has been proven that
in many cases only a linear speedup is possible [7]. But manipulating GDL descriptions
to compute legal moves and states has a huge overhead, and this overhead can be
removed by specialization.
Applying the idea to the more general case, including learning, is also possible,
independently of the knowledge representation. In the case of procedural or symbolic
representations it is quite straightforward; possible applications of partial
evaluation to neural networks are described in [7].
5 Implementation Issues
This section is about the implementation details of the project. Beyond the decision
to implement a general-purpose language, the choice of implementation language
always involves trade-offs; in our case these were speed, development difficulty,
and extensibility. Only two candidates were considered: OCaml and Haskell.
OCaml is good for combining imperative programming with the full power
of a functional language, including pattern matching and algebraic data types.
Haskell provides more compact code with very good support for external
libraries via its foreign function interface, but it has some drawbacks connected
with imperative issues, such as monads, lifting, and error handling. Choosing
Haskell as the implementation language is probably controversial in this case, but
compiler quality and the larger community were the decisive factors in the
decision.
The language tools consist of the following parts: an interpreter, a partial evaluator,
and probabilistic programming support including a tracer. All parts share some code
according to the language specification.
The interpreter uses the Parsec library and supports a REPL mode. Double-precision
floating-point numbers and arbitrary-precision integers are supported, and strings are
supported as a built-in type.
The crucial aspect of a probabilistic programming language is the implementation
of the probabilistic inference algorithm. As in many other probabilistic languages,
Metropolis-Hastings is one of the most important sampling strategies; the implementation
is based on the method carefully described in [14]. There are different
ways to implement ERPs (elementary random primitives), the basic building blocks of
probabilistic programs. To keep things simple, we just maintain a key-value structure.
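Roughly following [14], each random choice is keyed by the point in the computation at which it occurs, so that previous choices can be reused between runs and Metropolis-Hastings can perturb one of them at a time. A minimal sketch of such a structure (the names are ours, not those of the actual implementation):

(define trace '())   ; association list: address of a random choice -> sampled value

(define (lookup-or-sample address sampler)
  (let ((entry (assoc address trace)))
    (if entry
        (cdr entry)                      ; reuse the recorded random choice
        (let ((value (sampler)))         ; otherwise sample afresh and record it
          (set! trace (cons (cons address value) trace))
          value))))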
References
1. Batischeva, V., Potapov, A.: Genetic programming on program traces as an infer-
ence engine for probabilistic. In: These AGI-15 Proceedings (to appear)
2. Futamura, Y.: Partial evaluation of computation process – an approach to a compiler-
compiler. Systems, Computers, Controls 2, 45–50 (1971)
3. Goodman, N.D., Stuhlmüller, A.: The design and implementation of probabilistic
programming languages (retrieved on 2015/3/30). http://dippl.org
4. Goodman, N.D., Tenenbaum, J.B.: Probabilistic models of cognition (retrieved on
2015/3/30). http://probmods.org
5. Goodman, N., Mansinghka, V., Roy, D., Bonawitz, K., Tarlow, D.: Church: a lan-
guage for generative models. In: Proc. 24th Conf. Uncertainty in Artificial Intelli-
gence (UAI), pp. 220–229 (2008)
6. Jones, C., Plotkin, G.D.: A probabilistic powerdomain of evaluations. In: Proceed-
ings of Fourth Annual Symposium on Logic in Computer Science, pp. 186–195.
IEEE Computer Society Press (1989)
7. Jones, N., Gomard, C., Sestoft, P.: Partial Evaluation and Automatic Program
Generation. Prentice Hall (1994)
8. Kahn, K.: Partial evaluation, programming methodology, and artificial intelligence.
AI Magazine 5, 53–57 (1984)
9. Khudobakhshov, V.: Metacomputations and program-based knowledge represen-
tation. In: Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI 2013. LNCS,
vol. 7999, pp. 70–77. Springer, Heidelberg (2013)
10. Kowalski, J., Szykula, M.: Game description language compiler construction. In:
Cranefield, S., Nayak, A. (eds.) AI 2013. LNCS, vol. 8272, pp. 234–245. Springer,
Heidelberg (2013)
11. Love, N., Hinrichs, T., Haley, D., Schkufza, E., Genesereth, M.: General game
playing: game description language specification. Tech. rep., Stanford Logic Group
Computer Science Department Stanford University, Technical Report LG-2006-01
(2008)
12. Potapov, A., Batischeva, V., Rodionov, S.: Optimization framework with mini-
mum description length principle for probabilistic programming. In: These AGI-15
Proceedings (to appear)
13. Potapov, A., Rodionov, S.: Making universal induction efficient by specializa-
tion. In: Goertzel, B., Orseau, L., Snaider, J. (eds.) AGI 2014. LNCS, vol. 8598,
pp. 133–142. Springer, Heidelberg (2014)
14. Wingate, D., Stuhlmüller, A., Goodman, N.D.: Lightweight implementations of
probabilistic programming languages via transformational compilation. In: Proc.
of the 14th Artificial Intelligence and Statistics (2011)
From Specialized Syntax to General Logic:
The Case of Comparatives
1 Introduction
In order for an AI system to reason in a general-purpose way about knowledge
that comes to it in natural language form, the system must somehow transform
the knowledge into a more flexible representation that is not tied to the specific lin-
guistic syntax in which it was originally expressed. There is no consensus in the AI
or computational linguistics fields on the best way to do this; various approaches
are being pursued in a spirit of experimental exploration [8]. We describe here the
approach we have been exploring, in which a sequence of transformations maps
syntactic expressions into abstract logic expressions, in a logical language mix-
ing predicate and term logic as specified in Probabilistic Logic Networks [2] [3].
This language comprehension pipeline has been constructed as part of a broader
project aimed at Artificial General Intelligence, the open-source OpenCog initia-
tive [4] [5]; it has been described previously in a 2012 overview paper [11], but has
advanced considerably in capabilities since that time.
To illustrate the properties of this comprehension pipeline, we focus here
on the case of comparative sentences. We have chosen comparatives for this
purpose because they are an important yet difficult case for any NLP system
to deal with, and hence a more interesting illustration of our NLP concepts
and system than a standard case like SVO constructs, which essentially any
reasonably sensible language processing framework can deal with acceptably
in most cases. Comparatives present a diversity of surface forms, which are yet
ultimately mappable into relatively simple logical structures. They are somewhat
confusing from the perspective of modern theoretical linguistics, and also tend
to be handled poorly by existing statistical language processing systems.
The language comprehension pipeline reviewed here is broadly similar in
concept to systems such as Fluid Construction Grammar [14] [13] and Cycorp's
proprietary NLP system.1 However, it differs from these in important aspects.
The approach given here utilizes a dependency grammar (the link grammar
[12]) rather than a phrase structure grammar, and at the other end involves a
customized logic system combining aspects of term logic and predicate logic.
As reviewed in [11], this combination of dependency grammar and term logic
allows a large amount of ambiguity to be passed through from the surface level
to the logic level, which is valuable if one has a powerful logic engine with a
substantial knowledge base, able to resolve ambiguities based on context in a
way that earlier-stage linguistic processes could not.
We now briefly review the language comprehension pipeline utilized in the work
presented here.
The initial, syntactic phase of our pipeline consists of the link grammar [12].
The essential idea of link grammar is that each word comes with a feature struc-
ture consisting of a set of typed connectors. Parsing consists of matching up
connectors from one word with connectors from another. Consider the sentence:
The link grammar parse structure for this sentence is shown in Figure 1.
1 http://cyc.com
There is a database called the “link grammar dictionary” which contains con-
nectors associated with all common English words. The notation used to describe
feature structures in this dictionary is quite simple. Different kinds of connectors
are denoted by letters or pairs of letters like S or SX. Then if a word W1 has the
connector S+, this means that the word can have an S link coming out to the right
side. If a word W2 has the connector S-, this means that the word can have an S
link coming out to the left side. In this case, if W1 occurs to the left of W2 in a
sentence, then the two words can be joined together with an S link.
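For example, in a sentence like "Dogs run" (an illustrative case of ours, not one from the paper), the dictionary entry for the noun would carry an S+ connector and the entry for the verb an S- connector, so that the parser can join the two words with an S link; the real dictionary entries refine such connectors with subscripts for agreement and other distinctions.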
The rules of link grammar impose additional constraints beyond the matching
of connectors – e.g. the planarity and connectivity metarules. Planarity means
that links don’t cross. Connectivity means that the links and words of a sentence
must form a connected graph – all the words must be linked into the other words
in the sentence via some path.
2.2 RelEx
2.3 OpenCog
The next phase of the pipeline, RelEx2Logic, has the purpose of translating the
output of RelEx into a format compatible with the logical reasoning component of
the OpenCog AI engine. OpenCog is a high level cognitive architecture aimed at
exploration of ideas regarding human-level Artificial General Intelligence, in par-
ticular the CogPrime AGI design [4] [5]. OpenCog has been used for commercial
applications in the area of natural language processing and data mining, and has
also been used for research involving controlling virtual agents in virtual worlds,
controlling humanoid robots, genomics data analysis, and many other areas.
The centerpiece of the OpenCog system is a weighted, labeled hypergraph
knowledge store called the Atomspace, which represents information using a com-
bination of predicate and term logic formalism with neural net like weightings. The
NLP comprehension pipeline described here is centrally concerned with mapping
English language text into logical representations within the Atomspace.
The primary component within OpenCog that acts on the output of
RelEx2Logic is Probabilistic Logic Networks (PLN) [2], a framework for uncer-
tain inference intended to enable the combination of probabilistic truth val-
ues with general logical reasoning rules. PLN involves a particular approach to
estimating the confidence values with which these probability values are held
(weight of evidence, or second-order uncertainty). The implementation of PLN
in software requires important choices regarding the structural representation of
inference rules, and also regarding “inference control” – the strategies required to
decide what inferences to do in what order, in each particular practical situation.
PLN is divided into first-order and higher-order sub-theories (FOPLN and
HOPLN). FOPLN is a term logic, involving terms and relationships (links)
between terms. It is an uncertain logic, in the sense that both terms and
relationships are associated with truth value objects, which may come in mul-
tiple varieties. “Core FOPLN” involves relationships drawn from the set: nega-
tion; Inheritance and probabilistic conjunction and disjunction; Member and
fuzzy conjunction and disjunction. Higher-order PLN (HOPLN) is defined as
the subset of PLN that applies to predicates (considered as functions mapping
arguments into truth values). It includes mechanisms for dealing with variable-
bearing expressions and higher-order functions. We will see some simple exam-
ples of the kinds of inference PLN draws below.
2.4 RelEx2Logic
OpenCog also contains a system called RelEx2Logic, which translates RelEx output
into logical relationships, utilizing the mix of predicate and term logic codified
in Probabilistic Logic Networks [2]. RelEx2Logic operates via a set of rules
roughly illustrated by the following example:
_subj(y, x)
_obj(y, z)
==>
Evaluation y x z
3 Handling Comparatives
Comparatives provide more interesting examples of this sort of mapping from sur-
face form into logical expressions. Theoretical linguistics is nowhere near a consen-
sus regarding the proper handling of comparatives in English and other languages.
Some theorists posit an ellipsis theory, suggesting that comparative syntax results
from the surface structure of a sentence leaving out certain words that are present in
the deep structure [10] [1]. Others posit a movement theory [6] [7], more inspired by
traditional generative grammar, hypothesizing that comparative syntax involves
a surface structure that rearranges the deep structure.
The link grammar framework essentially bypasses this sort of issue: either
ellipsis or movement would be represented by certain symmetries in the link
grammar dictionary, but these symmetries don’t need to be explicitly recognized
or utilized by the link parser itself, though they may guide the human being
(or AI system) creating the link grammar dictionary. Currently, on an empirical
basis, the link parser handles comparatives reasonably well, but the relevant
dictionary entries are somewhat heterogeneous and not entirely symmetrical in
nature. This suggests that either
We suspect that the truth is “a little of both”, but note that this issue need not
be resolved in order to deploy the link grammar as part of a practical pipeline
for comprehending complex sentences, including comparatives.
Consider, for example, the sentences "Bob likes Hendrix more than the Beatles" and
"Americans like the Beatles more than Menudo"; the system can parse both and derive
the conclusion that Bob likes Hendrix more than Menudo.
For the first sentence we obtain
_subj(like, Bob)
_obj(like, Hendrix)
than(Hendrix, Beatles)
_comparative(like, Hendrix)
==>
TruthValueGreaterThanLink
EvaluationLink like Bob Hendrix
EvaluationLink like Bob Beatles
and for the second, correspondingly
_subj(like, Americans)
_obj(like, Menudo)
than(Beatles, Menudo)
_comparative(like, Beatles)
==>
TruthValueGreaterThanLink
EvaluationLink like Americans Beatles
EvaluationLink like Americans Menudo
The logical format obtained from these sentences is quite transparent. Simply
by deploying its knowledge that the TruthValueGreaterThan relationship is
transitive, and that Bob is American, the PLN logic system can derive the conclusion
in two steps: from the second sentence and the fact that Bob is American, it infers
that Bob likes the Beatles more than Menudo; combining this by transitivity with
the first sentence, it concludes that
TruthValueGreaterThanLink
EvaluationLink like Bob Hendrix
EvaluationLink like Bob Menudo
Now that we are dealing with knowledge in logical rather than syntactic form,
all sorts of manipulations can be carried out. For instance, suppose we also know
that Bob likes Sinatra more than Menudo,
TruthValueGreaterThanLink
EvaluationLink like Bob Sinatra
EvaluationLink like Bob Menudo
PLN’s abduction rule then concludes that
SimilarityLink
Hendrix
Sinatra
This sort of reasoning is very simple in PLN, and that’s as it should be – it
is also commonsensically simple for humans. A major design objective of PLN
was that inferences that are simple for humans, should be relatively compact
and simple in PLN. The task of the language comprehension pipeline we have
designed for OpenCog is to unravel the complexity of natural language syntax
to unveil the logical simplicity of the semantics underneath, which can then
oftentimes be reasoned on in a very simple, straightforward way.
5 Conclusion
We have summarized the operation of a natural language comprehension system
that maps English sentences into sets of logical relationships, in the logic format
utilized by a probabilistic inference engine implemented within a general pur-
pose cognitive architecture. This comprehension system is being utilized within
prototype applications in multiple areas including a non-player character in a
video game, a humanoid robot operating in an indoor environment, and a chat
system running on a smartphone interacting with a user regarding music and
media consumption.
We have focused here on the processing of comparatives, as this is a nontrivial
case that is currently confusing for linguistic theory and handled suboptimally
by many parsing systems. For practical cases of comparatives, as for most other
cases, our system qualitatively appears to give adequate performance.
However, significant work remains before we have a generally robust comprehension
system capable of use in a wide variety of dialogue systems. Handling of
conjunctions and quantifiers is one of the primary subjects of our current work,
along with the use of PLN to handle commonsense inferences more subtle than
the simple inference case summarized here.
References
1. Bhatt, R., Takahashi, S.: Winfried Lechner, Ellipsis in Comparatives. The
Journal of Comparative Germanic Linguistics 14(2), 139–171 (2011).
http://dx.doi.org/10.1007/s10828-011-9042-3
2. Goertzel, B., Ikle, M., Goertzel, I., Heljakka, A.: Probabilistic Logic Networks.
Springer (2008)
3. Goertzel, B., Coelho, L., Geisweiller, N., Janicic, P., Pennachin, C.: Real World
Reasoning. Atlantis Press (2011)
4. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence,
Part 1: A Path to Advanced AGI via Embodied Learning and Cognitive Synergy.
Springer: Atlantis Thinking Machines (2013)
5. Goertzel, B., Pennachin, C., Geisweiller, N.: Engineering General Intelligence, Part
2: The CogPrime Architecture for Integrative, Embodied AGI. Springer: Atlantis
Thinking Machines (2013)
6. Grant, M.: The Parsing and Interpretation of Comparatives: More than Meets the
Eye (2013). http://scholarworks.umass.edu/open access dissertations/689/
7. Izvorski, R.: A DP-shell for comparatives. In: Proceedings of CONSOLE III, pp.
99–121 (1995)
8. Jurafsky, D., Martin, J.: Speech and Language Processing. Pearson Prentice Hall
(2009)
9. Klein, D., Manning, C.: Accurate unlexicalized parsing. In: Proceedings of the 41st
Meeting of the Association for Computational Linguistics, pp. 423–430 (2003)
10. Lechner, W.: Ellipsis in Comparatives. Studies in generative grammar, Moulton de
Gruyter (2004). http://books.google.com.hk/books?id=JsqUHHYSXCIC
11. Lian, R., Goertzel, B., Ke, S., O’Neill, J., Sadeghi, K., Shiu, S., Wang, D.,
Watkins, O., Yu, G.: Syntax-semantic mapping for general intelligence: language
comprehension as hypergraph homomorphism, language generation as constraint
satisfaction. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716,
pp. 158–167. Springer, Heidelberg (2012)
12. Sleator, D., Temperley, D.: Parsing English with a link grammar. Third Interna-
tional Workshop on Parsing Technologies (1993)
13. Steels, L.: Design Patterns in Fluid Construction Grammar. John Benjamins (2011)
14. Steels, L.: Modeling The Formation of Language in Embodied Agents: Methods
and Open Challenges, pp. 223–233. Springer Verlag (2010)
15. Vepstas, L., Goertzel, B.: Learning language from a large unannotated corpus: A
deep learning approach. Technical Report (2013)
Decision-Making During Language
Understanding by Intelligent Agents
1 Introduction
the results of its language processing. If, by the time it finishes processing a
language input, the agent is confident that it has understood the input, this
should lead to reasoning and action. If, by contrast, the agent has not suffi-
ciently understood the input, then it must select a recovery strategy. One such
strategy is the action of asking its human collaborator for clarification. Incor-
porating such reasoning and action into the perception module, we arrive at the
following, more realistic, workflow, in which parentheses show optionality: Per-
ception and reasoning about perception–(Reasoning about suboptimal perception
processing–Recovery action)–Reasoning–Action.
With respect to language modeling itself, the traditional, theory-driven
Syntax–Semantics–Pragmatics pipeline fails to accommodate the large number
of cross-modular methods available for treating individual linguistic phenom-
ena. To take just one example, many instances of ellipsis – the null referring
expression – can be detected and resolved prior to semantic analysis, with the
results then being available to inform semantic analysis.1 Therefore, just as we
modified the cognitive modeling pipeline above, so must we modify the language
processing pipeline, leading to the more functionally sufficient approach detailed
in Section 4.
the patient’s questions. In each of the trainee’s dialog turns, the agent attempts
to detect something actionable, such as a question it should answer or a recom-
mendation it should respond to. Responding to this actionable input becomes
the agent’s communicative goal of choice, absolving it from the necessity of full
and confident analysis of every element of input.
This type of incomplete processing is not merely an escape hatch for modeling
intelligent agents in the early 21st century. We believe that it models how people
naturally behave in communicative situations: they pay attention to the main
point but often ignore many of the details of what others say. For example, if a
doctor provides exhaustive detail about the potential side effects of a medication,
do live patients pay full attention? Would they understand and remember every
detail even if they did? Selective attention is a manifestation of the principle of
least effort; it represents natural conservation of energy and thus protects against
cognitive overload [15]. So, even though OntoAgents show “focused attention” for
practical reasons, the effects of this behavior in simulation will, we hypothesize,
make agents more human-like.
We will now consider, in turn, how the canonical pipelines introduced above
can be modified to better serve OntoAgents in their quest for actionable language
interpretations.
coreference relations for overt pronouns at this stage enhances the simultaneous
disambiguation of those expressions and their selecting heads. For example, it is
much easier for the agent to disambiguate both the subject and the verb in The
train stopped than to disambiguate these strings in It stopped. So, coreferring it
with train in a context like The train raced toward the station then it suddenly
stopped is of great benefit to semantic analysis.
1bi. Tree Trimming. Before proceeding to semantic analysis, the agent has
the option of carrying out “tree trimming,” also known as syntactic pruning or
sentence simplification. Tree trimming refers to automatically deleting non-core
syntactic structures, such as relative clauses and various types of modification,
so that the core elements can be more effectively treated.2 It has been used in
applications ranging from summarization to information extraction to subtitling.
An agent’s decision about whether or not to trim should be a function of (a)
sentence length, (b) the constituents in the parse tree and the dependency parse,
and (c) situational non-linguistic parameters, such as the agent’s cognitive load
and the importance of the goal being pursued through the communication.
1c. Semantic Analysis. Semantic analysis in OntoAgent is defined as
generating an ontologically-grounded text meaning representation (TMR) that
includes the results of lexical disambiguation and semantic dependency determi-
nation.3 TMRs are written in a metalanguage they share with the ontology and
other knowledge repositories in OntoAgent. For example, the TMR for the input
Dr. Jones diagnosed the patient is shown in Table 1. Small caps indicate onto-
logical concepts and numerical suffixes indicate their instances. The “textstring”
and “from-sense” slots are metadata used for system debugging.
Table 1. TMR for the input "Dr. Jones diagnosed the patient."
diagnose-1
    agent         human-1
    theme         medical-patient-1
    time          (before find-anchor-time)   ; indicates past tense
    textstring    "diagnosed"
    from-sense    diagnosed-v1
human-1
    agent-of      diagnose-1
    has-name      "Dr. Jones"
    textstring    "Dr. Jones"
    from-sense    *personal-name*
medical-patient-1
    theme-of      diagnose-1
    textstring    "patient"
    from-sense    patient-n1
2 For our approach to tree trimming in service of ellipsis resolution see [11].
3 The OntoSem process of semantic analysis is described in [6] and [14].
The key to selecting the correct sponsor is consulting the ontology and determin-
ing that the agent of the ontological concept (event) surgery – which was acti-
vated as the contextually appropriate meaning of operate – is typically a surgeon.
This is an example of “reasoning about perception.” Note that if earlier reference
processing had resulted in textual coreference links, true reference resolution to
agent memory would still have to be undertaken at this stage. This would happen,
for example, given the input, After the surgeon completed the surgery, he changed
into street clothes. Here, the grammatical structure strongly suggests the corefer-
ence relationship between he and the surgeon, but this chain of coreference must
still be anchored to the right instance of surgeon in agent memory.
1e. Indirect speech act interpretation. In its current state, our microthe-
ory of non-lexically-supported speech act interpretation covers exclusively
application-specific cases. For example, in the MVP application, if the input
includes reference to a symptom, but the input overall is not recognized as an
instance of asking whether the patient is experiencing that symptom, the patient
nevertheless responds as if it had been asked that question. Work is underway
to extend this microtheory to cover more generic contexts.
By the time the agent reaches this point in language analysis, it will have
carried out all of its basic analysis processes, constructed a TMR, and grounded
concept instances in memory. Its overall analysis is associated with a cumulative
confidence value that is computed as a function of its confidence about every
component decision it has made: each instance of lexical disambiguation, each
instance of reference resolution, etc. If the agent’s overall confidence is above a
threshold, the analysis is declared to be actionable. If not, the agent must decide
how to proceed.
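Purely as an illustration of this thresholding step (the combination function, the numbers, and the Scheme rendering are all inventions of the sketch, not OntoAgent's actual mechanism):

;; Combine per-decision confidences into a cumulative score and test it
;; against an actionability threshold. The product is one plausible
;; combination function; the agent's actual function is not specified here.
(define (cumulative-confidence confidences)
  (apply * confidences))

(define (actionable? confidences threshold)
  (> (cumulative-confidence confidences) threshold))

;; e.g. (actionable? '(0.95 0.9 0.85) 0.7)  =>  #t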
1f. Reasoning about suboptimal perception processing. If the agent
chose earlier not to carry out syntactic trimming, it can choose to invoke it at
this point, in hopes of being able to generate a higher-confidence TMR from
a less complex input. The sequence syntactic analysis – semantic analysis –
tree trimming – semantic analysis is another example of interleaving modules of
processing beyond the rather simplistic original pipeline. If the trimming strategy
is either not available (e.g., it has been carried out already) or is not favored by
the agent (e.g., this is a high-risk situation with no room for error), the agent
can undertake a recovery action.
1fi. Recovery action. If the agent is collaborating with a human, one recov-
ery option is to ask a clarification question. This is particularly well-motivated in
high-risk and/or time-sensitive situations. There are, however, other options as
well. For example, if the analysis problem was due to “unexpected input” – e.g.,
an unknown word – the system can attempt learning by reading, as described in
[2]. Or, the agent can decide to recover passively, by not responding and waiting
for its interlocutor’s next move which, in some cases, might involve linguistic
clarifications, restatements, etc.
2. Post-perception reasoning & 3. Action. These modules of agent cog-
nition take as input whatever results of language processing the agent considered
an appropriate stopping condition.
5 Final Thoughts
The recognition that reasoning is needed for language processing is, of course,
not novel. The idea has been addressed and debated from the early days of AI-
NLP and cognitive science in works by Schank [16], Wilks [17], Woods [18], and
many others. Our contribution is an attempt (a) to integrate a larger inventory
of more detailed explanatory models that rely on broader and deeper knowledge
bases, and (b) to arm agents with the ability to reason about their confidence
in language processing and act accordingly. In this regard, it is noteworthy that
a central contributor to the success of the Watson system in the Jeopardy! chal-
lenge was its use of confidence metrics in deciding whether or not to respond to
questions [3].
The idea of interleaving processing stages is also not unknown in computa-
tional linguistics proper. For example, Agirre et al. [1] use semantic information
to help determine prepositional phrase attachment, which is required for produc-
ing the correct output of syntactic analysis. Our work differs from contributions
of this kind in that our ultimate goal is not success of a particular stage of
language processing but, rather, deriving the semantic and discourse/pragmatic
meaning of the input using all available clues.
In this space, we were able to give only a high-level overview of language
understanding in OntoAgent, along with our methods of incorporating reasoning
and decision-making into the process. Naturally, many aspects of this vision of
agent functioning are work in progress. Our practical results, which vary across
microtheories, have been reported in the cited literature. Near-term goals include
both further developing the theoretical substrate of OntoAgent – continuing the
genre of the current contribution – and increasing the breadth of coverage of
all of the microtheories, knowledge bases and processors that contribute to the
functioning of OntoAgents.
References
1. Agirre, E., Baldwin, T., Martinez, D.: Improving parsing and PP attachment per-
formance with sense information. In: Proceedings of ACL-08: HLT, pp. 317–325,
Columbus, Ohio (2008)
2. English, J., Nirenburg, S.: Striking a balance: human and computer contributions
to learning through semantic analysis. In: Proceedings of ICSC-2010. Pittsburgh,
PA (2010)
3. Ferrucci, D., Brown, E., et al.: Building Watson: An Overview of the DeepQA
Project. Association for the Advancement of Artificial Intelligence (2010)
4. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., McClosky, D.:
The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the
52nd Annual Meeting of the Association for Computational Linguistics: System
Demonstrations, pp. 55–60 (2014)
5. McShane, M., Jarrell, B., Fantry, G., Nirenburg, S., Beale, S., Johnson, B.: Reveal-
ing the conceptual substrate of biomedical cognitive models to the wider commu-
nity. In: Westwood, J.D., Haluck, R.S., et al. (eds.) Medicine Meets Virtual Reality
16, pp. 281–286. IOS Press, Amsterdam, Netherlands (2008)
6. McShane, M., Nirenburg, S., Beale, S.: Language Understanding With Ontological
Semantics. Advances in Cognitive Systems (forthcoming)
7. McShane, M., Beale, S., Nirenburg, S., Jarrell, B., Fantry, G.: Inconsistency as a
Diagnostic Tool in a Society of Intelligent Agents. Artificial Intelligence in Medicine
(AIIM) 55(3), 137–148 (2012)
8. McShane, M., Nirenburg, S., Jarrell, B.: Modeling Decision-Making Biases.
Biologically-Inspired Cognitive Architectures (BICA) Journal 3, 39–50 (2013)
9. McShane, M., Nirenburg, S.: Use of ontology, lexicon and fact repository for ref-
erence resolution in Ontological Semantics. In: Oltramari, A., Vossen, P., Qin, L.,
Hovy, E. (eds.) New Trends of Research in Ontologies and Lexical Resources: Ideas,
Projects, Systems, pp. 157–185. Springer (2013)
10. McShane, M., Babkin, P.: Automatic ellipsis resolution: recovering covert informa-
tion from text. In: Proceedings of AAAI-15 (2015)
11. McShane, M., Nirenburg, S., Babkin, P.: Sentence trimming in service of verb
phrase ellipsis resolution. In: Proceedings of EAP CogSci 2015 (forthcoming)
12. McShane, M., Nirenburg, S., Beale, S.: The Ontological Semantic Treatment of
Multi-Word Expressions. Lingvisticae Investigationes (forthcoming)
13. Nirenburg, S., McShane, M., Beale, S.: A simulated physiological/cognitive “dou-
ble agent”. In: Beal, J., Bello, P., Cassimatis, N., Coen, M., Winston, P. (eds.)
Papers from the AAAI Fall Symposium, Naturally Inspired Cognitive Architec-
tures, Washington, D.C., Nov. 7–9. AAAI technical report FS-08-06, Menlo Park,
CA: AAAI Press (2008)
14. Nirenburg, S., Raskin, V.: Ontological Semantics. The MIT Press, Cambridge, MA
(2004)
15. Piantadosi, S.T., Tily, H., Gibson, E.: The Communicative Function of Ambiguity
in Language. Cognition 122, 280–291 (2012)
16. Schank, R., Riesbeck, C.: Inside Computer Understanding. Erlbaum, Hillsdale, NJ
(1981)
17. Wilks, Y., Fass, D.: Preference Semantics: A Family History. Computing and Math-
ematics with Applications 23(2) (1992)
18. Woods, W.A.: Procedural Semantics as a Theory of Meaning. Research Report No.
4627. Cambridge, MA: BBN (1981)
Plan Recovery in Reactive HTNs
Using Symbolic Planning
Abstract. Building formal models of the world and using them to plan
future action is a central problem in artificial intelligence. In this work,
we combine two well-known approaches to this problem, namely, reac-
tive hierarchical task networks (HTNs) and symbolic linear planning.
The practical motivation for this hybrid approach was to recover from
breakdowns in HTN execution by dynamically invoking symbolic plan-
ning. This work also reflects, however, on the deeper issue of tradeoffs
between procedural and symbolic modeling. We have implemented our
approach in a system that combines a reactive HTN engine, called Disco,
with a STRIPS planner implemented in Prolog, and conducted a prelim-
inary evaluation.
1 Introduction
Hierarchical task networks (HTNs) are widely used for controlling intelligent
agents and robots in complex, dynamic environments. There are many different
formalizations and graphical notations in use for HTNs. In this paper we use the
simple tree notation shown in Fig. 1, which we will explain in detail in Section 4.1.
HTNs are typically hand-authored and can be quite large, with five or more levels
of task hierarchy and dozens or even hundreds of tasks at the leaves.
All HTNs share the basic structure of decomposing tasks into sequences (or
sometimes partially ordered sets) of subtasks, with alternative decompositions
(sometimes called recipes) for different situations. In addition to the decompo-
sition tree structure, most HTNs also have conditions, such as preconditions
and postconditions, associated with nodes in the tree to control execution of the
HTN.
HTNs were originally a hierarchical extension of classical linear (e.g., STRIPS
[4]) plans, and as in classical plans, the conditions associated with tasks were
symbolic, i.e., they were written in some kind of formal logic and logical inference
was used to reason about them. Later, in response to the difficulties of symbolic
modeling (see Section 3) a variant, called reactive HTNs, was developed in which
the conditions are procedural, i.e., they are written in a programming language
and evaluated by the appropriate programming language interpreter. The idea
Fig. 1. Breakdown in HTN execution after wind blows door closed and locked. Check
marks indicate successfully executed tasks; “T” indicates a condition that has been
evaluated and returned true; “F” indicates a condition that has returned false.
of reactive HTNs has also been used in game development, where they are called
behavior trees.1
This work is focused on reactive HTNs, and specifically on recovering from
breakdowns in their execution. The basic idea is to add a small proportion of
symbolic conditions to a reactive HTN in order to support a linear planner
performing local plan recovery. Section 2 below starts with a simple, motivating
example.
The problem of plan recovery has been studied in symbolic HTNs (see [1,2,
7,10]). This work is inspirational, but not directly relevant, because these plan
repair techniques rely upon all of the conditions in the HTN being symbolically
expressed, which obviates the use of a reactive HTN.
Others have proposed adding some kind of symbolic planning to reactive
HTNs. For example, Firby [5] proposed using a planner to reorder tasks in the
HTN execution or to help choose between alternative decompositions. Brom [3]
proposed using planning to help execute tasks with time constraints. However,
no one has yet developed a complete hybrid procedural/symbolic algorithm (see
Section 4.2) similar to ours.
Finally, this work is preliminary because, although we have implemented and
tested our algorithm on synthetically generated data (see Section 5), how well
it will work in practice is still an open question.
2 A Motivating Example
To further introduce and motivate our work, we first consider a small, intuitive
example of reactive HTN execution breakdown and recovery. The basic idea of
this example, shown in Fig. 1 is that a robot has been programmed using an
HTN to transport an object through a locked door. In this HTN, the toplevel
task, Transport, is decomposed into three steps: pickup, Navigate and putdown.
1 See http://aigamedev.com/open/article/popular-behavior-tree-design
Fig. 2. Sequence of two primitive tasks (in bold) added to plan for Navigate to recover
from breakdown in Fig. 1.
Navigate is further decomposed into three steps: unlock, open and walkthrough.
Each of these tasks is represented by an oval in Fig. 1. (The square boxes in the
HTN are there to support alternative decompositions, which can be ignored in
this example.)
At the moment in time depicted in Fig. 1, the robot has successfully picked up
the object, unlocked the door and opened it. However, before the precondition
of the walkthru step is evaluated, the wind blows the door closed and the door
locks. The walkthru precondition checks that the door is open and thus returns
false. At this point, there are then no executable tasks in the HTN, which is
what we call a breakdown.
Such breakdowns are not unusual in reactive HTNs, especially when they
are executing in complex, dynamic environments. In fact, something similar to
this actually happened recently to the winning robot in the prestigious DARPA
Robotics Challenge2 (emphasis added): “However, team Schaft lost points when a
gust of wind blew a door out of the robot’s hand and the robot was unable to exit a
vehicle after navigated a driving course successfully.” It can be hard to anticipate
all possible things that can go wrong; and trying to incorporate all possible recovery
plans into the HTN in advance can lead to an explosion of programming effort.
However, looking at this breakdown in particular, the recovery solution,
shown in Fig. 2, is obvious, namely to unlock and open the door. Furthermore,
this would be a trivial problem for a symbolic linear (e.g., STRIPS) planner to
solve if only the pre- and postconditions of the relevant primitives were specified
symbolically.
In a reactive HTN, pre- and postconditions are written in a procedural (pro-
gramming) language and evaluated by the appropriate programming language
interpreter. Fig. 3a shows the relevant procedural conditions in the Navigate
plan as they might typically be written, for example, in JavaScript. For exam-
ple, “isOpen()” would call code in the robot’s sensory system to check whether
the door is currently open. In comparison, Fig. 3b shows how the same primitive
tasks would typically be formalized for a STRIPS planner using symbolic features.
Suppose that when the breakdown in Fig. 1 occurred, the HTN execution
engine somehow had available the symbolic modeling knowledge shown in
Fig. 3b. Recovering from the breakdown could then be formulated as a STRIPS
planning problem (see Fig. 4) in which the initial state is the current world
state, i.e., the door is not open and is locked, and the final state is the failed
precondition of the walkthru step, i.e., the door being open.
2 https://herox.com/news/148-the-darpa-robotics-challenge
Initial state: {-open, +locked}        Final state: {+open}
Operators:
    unlock: precondition +locked; effect -locked
    open: preconditions -open, -locked; effect +open
    walkthru: precondition +open; ...
Search for the recovery plan: {-open, +locked} --unlock--> {-open, -locked} --open--> {+open}
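Such a problem needs very little planning machinery. The following is a minimal breadth-first STRIPS-style planner sketched in Scheme, purely for illustration: the system described in this paper uses a STRIPS planner written in Prolog, and the operator encoding, helper names (all?, keep-if, find-plan), and closed-world state representation below are assumptions of the sketch, not part of that system.

;; A state is the list of facts that currently hold; an operator is
;; (name positive-preconds negative-preconds add-list delete-list).
(define door-operators
  '((unlock   (locked) ()            ()      (locked))
    (open     ()       (open locked) (open)  ())
    (walkthru (open)   ()            ()      ())))

(define (all? pred lst)
  (or (null? lst) (and (pred (car lst)) (all? pred (cdr lst)))))

(define (keep-if pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep-if pred (cdr lst))))
        (else (keep-if pred (cdr lst)))))

(define (applicable? op state)
  (and (all? (lambda (f) (member f state)) (list-ref op 1))
       (all? (lambda (f) (not (member f state))) (list-ref op 2))))

(define (apply-op op state)
  (append (list-ref op 3)
          (keep-if (lambda (f) (not (member f (list-ref op 4)))) state)))

(define (find-plan frontier goal)
  ;; breadth-first search; frontier holds (state . reversed-plan) pairs
  (cond ((null? frontier) #f)
        ((all? (lambda (f) (member f (caar frontier))) goal)
         (reverse (cdar frontier)))
        (else
         (find-plan
          (append (cdr frontier)
                  (map (lambda (op)
                         (cons (apply-op op (caar frontier))
                               (cons (car op) (cdar frontier))))
                       (keep-if (lambda (op) (applicable? op (caar frontier)))
                                door-operators)))
          goal))))

;; (find-plan (list (cons '(locked) '())) '(open))  =>  (unlock open)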
4 A Hybrid Approach
In this section we generalize the motivating example in Section 2 in two ways by
considering: (1) other types of breakdowns and (2) a larger set of possible final
states. We will first present a general plan recovery algorithm and then discuss
the modeling methodology that goes along with it.
executed (typically changing the world state); otherwise (i.e., for abstract tasks)
the applicability conditions of the children (decomposition) nodes are evalu-
ated in order until the first one that returns true—execution continues with this
decomposition node. If all of the applicability conditions of the children return
false, then execution is halted (a breakdown).
When execution of a task node is completed, its postcondition, if any, is eval-
uated. If the postcondition returns false, then execution is halted (a breakdown);
otherwise execution continues.
If the current execution node is a decomposition, then the children (task)
nodes are executed in order.
Fig. 5 summarizes the three types of execution breakdowns that are possible
in reactive HTN execution. The motivating example in Section 2 was a failed pre-
condition, as in Fig. 5a. Notice that this taxonomy does not distinguish between
different possible underlying causes of a breakdown. A breakdown can be caused
by an external, i.e., unmodeled, agency unexpectedly changing the environment
(e.g., the wind in Section 2); or it can be due to a programming bug, such as
an incorrect tree structure or an incorrectly coded condition. The fact that these
different causes are indistinguishable in the breakdown is an inherent limitation of
reactive HTNs.
Continuing with this line of thought, suppose that there was no symbolic
postcondition provided for walkthru, but the symbolic postcondition of Navigate
specified the desired location of the robot. In that case, the postcondition of
Navigate would be a good candidate recovery target.
Similarly, suppose the original breakdown in the example had instead
occurred due to the postcondition of unlock failing. In this situation, the sym-
bolic precondition of walkthru and the symbolic postconditions of walkthru and
Navigate, if they are provided, are still good recovery targets.
Based on this reasoning, in the algorithm below we consider the largest pos-
sible set of pre- and postconditions in the tree as candidate recovery targets,
excluding only those that have already been used in the execution process and
have evaluated to true. We suspect this is an over-generalization, but need more
practical experience to determine a better approach.
The recovery target issue for applicability conditions is somewhat different.
The only time that an applicability condition should be a recovery target is when
its and all of its siblings’ conditions have evaluated to false, as in Fig. 5c.
Fig. 6 shows the pseudocode for the hybrid system we have designed. The
toplevel procedure, Execute, executes an HTN until either it is successfully com-
pleted, or there is a breakdown with no possible recovery. The plan recovery algo-
rithm starts at line 5. The main subroutine, FindCandidates, recursively walks
the HTN tree, accumulating candidate target conditions for the recovery plan-
ning. Notice that SymbolicPlanner is not defined here, since any symbolic lin-
ear planner can be used (our implementation is described in Section 5). Notice also
that since the set of operators used for symbolic planning doesn’t change during
execution of the HTN, it is not an explicit argument to the symbolic planner (see
further discussion regarding symbolic operators in Section 4.3).
In more detail, notice on line 6 that our approach requires a method for
computing from the current world state an initial state representation in the
formalism used by the symbolic planner. For example, for the STRIPS planner
in Section 2 this means that for every feature, such as “open,” there must be an
associated procedure, such as “isOpen(),” to compute its value in the current
world state. This association is a basic part of the hybrid modeling methodology
discussed in the next section.
Notice on line 8 that the candidate conditions are sorted by distance from
the current node in the tree (closest first), using a simple metric such as the
length of the shortest path between them in the undirected graph. The reason
for this is to give preference to recovery plans that keep more closely to the
structure of the original HTN. We do not yet have any experience with how well
this heuristic works in practice.
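As a concrete reading of this metric, the distance between two nodes can be computed from parent links alone; in the sketch below, parent is an assumed accessor that returns a node's parent (or #f at the root), and nodes are compared with equal?:

(define (ancestors node)                 ; node, its parent, ..., up to the root
  (if node (cons node (ancestors (parent node))) '()))

(define (tree-distance a b)
  (let ((bs (ancestors b)))
    (let loop ((n a) (d 0))              ; walk upward from a
      (let ((tail (member n bs)))
        (if tail
            (+ d (- (length bs) (length tail)))   ; plus the steps from b up to n
            (loop (parent n) (+ d 1)))))))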
Finally in Execute, notice on line 12 that when a recovery plan is found,
it must be properly spliced into the HTN. In the simple example in Fig. 2, this
is merely a matter of inserting a sequence of nodes as children of the common
parent between the initial and final nodes. However, if the initial and final nodes
are more distant in the tree, more complicated changes are needed to replace
the intervening tree structure with the new plan.
1: procedure Execute(htn)
2:     while htn is not completed do
3:         current ← next executable node in htn
4:         if current ≠ null then execute current
5:         else [breakdown occurred]
6:             initial ← symbolic description of current world state
7:             candidates ← FindCandidates(htn)
8:             sort candidates by distance from current
9:             for final ∈ candidates do
10:                plan ← SymbolicPlanner(initial, final)
11:                if plan ≠ null then
12:                    splice plan into htn between current and final
13:                    continue while loop above
14:            Recovery failed!
the task depth (see Fig. 9). For each test, we randomly sampled from the very
large space (millions) of all possible combinations of symbolic knowledge at three
overall levels: 25%, 50% and 75% (percentage of conditions in the tree that are
symbolically specified). We did not test larger trees because the experimental
running times became too long.
ner which it solved, which also increased as the symbolic
knowledge increased. (For this experiment, we made a D
References
1. Ayan, N.F., Kuter, U., Yaman, F., Goldman, R.: HOTRiDE: Hierarchical ordered
task replanning in dynamic environments. In: Planning and Plan Execution for
Real-World Systems-Principles and Practices for Planning in Execution: Papers
from the ICAPS Workshop, Providence, RI (2007)
2. Boella, G., Damiano, R.: A replanning algorithm for a reactive agent architecture.
In: Scott, D. (ed.) AIMSA 2002. LNCS (LNAI), vol. 2443, pp. 183–192. Springer,
Heidelberg (2002)
3. Brom, C.: Hierarchical reactive planning: where is its limit. In: Proceedings of
MNAS: Modelling Natural Action Selection, Edinburgh, Scotland (2005)
4. Fikes, R.E., Nilsson, N.J.: STRIPS: A new approach to the application of theorem
proving to problem solving. Artificial Intelligence 2, 189–208 (1971)
5. Firby, R.: An investigation into reactive planning in complex domains. In: AAAI,
pp. 202–206 (1987)
1 Introduction
Occam’s razor is the crucial component of universal algorithmic intelligence models
[1], in which it is formalized in terms of algorithmic information theory. In practice,
Occam’s razor is most widely used in the form of the Minimum Description/Message
Length (MDL/MML) principles [2, 3], which can also be grounded in algorithmic
information theory [4], but usually are applied loosely using heuristic coding schemes
instead of universal reference machines [5].
Another form of Occam's razor is the Bayesian Occam's razor. In its simplest
form, it penalizes complex models by assigning them lower prior probabilities. However,
these priors can be difficult to define non-arbitrarily. Some additional principles, such
as the maximum entropy principle, were traditionally used to define priors, but
algorithmic information theory, by providing universal priors, resolves this difficulty more
generally and elegantly [6], absorbing this simple form of the Bayesian Occam's razor.
A real alternative to the information-theoretic interpretation of Occam's razor is 'a
modern Bayesian approach to priors' [7], in which model complexity is measured by
its flexibility (the possibility of fitting or generating different data instances), estimated on the
second level of inference.
Interestingly, the Bayesian Occam's razor arises naturally, without special implementation,
in probabilistic programming languages (PPLs) with posterior probability inference
[8]. Programs in PPLs are generative models. They require a programmer to
define prior probabilities for some basic random variables, but the total probability
distribution is derived from the program. One can easily obtain universal priors by
writing a function like (define (gen) (if (flip) '() (cons (if (flip) 0 1) (gen)))), where
(flip) equiprobably returns #t or #f, and interpreting the generated binary lists as programs
for a Universal Turing Machine (UTM).
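Indeed, for this generator the probability of producing a particular binary list x of length n is

\[
P(x) \;=\; (1/2)^{n} \cdot (1/2)^{n} \cdot (1/2) \;=\; 2^{-(2n+1)}
\]

(n choices to continue, n bit values, and one final choice to stop), so longer lists – longer programs for the UTM – automatically receive exponentially smaller priors.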
Universal priors appear here from the natural structure of the program, and the concrete
form of the distributions selected for the basic random choices only shifts them, just
as the choice of a concrete UTM does. A similar situation arises for models
specifying Turing-incomplete spaces – higher-order polynomials with concrete
coefficients will naturally have smaller prior probabilities than lower-order polynomials,
even if the degree of the polynomial is uniformly sampled from a certain range.
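For instance, if the degree d is drawn uniformly from {0, ..., D} and the d + 1 coefficients are drawn independently from some distribution over a discretized range (an assumption made only for this illustration), the prior of a concrete polynomial is

\[
P(d, c_0, \ldots, c_d) \;=\; \frac{1}{D+1} \prod_{i=0}^{d} P(c_i),
\]

which shrinks as d grows, since each additional coefficient contributes one more factor smaller than one.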
Inference methods implemented in PPLs are intended for evaluating posterior
probabilities that incorporate the priors defined by a program. Thus, instead of manually
applying the MDL principle, one can simply use PPLs, which provide both an
overlearning-proof criterion and automatic inference methods.
However, existing PPLs do not solve the problem of efficient inference in the general
case, although they provide more efficient inference procedures than blind search.
Different attempts to improve inference procedures are now being made (e.g. [9,
10]), most of them within the full Bayesian framework. The optimization
framework, in which only the maximum of the posterior distribution (or of some other
criterion) is sought, can be much more efficient and is sufficient in many practical tasks,
but it is much less studied in probabilistic programming.
Optimization queries require some criterion function to be defined instead of a
strict condition. It is usually straightforward to define precision-based criteria.
Actually, in some tasks strict conditions are defined as stochastic equality based on
likelihood (otherwise it would be necessary to blindly generate and fit noise), so the
latter is more basic. Of course, if there is no appropriate quantitative criterion, the
optimization framework is not applicable. However, if one uses stochastic equality,
priors will automatically be taken into account by conditional sampling (since samples
are generated in accordance with the prior probabilities and then kept in proportion
to the likelihood), while optimization queries will directly maximize the given
criterion and will be prone to overfitting if this criterion is precision-based.
Thus, the necessity for MDL-like criteria arises in the optimization approach to
probabilistic programming. The necessity of manually specifying such criteria, which
incorporate not only precision but also complexity, makes optimization queries much
less usable and spoils the very idea of probabilistic programming. Optimization
queries should therefore be designed in such a way that user-defined likelihood criteria are
modified using automatically estimated priors.
In this work, we re-implement a functional PPL with optimization queries in the
form of a C++ library; the language was implemented in Scheme and described in the
companion paper [11]. We add a wrapper for OpenCV to this library in order to deal
with non-toy problems. In this setting, we develop a procedure for calculating the prior
probabilities of instantiations of generative models in the form of the computation traces
used in optimization queries, and study its applicability for avoiding overlearning.
2 Background
Minimum Description Length Principle
Universal induction and prediction models are based on algorithmic complexity and
probability, which are incomputable and cannot be directly applied in practice. In-
stead, the Minimum Description (or Message) Length principle (MDL) is usually
applied. Initially, these principles were introduced in some specific strict forms [2, 3],
but now are utilized in many applied methods (e.g. [5]) in the form of the following
loose general definition [4]: the best model of the given data source is the one which
minimizes the sum of
• the length, in bits, of the model description;
• the length, in bits, of data encoded with the use of the model.
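In symbols, writing L(M) for the first term and L(D | M) for the second, the best model is

\[
M^{*} \;=\; \arg\min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr],
\]

and since a code of length L bits corresponds to a probability of 2^{-L}, minimizing this sum is equivalent to maximizing the (unnormalized) posterior P(M)P(D | M).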
Its main purpose is to avoid overfitting by penalizing models on the basis of their
complexity, calculated within heuristically defined coding schemes. Such an "applied
MDL principle" is quite useful, but mostly in the context of narrow AI. Bridging
the gap between Kolmogorov complexity and applications of the MDL principle can
also be a step towards bridging the gap between general and narrow AI.
Probabilistic Programming
In traditional semantics, a program with random choices yields different results when
evaluated many times. The main idea behind probabilistic programming is to associate
the result of program evaluation not with such particular outcomes, but with the
distribution over all possible outcomes. Of course, the problem is to represent and compute
such distributions for arbitrary programs with random choices. This can be done
directly only for some Turing-incomplete languages. In the general case, the simplest way
to deal with this problem is sampling, in which a distribution is represented by the
samples generated by evaluating the program many times using traditional semantics.
A crucial feature of PPLs is conditioning, which allows a programmer to impose
conditions on (intermediate or final) results of program evaluation. Programs
with such conditions are evaluated to conditional (posterior) distributions, which are
the core of Bayesian inference. The simplest implementation of conditional inference
is rejection sampling, in which outcomes of program evaluation that do not meet
the given condition are rejected (not included in the generated set of outcomes
representing the conditional distribution). Such rejection sampling can easily be added to
most existing programming languages as an ordinary procedure, but it is highly inefficient,
so it is usable only for very low-dimensional models. Consequently, more
advanced inference techniques are applied; for example, the Metropolis-Hastings
method is quite popular. In particular, it is used in Church [8], which extends Scheme
with such sampling functions as rejection-query, mh-query, and some others.
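As a minimal sketch of how such a rejection query can be added to an ordinary language as a common procedure (our Python illustration, not the Church or C++ implementations discussed in this paper; all names are ours), a generative thunk and a condition predicate suffice:

import random

def rejection_query(generate, condition, num_samples):
    """Represent the conditional distribution by samples that satisfy the condition."""
    accepted = []
    while len(accepted) < num_samples:
        outcome = generate()       # run the generative program with fresh random choices
        if condition(outcome):     # reject outcomes that violate the condition
            accepted.append(outcome)
    return accepted                # note: may take very long if the condition is rarely met

# e.g. the distribution of a die roll conditioned on the outcome being even
print(rejection_query(lambda: random.randint(1, 6), lambda x: x % 2 == 0, 10))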
PPLs also extend traditional programming languages by adding functions for
sampling from different distributions. In Church, such functions as flip,
random-integer, gaussian, multinomial, and some others are implemented.
Bayesian Occam’s Razor in Probabilistic Programming
As was mentioned, such PPLs as Church naturally support the Bayesian Occam’s
razor [8]. Let us consider the following very simple example.
(mh-query 1000 100
(define n (+ (random-integer 10) 1))
(define xs (repeat n (lambda () (random-integer 10))))
n
(= (sum xs) 12))
Here, we want the sum of an unknown number n of random digits xs to be equal to a
given number, 12. Values of n in the specified range are a priori equiprobable.
However, the derived posterior probabilities are highly non-uniform:
P(n=2|sum=12)≈0.9; P(n=3|sum=12)≈0.09; P(n=4|sum=12)≈0.009.
The underlying combinatorics is quite obvious. However, this is exactly the effect of
"penalizing complex solutions" that also works in less obvious cases [8], e.g. polynomial
approximation with polynomials of arbitrary degree, or clustering with an unknown
number of clusters.
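The effect can also be observed with plain rejection sampling (our Python illustration of the same model; mh-query is used in Church for efficiency). The empirical frequencies of n among the accepted samples estimate P(n | sum = 12):

import random
from collections import Counter

def model():
    n = random.randint(1, 10)                          # uniform prior over the number of digits
    xs = [random.randint(0, 9) for _ in range(n)]      # n random digits
    return n, sum(xs)

accepted = []
while len(accepted) < 1000:
    n, s = model()
    if s == 12:                                        # conditioning by rejection
        accepted.append(n)
print(Counter(accepted))                               # empirical estimate of P(n | sum = 12)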
We also wrapped some OpenCV functions and data structures in our library. Support
for cv::Mat as a basic type was added, so it is possible to write something like
Define(S("image"), V(cv::imread("test.jpg"))). All basic overloaded operations on
cv::Mat are inherited, so values corresponding to cv::Mat can be summed or multiplied
with other values.
To avoid huge program traces while filling image pixels with random values (each
such value would become a node in the program trace), we introduced such classes as
MatGaussian and MatRndInt for generating random matrices as holistic values. These
random matrices can also be generated as deviations from given data.
The constructors of these classes are used simply to create expressions
and arrange them into trees. Evaluation of such expressions was also implemented:
a given expression tree is expanded into a program trace during evaluation.
This program trace is also an expression tree, but with values assigned to its nodes.
The evaluation process and program traces implemented in our C++ library are similar to
those implemented in Scheme and described in the companion paper [11], so we will
not go into detail here. We also re-implemented the optimization queries based on
simulated annealing and genetic programming over computation traces. For example,
one can write the following program, with the result of evaluation shown in Fig. 1:
Symbol imr, imb;
AnnealingQuery(List()
<< Define(imr, MatRndInt(img.rows, img.cols, CV_8UC3, 256, img))
<< Define(imb, GaussianBlur(imr, V(11.), V(3.)))
<< imr
<< (MatDiff2(imb, V(img)) + MatDiff2(imb, imr) * 0.3));
Here, img is some cv::Mat loaded beforehand, and List() << x << y << z … is equivalent
to (list x y z …). The operator << can be used to append elements to the list at
the step of expression-tree creation (not evaluation). imr is created as a random
3-channel image with img as the initial value. MatDiff2 calculates the RMSE per pixel
between two matrices. AnnealingQuery is the simulated-annealing optimization query,
which minimizes the value of its last child; its return value is set to the corresponding
value of its last-but-one child. Here, the second term in the optimization function
prevents too noisy results. A GPQuery based on genetic programming is also
implemented.
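To indicate the kind of search AnnealingQuery performs, here is a generic simulated-annealing sketch over an assignment of random choices (ours; the cooling schedule, the perturbation interface and all names are assumptions and do not correspond to the library's internals):

import math
import random

def annealing_query(init_choices, perturb, criterion, steps=10000, t0=1.0, t1=0.01):
    """Minimize criterion(choices) by simulated annealing over the random choices."""
    current, current_cost = init_choices, criterion(init_choices)
    best, best_cost = current, current_cost
    for i in range(steps):
        temp = t0 * (t1 / t0) ** (i / steps)            # geometric cooling schedule (assumed)
        candidate = perturb(current)                    # resample some of the random choices
        cost = criterion(candidate)
        if cost < current_cost or random.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost     # Metropolis acceptance on the criterion
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost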
Simulated annealing is not really suitable for searching in the space of images,
but a reasonable result is obtained here in a few seconds. It can also be seen that general
C++ code can easily be used together with our probabilistic programming library. Of
course, this code is executed before or during construction of the expression tree, or after
its evaluation, but not during the evaluation process. The latter can be achieved by
extending the library with new classes, which is relatively simple, but slightly more
involved.
Expression trees can be used not only as fixed programs written by a programmer, but
also as dynamic data structures built automatically. Thus, such a library can easily be made
part of a larger system (e.g. a cognitive architecture).
Our library is under development and is used in this paper as a research tool, so
we will not go into more detail. Nevertheless, the current version can be downloaded
from https://github.com/aideus/prodeus
Undesirable Behavior
The optimization framework is suitable for many tasks, and optimization queries even
without a complexity penalty can be applied in probabilistic programming (see some
examples in our companion paper [11]). However, even very simple generative models
can behave inappropriately in this framework. Consider the following program:
Symbol xobs, centers, sigmas, n, xgen;
AnnealingQuery(List()
<< Define(xobs, V(4.))
<< Define(centers, List(3, -7., 2., 10.))
<< Define(sigmas, List(3, 1., 1., 1.))
<< Define(n, RndInt(Length(centers)))
<< Define(xgen, Gaussian(ListRef(centers, n), ListRef(sigmas, n)))
<< n
<< (xobs - xgen) * (xobs - xgen));
Intuitively, this program should simply return the index of the center closest to
xobs, since AnnealingQuery will minimize the distance from the generated value to the
class center. However, evaluation of this program yields almost random indices of
centers. The same model works fine in Church. The following query returns the
distribution with p(n=1)≈1, and in the case of (define centers '(-7., -2., 10.)) it
returns p(n=1)≈p(n=2)≈0.5.
(define (noisy-equal? x y)
(flip (exp (* -1 (- x y) (- x y)))))
(mh-query 100 100
(define xobs 4)
(define centers '(-7., 2., 10.))
(define sigmas '(1., 1., 1.))
(define n (random-integer (length centers)))
(define xgen (gaussian (list-ref centers n) (list-ref sigmas n)))
n
(noisy-equal? xobs xgen))
It should be noted that noisy-equal? should apply flip to a correctly estimated
likelihood if one wants, e.g., to obtain correct posterior probabilities for xgen; in particular,
it should include such a parameter as the dispersion or precision. That is, these programs in
C++ and Church really contain the same information.
The inappropriate result of AnnealingQuery originates from its ability to reduce the
given criterion by adjusting the values of all random variables, including both n and xgen in
this model. It is much easier to adjust xgen directly, since its probability is not taken
into account in the criterion. This problem can easily be fixed here if we tell
AnnealingQuery to minimize the distance from the n-th center to xobs. The program
will be simpler, and its result will be correct. However, the general problem will
remain. It will reveal itself in the form of overfitting, the impossibility of selecting an
appropriate number of clusters or segments in clustering and segmentation tasks, the
necessity to manually define ad hoc criteria, and so on. These are exactly the
problems that are solved with the use of the MDL principle.
Complexity Estimation
Apparently, if we want optimization queries to work similarly to sampling queries, we
need to account for the probabilities with which candidate solutions are generated. Here,
we assume that the criterion fed to optimization queries can be treated as the negative
log-likelihood. Then it is enough to automatically calculate the negative logarithm of
the prior probability of a candidate solution and add it to the criterion in order to achieve
the desired behavior.
We calculate these prior probabilities by multiplying the probabilities in those nodes of
the program-trace subtree starting from AnnealingQuery or GPQuery in which basic
random choices are made. Here, we assume that the list of expressions fed to the queries
is relevant. As a result, each such choice is taken into account only once, even if a
variable referring to this choice is used many times.
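A toy re-creation of the example above (ours, in plain Python rather than with the library) shows the effect of adding the negative log prior of the trace's random choices to the precision-based criterion: without it, n is left undetermined; with it, the center closest to xobs is selected.

import math

CENTERS = [-7.0, 2.0, 10.0]
SIGMA, X_OBS = 1.0, 4.0

def neg_log_prior(n, xgen):
    """-log P(n) - log p(xgen | center_n): the probability of the trace's random choices."""
    choice = math.log(len(CENTERS))                     # uniform choice of n
    gauss = 0.5 * math.log(2 * math.pi * SIGMA ** 2) + (xgen - CENTERS[n]) ** 2 / (2 * SIGMA ** 2)
    return choice + gauss

def nll(xgen):
    """The user-supplied, precision-based criterion of the example."""
    return (X_OBS - xgen) ** 2

def best(criterion):
    # crude grid search over candidate solutions, enough for the illustration
    candidates = [(n, x / 10.0) for n in range(len(CENTERS)) for x in range(-100, 150)]
    return min(candidates, key=lambda c: criterion(*c))

print(best(lambda n, x: nll(x)))                        # xgen -> 4.0, but n is unconstrained
print(best(lambda n, x: nll(x) + neg_log_prior(n, x)))  # n = 1 (the center at 2.0) is selected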
AnnealingQuery and GPQuery were modified accordingly and tested on the program presented
above; they returned n=1 in all cases, so they behave as desired. Of course, optimization
queries give less information than sampling queries. For example, in the
case of centers '(-7., -2., 10.) the former will return n=1 or n=2 at random, while the
latter will return their probabilities. However, optimization queries can be much more
efficient, and can be used to find a first point from which methods like mh-query
can start.
4 Evaluation
Since we aim at practical probabilistic programming for Turing-complete languages,
we consider image analysis tasks, which are computationally quite heavy. To the best
of our knowledge, the only example of such an application is the work [12] (which
unfortunately lacks information about computation time). Thus, the possibility of solving
image analysis tasks in a reasonable time can be used as a sufficient demonstration of
the efficiency of the optimization framework. This is our goal, in addition to the
verification of the automatic MDL criterion calculation procedure.
Consider the task of detecting erythrocytes (our system was not designed to solve
this specific task; it is taken simply as an example, and other tasks could have been picked).
A typical image is shown in Fig. 2. The task is to detect and count cells. It is
usually solved by detecting edge pixels and applying the Hough transform, or by tracking
contours and fitting circles. Direct application of existing implementations of image
processing methods is not enough, and non-trivial combinations of different processing
functions, or even ad hoc implementations of these functions, are needed (e.g. [13]).
Fig. 3. The result yielded by GPQuery (population size = 300, number of generations = 1100,
mutation rate = 0.005)
Table 1. Total description lengths, bits

Image #      n = 1     n = 2     n = 3     n = 4     n = 5     n = 6
1          14650.4   14038.0   13131.2   12687.3   12689.3   12690.0
2          20201.3   19612.1   18888.2   17955.2   17104.2   17115.2
3          14680.3   13995.2   12808.1   12391.7   12316.6   12321.0
4           9270.7    8155.1    8160.6    8162.6    8163.2    8168.5
It can be seen that the total description length starts to increase slowly beyond some
number of circles for each image. Each circle adds around 10 bits of complexity, so the
negative log-likelihood still decreases slightly, but more slowly than the complexity grows.
Actually, since blood cells are not perfectly circular, additional circles fitted to uncovered
parts of cells can in some cases increase the model complexity by less than they decrease the
negative log-likelihood. However, in these cases, queries calculating the posterior
probability will also give a strong peak at the same number of circles. In other words,
the origin of this result lies not in the query procedures or criteria, but in the model. In general,
the found minima of the description length criterion correspond to the real number
of blood cells, and partially visible cells are reliably detected.
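Stated as a formula (our restatement of the criterion already described above, not an additional result), the selected number of circles is k* = argmin_k L(k), where L(k) = k · ℓ_circle + (−log₂ P(image | k circles)) and ℓ_circle ≈ 10 bits is the per-circle model cost.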
5 Conclusion
The developed method for automatic usage of the Minimum Description Length principle
in probabilistic programming both reduces the gap between the loosely applied
MDL principle and the theoretically grounded but impractical Kolmogorov complexity,
and helps to avoid overfitting in optimization queries, making them an efficient alternative
to more traditional queries that estimate conditional probabilities. Experiments conducted
on an image analysis task confirmed the viability of this approach.
However, even optimization queries, not being specialized, cannot efficiently solve
arbitrary induction tasks, especially those connected to AGI. Actually, the task of such
efficient inference can itself be considered an "AI-complete" problem. Thus, deeper
connections between the AGI and probabilistic programming fields are yet to be established.
Acknowledgements. This work was supported by the Ministry of Education and Science of the
Russian Federation and by the Government of the Russian Federation, Grant 074-U01.
References
1. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic
Probability. Springer (2005)
2. Wallace, C.S., Boulton, D.M.: An Information Measure for Classification. Computer
Journal 11, 185–195 (1968)
3. Rissanen, J.J.: Modeling by the Shortest Data Description. Automatica-J.IFAC 14,
465–471 (1978)
4. Vitanyi, P.M.B., Li, M.: Minimum Description Length Induction, Bayesianism, and
Kolmogorov complexity. IEEE Trans. on Information Theory 46(2), 446–464 (2000)
5. Potapov, A.S.: Principle of Representational Minimum Description Length in Image Anal-
ysis and Pattern Recognition. Pattern Recognition and Image Analysis 22(1), 82–91 (2012)
6. Solomonoff, R.: Does Algorithmic Probability Solve the Problem of Induction?. Oxbridge
Research, Cambridge (1997)
7. MacKay, D.J.C.: Bayesian Methods for Adaptive Models. PhD thesis, California Institute
of Technology (1991)
8. Goodman, N.D., Tenenbaum, J.B.: Probabilistic Models of Cognition. https://probmods.org/
9. Stuhlmüller, A., Goodman, N. D.: A dynamic programming algorithm for inference in re-
cursive probabilistic programs. In: Second Statistical Relational AI Workshop at UAI 2012
(StaRAI-12), arXiv:1206.3555 [cs.AI] (2012)
10. Chaganty, A., Nori A.V., Rajamani, S.K.: Efficiently sampling probabilistic programs via
program analysis. In: Proc. Artificial Intelligence and Statistics, pp. 153–160 (2013)
11. Potapov, A., Batishcheva, V.: Genetic Programming on Program Traces as an Inference
Engine for Probabilistic Languages. In: LNAI (2015)
12. Mansinghka, V., Kulkarni, T., Perov, Y., Tenenbaum, J.: Approximate Bayesian Image
Interpretation using Generative Probabilistic Graphics Programs. Advances in Neural
Information Processing Systems, arXiv:1307.0060 [cs.AI] (2013)
13. Zhdanov, I.N., Potapov, A.S., Shcherbakov, O.V.: Erythrometry method based on a
modified Hough transform. Journal of Optical Technology 80(3), 201–203 (2013)
Can Machines Learn Logics?
1 Introduction
Logic-based AI systems perform logical inferences to get solutions from given input
formulas. Such systems have been developed in the fields of automated theorem
proving and logic programming [10]. In those systems, however, the logic used in
the system is specified and built in by human engineers. Our question in this
paper is whether it is possible to develop artificial (general) intelligence that
automatically produces a logic underlying any given data set.
In his argument on “learning machines” in [14], Alan Turing wrote:
Instead of trying to produce a programme to simulate the adult mind,
why not rather try to produce one which simulates the child’s? If this
were then subjected to an appropriate course of education one would
obtain the adult brain [14, p. 456].
According to Piaget's theory of cognitive development, children begin to understand
logical or rational thought at around age seven [12]. If one can develop
AI that automatically acquires a logic of human reasoning, it verifies Turing's
assumption that a child's brain can grow into an adult's by learning an
appropriate logic. Recent advances in robotics point to the possibility of robots
recognizing objects in the world, categorizing concepts, and associating names with
them (physical symbol grounding) [3]. Once robots successfully learn concepts
and associate symbols to them, the next step is to learn relations between con-
cepts and logical or physical rules governing the world.
In this study, we will capture learning logics as a problem of inductive learn-
ing. According to [9], “(t)he goal of (inductive) inference is to formulate plau-
sible general assertions that explain the given facts and are able to predict new
facts. In other words, inductive inference attempts to derive a complete and
correct description of a given phenomenon from specific observations of that
phenomenon or of parts of it” [9, p. 88]. A logic provides a set of axioms and
inference rules that underlie sentences representing the world. Then given a set
of sentences representing the world, one could inductively construct a logic gov-
erning the world. This is in fact a work for mathematicians who try to find an
axiomatic system that is sound and complete with respect to a given set of the-
orems. Induction has been used as an inference mechanism of machine learning,
while little study has been devoted to the challenging topic of learning logics.
In this paper, we first describe an abstract framework for learning logics
based on inductive learning. Next we provide two simple case studies: learning
deductive inference rules and learning cellular automata (CAs) rules. In the
former case, the problem of producing deductive inference rules from formulas
and their logical consequences is considered. In the second case, the problem of
producing transition rules from CA configurations is considered. In each case, we
use machine learning techniques together with metalogic programming. The rest
of this paper is organized as follows. Section 2 introduces an abstract framework
for learning logics. Section 3 presents a case of learning deductive inference rules
and Section 4 presents a case of learning CA rules. Section 5 discusses further
issues and Section 6 summarizes the paper.
2 Learning Logics
To consider the question “Can machines learn logics?”, suppose the following
problem. There is an agent A and a machine M. The agent A, which could be a
human or a computer, is capable of deductive reasoning: it has a set L of axioms
and inference rules in classical logic. Given a (finite) set S of formulas as an
input, the agent A produces a (finite) set of formulas T such that T ⊂ Th(S),
where Th(S) is the set of logical consequences of S. On the other hand, the
machine M has no axiomatic system for deduction, while it is equipped with a
machine learning algorithm C. Given input-output pairs (S1, T1), . . . , (Si, Ti), . . .
(where Ti ⊂ Th(Si)) of A as an input to M, the problem is whether one can
develop an algorithm C which successfully produces an axiomatic system K for
deduction. An algorithm C is sound wrt L if it produces an axiomatic system K
such that K ⊆ L. An algorithm C is complete wrt L if it produces an axiomatic
system K such that L ⊆ K. Designing a sound and complete algorithm C is called
a problem of learning logics (Figure 1). In this framework, an agent A plays the
role of a teacher who provides training examples representing premises along
with entailed consequences. The output K is refined by incrementally providing
examples.
Fig. 1. The framework of learning logics: agent A's deduction system L maps each input Si to an output Ti (⊂ Th(Si)); the pairs (Si, Ti) are the input to machine M's learning system C, which outputs an axiomatic system K.
We consider a deduction system L here, but it could be a system of
arbitrary logic, e.g. nonmonotonic logic, modal logic, or fuzzy logic, as far as it
has a formal system of inference. Alternatively, we can consider a framework in
which a teacher agent A is absent. In this case, given input-output pairs (Si , Ti )
as data, the problem is whether a machine M can find an unknown logic (or
axiomatic system) that produces a consequence Ti from a premise Si .
The abstract framework provided in this section has challenging issues of AI
including the questions:
1. Can we develop a sound and complete algorithm C for learning a classical or
non-classical logic L?
2. Is there any difference between learning axioms and learning inference rules?
3. Does a machine M discover a new axiomatic system K such that K ⊢ F iff
L ⊢ F for any formula F?
The first question concerns the possibility of designing machine learning algo-
rithms that can learn existing logics from given formulas. The second question
concerns differences between learning Gentzen-style logics and Hilbert-style log-
ics. The third question is more ambitious: it asks the possibility of AI’s discov-
ering new logics that are unknown to human mathematicians.
In this paper, we provide simple case studies concerning the first question.
To this end, we represent a formal system L using metalogic programming which
allows object-level and meta-level representation to be amalgamated [2].
where p and q are propositional variables. In this case, given a finite set S of
atoms as an input, A outputs the set:
We now consider the machine M that can produce deductive inference rules
from S and T as follows. Given each pair (S, T ) as an input, we first consider a
learning system C which constructs a rule:
A ← ⋀_{Bi ∈ S} Bi        (2)
two atoms hold(q) and hold(s) are in T \ S. Then the following two rules are
constructed by (2):
The body of each rule contains atoms which do not contribute to deriving the
atom in the head. To distinguish the atoms which do contribute to deriving the
consequence, the agent A is used as follows. For a pair (S, T) from A such that
T \ S ≠ ∅, assume that a rule R of the form (2) is constructed. Then, select a
subset Si of S and give it as an input to A. If its output Ti still contains the
atom A of head(R), replace R with
A ← ⋀_{Bi ∈ Si} Bi.
S1 = { hold(p), hold(p ⊃ q) }
that satisfies hold(q) ∈ T1 , and there are two minimal sets that contain the atom
hold(s) in their outputs:
S2 = { hold(r), hold(r ⊃ s) },
S3 = { hold(p), hold(p ⊃ r), hold(r ⊃ s) }.
Then the following three rules are obtained by replacing S with Si in (2):
The rules (3) and (4) represent Modus Ponens, and (5) represents Multiple Modus
Ponens. As such, unnecessary atoms in the body of a rule are eliminated by the
minimization technique.
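A minimal sketch of this construction and minimization step (our illustration; the agent A is modeled as a black-box function from a set of formulas to its output, and formulas are plain strings with '>' standing for ⊃):

from itertools import combinations

def learn_rules(S, agent):
    """Build a rule head <- body for each new consequence, minimizing the body via the agent."""
    rules = []
    for head in agent(S) - S:                        # atoms derived but not given
        body = set(S)
        # search for a minimal subset of S whose output still contains the head
        for size in range(1, len(S) + 1):
            found = next((set(c) for c in combinations(sorted(S), size)
                          if head in agent(set(c))), None)
            if found is not None:
                body = found
                break
        rules.append((head, body))
    return rules

def teacher(S):
    """Toy agent A: closes a set of atoms and implications under Modus Ponens."""
    out, changed = set(S), True
    while changed:
        changed = False
        for f in list(out):
            if ">" in f:
                a, b = f.split(">")
                if a in out and b not in out:
                    out.add(b)
                    changed = True
    return out

print(learn_rules({"p", "p>q", "p>r", "r>s"}, teacher))
# e.g. [('q', {'p', 'p>q'}), ('r', {'p', 'p>r'}), ('s', {'p', 'p>r', 'r>s'})] (order may vary)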
Unnecessary atoms in the bodies are also eliminated using the generalization
technique developed in [6].1 Suppose an agent A with an inference system L that
performs the following inference:
In this case, given a finite set S of atoms as an input, A outputs the set:
Given a sequence of input-output pairs from the agent A, the machine M con-
structs a rule R of the form (2) each time it receives a new pair (Si , Ti ) from
A. Suppose two rules R and R′ such that (i) head(R) = head(R′); (ii) there is
a formula F such that hold(F) ∈ body(R) and hold(¬F) ∈ body(R′); and (iii)
(body(R′) \ {hold(¬F)}) ⊆ (body(R) \ {hold(F)}). Then, a generalized rule of R
and R′ (upon F) is obtained as
A ← ⋀_{Bi ∈ body(R)\{hold(F)}} Bi.
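The generalization step itself can be sketched as follows (ours; rules are (head, body) pairs and negation is marked by a '~' prefix, both of which are conventions of the illustration only):

def negate(atom):
    return atom[1:] if atom.startswith("~") else "~" + atom

def generalize(rule1, rule2):
    """Return the generalized rule of rule1 and rule2 upon some formula F, or None."""
    head1, body1 = rule1
    head2, body2 = rule2
    if head1 != head2:                                                   # condition (i)
        return None
    for f in body1:
        if negate(f) in body2 and (body2 - {negate(f)}) <= (body1 - {f}):  # (ii) and (iii)
            return (head1, body1 - {f})
    return None

# e.g. q <- p, f   and   q <- ~f   generalize (upon f) to   q <- p
print(generalize(("q", {"p", "f"}), ("q", {"~f"})))                      # ('q', {'p'})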
For example, given the two pairs, (S1 , T1 ) and (S2 , T2 ), where
We can also obtain a rule for abductive inference [11] by this method. For
example, given the pair (S, T ) = ({hold(q), hold(p ⊃ q)}, {hold(q), hold(p ⊃
q), hold(p)}), we can construct the Fallacy of Affirming the Consequent:
In this way, the method in this section could be used for learning non-deductive
inferences.
4 Learning CA Rules
In this section, we address another example of learning logics. Cellular automata
(CAs) [15] are discrete and abstract computational models that have been used
for simulating various complex systems in the real world. A CA consists of a
regular grid of cells, each of which has a finite number of possible states. The
state of each cell changes synchronously in discrete time steps (or generations)
according to a local and identical transition rule. The state of a cell in the next
time step is determined by its current state and the states of its surrounding cells
(called neighbors). The collection of all cellular states in the grid at some time
step is called a configuration. An elementary CA consists of a one-dimensional
array of (possibly infinite) cells, and each cell has one of two possible states 0
or 1. A cell and its two adjacent cells form a neighborhood of three cells, so there
are 2^3 = 8 possible patterns for a neighborhood. A transition rule describes, for each
pattern of a neighborhood, whether the central cell will be 0 or 1 at the next time
step. Then 2^8 = 256 possible rules are considered, and 256 elementary CAs
are defined accordingly. Stephen Wolfram gave each rule a number from 0 to 255
(called the Wolfram code) and analyzed their properties [15]. The evolution
of an elementary CA is illustrated by starting with the initial configuration in
the first row, the configuration at the next time step in the second row, and so
on. Figure 2 shows Rule 30 and an example of its evolution, where a black cell
represents the state 1 and a white cell represents the state 0. The figure shows
the first 16 generations of Rule 30 starting with a single black cell. It is
known that Rule 30 displays aperiodic and random patterns in a chaotic
manner.
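A few lines are enough to generate such an evolution (our sketch; the rule table below is the standard Wolfram code 30, and a zero background is assumed at the boundary):

RULE30 = {(1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
          (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    """One synchronous update of a finite row (cells outside the row are assumed to be 0)."""
    padded = [0] + cells + [0]
    return [RULE30[(padded[i - 1], padded[i], padded[i + 1])]
            for i in range(1, len(padded) - 1)]

row = [0] * 16 + [1] + [0] * 16          # a single black cell in the middle
for _ in range(16):                      # the first 16 generations, as in Figure 2
    print("".join("#" if c else "." for c in row))
    row = step(row)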
Each transition rule is considered a logic of the CA; that is, every pattern appearing
in a configuration is governed by one transition rule. Then we consider the
problem of learning the transition rule of a CA from its configurations.
We represent the state of a cell at each time step by an atom: hold(x_i^t)
if x_i^t = 1 and hold(¬x_i^t) if x_i^t = 0. Then the initial configuration of Figure 2 is
represented by the (infinite) set of atoms:
To cope with the problem using a finite set, we consider the five cells:
in each time step. Table 1 represents the evolution of those five cells in the first four
time steps.
Corresponding to the framework provided in Section 2, an agent A produces
S^{t+1} from an input S^t. Given input-output pairs (S^0, S^1), . . . , (S^t, S^{t+1}), . . . of A
as an input to a machine M, the problem is whether M can identify the transition
rule of this CA. For a pair of configurations (S^0, S^1), the machine M produces a
rule R that represents the states of the cell x_j^0 (i − 1 ≤ j ≤ i + 1) and its neighbors
in the body of R and the state of the cell x_j^1 in the head of R. There are
three such rules:
Since a transition rule does not change during the evolution and it is equally
applied to each cell, the above eight rules are rewritten as
hold(x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(x_{i+1}^t).   (6)
hold(x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(x_i^t) ∧ hold(¬x_{i+1}^t).   (7)
hold(x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(¬x_{i+1}^t).   (8)
hold(x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(x_i^t) ∧ hold(x_{i+1}^t).   (9)
hold(¬x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(x_i^t) ∧ hold(x_{i+1}^t).   (10)
hold(¬x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(x_i^t) ∧ hold(¬x_{i+1}^t).   (11)
hold(¬x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(x_{i+1}^t).   (12)
hold(¬x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(¬x_{i+1}^t).   (13)
The eight rules (6)–(13) represent the transition rule of the Rule 30. Further, we
get the following rules:
hold(x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(x_i^t).   (by (7) and (9))
hold(x_i^{t+1}) ← hold(¬x_{i−1}^t) ∧ hold(x_{i+1}^t).   (by (6) and (9))
hold(¬x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(x_i^t).   (by (10) and (11))
hold(¬x_i^{t+1}) ← hold(x_{i−1}^t) ∧ hold(x_{i+1}^t).   (by (10) and (12))
hold(x_i^{t+1}) ← (hold(x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(¬x_{i+1}^t))
  ∨ (hold(¬x_{i−1}^t) ∧ (hold(x_i^t) ∨ hold(x_{i+1}^t))).   (14)
hold(¬x_i^{t+1}) ← (hold(¬x_{i−1}^t) ∧ hold(¬x_i^t) ∧ hold(¬x_{i+1}^t))
  ∨ (hold(x_{i−1}^t) ∧ (hold(x_i^t) ∨ hold(x_{i+1}^t))).   (15)
The rules (14) and (15) represent Wolfram's Rule 30.
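The identification step itself can be sketched as a tabulation of observed local transitions (our illustration, not the implementation of [6]; only interior cells are used, so boundary effects are ignored):

RULE30 = {(1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
          (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells, rule):
    padded = [0] + cells + [0]
    return [rule[(padded[i - 1], padded[i], padded[i + 1])] for i in range(1, len(padded) - 1)]

def extract_rules(config_t, config_t1):
    """Collect neighborhood -> next-state observations from two consecutive configurations."""
    return {(config_t[i - 1], config_t[i], config_t[i + 1]): config_t1[i]
            for i in range(1, len(config_t) - 1)}

# generate observations with the (to the learner, unknown) Rule 30 ...
history = [[0] * 10 + [1] + [0] * 10]
for _ in range(8):
    history.append(step(history[-1], RULE30))

# ... and identify the transition table from the configuration pairs alone
learned = {}
for before, after in zip(history, history[1:]):
    learned.update(extract_rules(before, after))
print(learned == RULE30)    # True: all eight neighborhoods occur in the observed evolution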
Learning elementary CA rules is implemented in [6]. It is simple because the CA is
one-dimensional and two-state and has a fixed neighborhood size. On the other hand,
identifying CA rules in practice is difficult because configurations are phenomena
observed in the real world, and there is no teacher agent A in general.
5 Discussion
This paper argues for the possibility of discovering logics using AI. Logic is considered
as meta-mathematics here, so the task is to find meta-laws given pairs of
premises and consequences in mathematical or physical domains. On the other
hand, discovering mathematical theorems or scientific laws within object theories
has been studied in AI. For instance, Lenat [7] developed the Automated Mathematician
(AM), which automatically produces mathematical theorems including
Goldbach's Conjecture and the Unique Factorization Theorem. Schmidt and
Lipson [13] developed AI that successfully deduces the laws of motion from a pendulum's
swings without a shred of knowledge about physics or geometry. To the
best of our knowledge, however, there are few studies that aim at discovering
logics or meta-theorems.
In Section 2 we addressed an abstract framework for learning formal systems
based on logics. An interesting question is whether the same or a similar framework
can be applied to learning non-logical systems. In this case, a set of input-output
pairs (or premise-consequence pairs) is not given by a teacher agent
A in general, but can be implicitly hidden in log files of dynamic systems or
in dialogues with unknown agents. The machine M has to identify those input-output
relations automatically in order to output a set of meta-theoretical inference rules
for the domain or inference patterns of those agents. Non-logical inferences are
also used in pragmatics [8]. In conversation or dialogue, the notion of conversational
implicature [4] is known as a pragmatic inference to an implicit meaning
6 Summary
Answering the question "Can machines learn logics?" is one of the challenging
topics in artificial general intelligence. We argued for the possibility of realizing
such AI and provided some case studies. A number of questions remain open,
for instance: whether the goal can be achieved using existing techniques of machine
learning or AI; which logics can be learned and which cannot; whether
non-logical rules can be learned as well; etc. Exploring those issues would contribute
to a better understanding of human intelligence and take us one step closer to realizing
"strong AI." Although the abstract framework provided in this paper is
conceptual and the case studies are rather simple, the current study serves as a kind
of baseline and would contribute to opening up the topic.
References
1. Adamatzky, A.: Identification of Cellular Automata. Taylor & Francis, London
(1994)
2. Bowen, K.A., Kowalski, R.A.: Amalgamating language and metalanguage in
logic programming. In: Clark, K., Tarnlund, S.A. (eds.) Logic Programming,
pp. 153–172. Academic Press (1983)
3. Coradeschi, S., Loutfi, A., Wrede, B.: A short review of symbol grounding in robotic
and intelligent systems. KI - Künstliche Intelligenz 27, 129–136 (2013)
4. Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J. (eds.) Syntax and
Semantics, 3: Speech Acts, pp. 41–58. Academic Press (1975)
5. Inoue, K.: Meta-level abduction. IFCoLog Journal of Logic and their Applications
(in print) (2015)
6. Inoue, K., Ribeiro, T., Sakama, C.: Learning from interpretation transition.
Machine Learning 94, 51–79 (2014)
7. Lenat, D.B.: On automated scientific theory formation: a case study using the AM
program. In: Hayes, J.E., Michie, D., Mikulich, O.I. (eds.) Machine Intelligence,
vol. 9, pp. 251–283. Ellis Horwood (1979)
8. Levinson, S.C.: Pragmatics. Cambridge University Press (1983)
1 Introduction
Over the last decade, there has been growing interest in computer models for solving
intelligence test problems. In particular, the proposal to establish a psychometric
artificial intelligence (PAI; [3,6]), with the aim of evaluating the intelligence of an
artificial cognitive system based on its performance on a set of tests of intelligence
and mental abilities, has motivated research in this domain [2].
One of the mental abilities considered by researchers as a fundamental con-
stituent of general intelligence is inductive reasoning [22]. A well established,
culture free test in this domain is Raven Progressive Matrices (RPM; [18]) where
regularities have to be identified in a two-dimensional matrix of geometrical pat-
terns. Another problem domain is inductive reasoning with numbers. In contrast
to RPM, problems are represented in one dimension, that is, as a sequence, and
a certain amount of mathematical knowledge is presupposed. Number series are,
for example, included in two well known intelligence test batteries, namely the
IST [1] and the MIT [25]. To solve RPM as well as number series problems, one
has to analyze the given components, construct a hypothesis about the regu-
larity characterizing all components, generalize this regularity and apply it to
generate a solution.
Table 1. Examples of number series problems. The numbers in brackets represent, for
each given series, two possible successor sequences.
(such as E6) but difficult for machines, and vice versa, depending on their underlying
algorithmic principles. Based on these considerations, number series may
be characterized according to the following features:
Necessary background knowledge: To solve series, only knowledge of basic arithmetic
operators (or even only of the successor function) is necessary. But series can
become more efficiently solvable with mathematical knowledge, such as knowing the
factorial or checksum functions.
Numerical values: Numbers are typically small in the context of psychometric tests.
We can assume that humans have no problems with large values if they can be
represented in a simple form, such as decimal multiples, and we can assume that
numerical values have less impact on the performance of computer systems than on that of
humans.
Structural complexity: Series can be solvable by application of one basic operation
to the predecessor or might depend on complex relations between several predecessors.
Existence of a closed formula: Most number series of interest can be characterized
by a closed formula as given in Table 1. However, some series, such as E6 in Table
1, can easily be described verbally while a closed form is highly sophisticated or
even not known. Other problems even require a switch of perspective, such as 3, 3, 5,
4, 4, 3, which gives the number of letters of the verbal representation of the index.
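As a rough indication of how small the search space is for the simpler feature combinations above, the following sketch (ours, not one of the systems reviewed below) enumerates a few rule templates over one or two predecessors and returns the first template consistent with the whole series:

def candidate_rules(series):
    """Yield (description, predictor) pairs for a few simple rule templates."""
    diffs = {series[i] - series[i - 1] for i in range(1, len(series))}
    if len(diffs) == 1:
        c = diffs.pop()
        yield f"f(n-1) + {c}", (lambda last, c=c: last[-1] + c)
    ratios = {series[i] / series[i - 1] for i in range(1, len(series)) if series[i - 1] != 0}
    if len(ratios) == 1 and 0 not in series:
        c = ratios.pop()
        yield f"f(n-1) * {c}", (lambda last, c=c: last[-1] * c)
    diffs2 = {series[i] - series[i - 2] for i in range(2, len(series))}
    if len(diffs2) == 1:
        c = diffs2.pop()
        yield f"f(n-2) + {c}", (lambda last, c=c: last[-2] + c)

def solve(series):
    for description, predict in candidate_rules(series):
        return description, predict(series[-2:])   # first template that explains the whole series
    return None

print(solve([2, 5, 8, 11, 14, 17, 20, 23]))   # ('f(n-1) + 3', 26)
print(solve([28, 33, 31, 36, 34, 39, 37]))    # ('f(n-2) + 3', 42)

Series such as E6 or the letter-count example fall outside these templates, which is exactly the point of the characterization above.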
Rule-Based Systems. In the last four years, two rule-based systems for solving
number series were proposed. Siebers and Schmid [21] presented a semi-analytical
approach where the term structure defining a given number series is guessed
based on heuristic enumeration of term structure. To evaluate the approach,
a generator of number series was realized (see also [5]) and the system was
evaluated with 25,000 randomly created number series resulting in an accuracy
of 93%.
A system based on similar principles is ASolver. However, this system
takes into account plausible restrictions of working memory [23,24]. The system's
performance was evaluated with 11 (unpublished) problems from the IQ test
PJP and shown to outperform mathematical tools such as Maple and WolframAlpha.
ID   Number Series   Rule f(n) =   Responses: Human   IGOR2   ANN
05 2,5,8,11,14,17,20,23 f (n − 1) + 3 9/3/5 + +
07 25,22,19,16,13,10,7,4 f (n − 1) − 3 16/0/1 + +
19 8,12,16,20,24,28,32,36 f (n − 1) + 4 15/0/2 + +
13 54,48,42,36,30,24,18 f (n − 1) − 6 16/1/0 + +
08 28,33,31,36,34,39,37 f (n − 2) + 3 17/0/0 + +
14 6,8,5,7,4,6,3,5 f (n − 2) − 1 16/0/1 + +
20 9,20,6,17,3,14,0,11 f (n − 2) − 3 16/0/1 - +
01 12,15,8,11,4,7,0,3 f (n − 2) − 4 15/0/2 + +
11 4,11,15,26,41,67,108 f (n − 1) + f (n − 2) 8/1/8 + +
09 3,6,12,24,48,96,192 f (n − 1) × 2 13/1/3 + -
16 7,10,9,12,11,14,13,16 if (even, f (n − 1) + 3, f (n − 1) − 1) 14/0/3 + +
18 8,12,10,16,12,20,14,24 if (even, f (n − 2) + 4, f (n − 2) + 2) 17/0/0 + +
15 6,9,18,21,42,45,90,93 if (even, f (n − 1) + 3, f (n − 1) × 2) 14/1/2 - +
17 8,10,14,18,26,34,50,66 if (even, f (n − 2) + 6 × 2^i , f (n − 2) + 8) 13/1/3 - +
10 3,7,15,31,63,127,255 f (n) = 2 × f (n − 1) + 1 12/3/2 + -
04 2,3,5,9,17,33,65,129 f (n − 1) + f (n − 1) − 1 13/1/3 + +
03 2,12,21,29,36,42,47,51 f (n − 1) + 12 − n 14/1/2 - +
02 148,84,52,36,28,24,22 (f (n − 1)/2) + 10 12/2/3 + +
06 2,5,9,19,37,75,149,299 f (n − 1) × 2 + (−1)^n 6/4/7 - -
12 5,6,7,8,10,11,14,15 no squares 10/1/6 - +
The number of input nodes was varied from 1 to 3, and the number of nodes
within the hidden layer from 1 to 20. On average, over all number series, an
increasing number of training iterations was counterproductive, that is, the number
of solvable series was reduced. For 500 iterations, 19 number series could
not be solved by any configuration; for 15,000 iterations this number rose to
22. Over all types of configurations, 13 number series remain unsolved.
Furthermore, the ANN approach was applied to number series given in intelligence
tests: the 20 problems of the IST, also investigated by Strannegård et al. [23],
and the 14 problems of the MIT. Again, the number of input nodes, hidden layers,
and learning rate were varied as above. Over all configurations, 19 out of the
20 IST number series could be solved; one remains unsolved. For the MIT, over
all configurations 12 out of the 14 number series could be solved; two remain
unsolved. Analyzing the networks shows again that 3 input nodes and about 5-6
hidden nodes with a low learning rate are the most successful. This pattern
appears in all our benchmarks. Ragni and Klein [16] developed 20 number series
as a benchmark for the ANN approach, given in Table 3. The problems differed
in the underlying construction principle and varied from simple additions and
multiplications to combinations of these operations. One series (S12) is of the
type studied by Hofstadter [10]: it is composed of the numbers which are not
squares.
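A minimal version of such an ANN experiment can be sketched as follows (ours; the 3-input, few-hidden-node, low-learning-rate configuration follows the pattern reported above, while the normalization, initialization and training length are our assumptions, and, as in the experiments, whether a given series is actually solved depends on these choices):

import numpy as np

def windows(series, k=3):
    X = np.array([series[i:i + k] for i in range(len(series) - k)], dtype=float)
    y = np.array(series[k:], dtype=float)
    return X, y

def train_predict(series, hidden=5, lr=0.01, epochs=20000, seed=0):
    """Tiny 3-hidden-1 MLP with a tanh hidden layer, trained by plain gradient descent."""
    X, y = windows(series)
    scale = float(max(abs(v) for v in series))       # normalize to keep tanh in range
    X, y = X / scale, y / scale
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                     # forward pass
        out = h @ W2 + b2
        g_out = (out[:, 0] - y)[:, None] / len(y)    # gradient of the mean squared error
        g_h = (g_out @ W2.T) * (1 - h ** 2)          # backprop through tanh
        W2 -= lr * (h.T @ g_out); b2 -= lr * g_out.sum(0)
        W1 -= lr * (X.T @ g_h);   b1 -= lr * g_h.sum(0)
    last = np.array(series[-3:], dtype=float) / scale   # uses the last k = 3 elements
    return (np.tanh(last @ W1 + b1) @ W2 + b2).item() * scale

print(train_predict([2, 5, 8, 11, 14, 17, 20, 23]))  # target next element is 26 (f(n-1) + 3)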
² For more information please refer to [16].
problems and to invite researchers to discuss and propose other number series
problems – towards a systematic competition in this domain.
References
1. Amthauer, R., Brocke, B., Liepmann, D., Beauducel, A.: Intelligenz-Struktur-Test
2000 (I-S-T 2000). Hogrefe, Goettingen (1999)
2. Besold, T., Hernández-Orallo, J., Schmid, U.: Can machine intelligence be mea-
sured in the same way as human intelligence? KI - Künstliche Intelligenz (2015)
3. Bringsjord, S.: Psychometric artificial intelligence. Journal of Experimental & The-
oretical Artificial Intelligence 23(3), 271–277 (2011)
4. Burghardt, J.: E-generalization using grammars. Artificial Intelligence 165, 1–35
(2005)
5. Colton, S., Bundy, A., Walsh, T.: Automatic invention of integer sequences. In:
AAAI/IAAI, pp. 558–563 (2000)
6. Hernández-Orallo, J., Dowe, D.L., Hernández-Lloreda, M.V.: Universal psycho-
metrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems
Research 27, 50–74 (2014)
7. Hofmann, J.: Automatische Induktion über Zahlenreihen - Eine Fallstudie zur
Analyse des induktiven Programmiersystems IGOR2 (Automated induction of
number series - A case study analysing the inductive programming system IGOR2).
Master’s thesis, University of Bamberg (December 2012)
8. Hofmann, J., Kitzelmann, E., Schmid, U.: Applying inductive program
synthesis to induction of number series a case study with IGOR2. In: Lutz, C.,
Thielscher, M. (eds.) KI 2014. LNCS, vol. 8736, pp. 25–36. Springer, Heidelberg
(2014)
9. Hofmann, M., Kitzelmann, E., Schmid, U.: A unifying framework for analysis
and evaluation of inductive programming systems. In: Goertzel, B., Hitzler, P.,
Hutter, M. (eds.) Proceedings of the Second Conference on Artificial General Intel-
ligence (AGI-09, Arlington, Virginia, March 6–9 2009), pp. 55–60. Atlantis Press,
Amsterdam (2009)
10. Hofstadter, D.: Fluid Concepts and Creative Analogies. Basic Books, New York
(1995)
11. Holland, J., Holyoak, K., Nisbett, R., Thagard, P.: Induction - Processes of Infer-
ence, Learning, and Discovery. MIT Press, Cambridge (1986)
12. Holzman, T.G., Pellegrino, J.W., Glaser, R.: Cognitive variables in series comple-
tion. Journal of Educational Psychology 75(4), 603–618 (1983)
13. Mahabal, A.A.: Seqsee: A concept-centred architecture for sequence perception.
Ph.D. thesis, Indiana University Bloomington (2009)
14. Meredith, M.J.E.: Seek-whence: a model of pattern perception. Tech. rep., Indiana
Univ., Bloomington (USA) (1986)
15. Milovec, M.: Applying Inductive Programming to Solving Number Series Problems
- Comparing Performance of IGOR with Humans. Master’s thesis, University of
Bamberg (September 2014)
16. Ragni, M., Klein, A.: Predicting numbers: an AI approach to solving number series.
In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS, vol. 7006, pp. 255–259. Springer,
Heidelberg (2011)
17. Ragni, M., Klein, A.: Solving number series - architectural properties of successful
artificial neural networks. In: Madani, K., Kacprzyk, J., Filipe, J. (eds.) NCTA 2011
- Proceedings of the International Conference on Neural Computation Theory and
Applications, pp. 224–229. SciTePress (2011)
18. Raven, J., et al.: Raven progressive matrices. In: Handbook of nonverbal assess-
ment, pp. 223–237. Springer (2003)
19. Sanghi, P., Dowe, D.L.: A computer program capable of passing I.Q. tests. In:
Slezak, P.P. (ed.) Proc. Joint 4th Int. Conf. on Cognitive Science, & 7th Conf. of
the Australasian Society for Cognitive Science (ICCS/ASCS-2003), pp. 570–575.
Sydney, NSW, Australia (2003)
20. Schmid, U., Kitzelmann, E.: Inductive rule learning on the knowledge level. Cog-
nitive Systems Research 12(3), 237–248 (2011)
21. Siebers, M., Schmid, U.: Semi-analytic natural number series induction. In:
Glimm, B., Krüger, A. (eds.) KI 2012. LNCS, vol. 7526, pp. 249–252. Springer,
Heidelberg (2012)
22. Sternberg, R.J. (ed.): Handbook of Intelligence. Cambridge University Press (2000)
23. Strannegård, C., Nizamani, A.R., Sjöberg, A., Engström, F.: Bounded Kolmogorov
complexity based on cognitive models. In: Kühnberger, K.-U., Rudolph, S.,
Wang, P. (eds.) AGI 2013. LNCS, vol. 7999, pp. 130–139. Springer, Heidelberg
(2013)
24. Strannegård, C., Amirghasemi, M., Ulfsbäcker, S.: An anthropomorphic method
for number sequence problems. Cognitive Systems Research 22–23, 27–34 (2013)
25. Wilhelm, O., Conrad, W.: Entwicklung und Erprobung von Tests zur Erfassung
des logischen Denkens. Diagnostica 44, 71–83 (1998)
Emotional Concept Development
1 Introduction
One strategy toward artificial general intelligence (AGI) uses mathematical
methods developed without regard to natural intelligence [23]. A second strategy
imitates the mechanisms of human psychology [3,18]. A third tries to simulate
the human brain at the neural level – as attempted in the BRAIN Initiative and
the Human Brain Project. A fourth tries to imitate computational mechanisms
that are present in nervous systems across the animal kingdom [1,6].
Bees have fewer than a million neurons in their brains, yet they are able to learn
new concepts with the help of reward and punishment and to adapt to a wide range
of environments [9,24]. Bees are arguably more flexible and better at adapting
to new environments than present-day AI systems, so it might be possible to
create more flexible AI systems by mimicking certain of their computational
mechanisms.
new nodes’ utility predictions are statistically different from the current ones.
This and similar models are not limited to sequential partitions of observations:
it is possible to generate trees using an arbitrary metric, to compare histories [22]
within a fully Bayesian framework.
Marsella and colleagues [14] survey computational models of emotion, includ-
ing models based on appraisal theory; while Bach [2] offers a framework for
modeling emotions.
Section 2 presents our network model and Section 3 describes computations
in such models. Section 4 offers an algorithm for developing these networks
automatically. Section 5 presents results. Section 6 draws some preliminary con-
clusions.
2 Transparent Networks
Definition 1 (Network). A (transparent) network is a finite, labeled, directed,
and acyclic graph (V, E) where nodes a ∈ V may be labeled:
– SENSORi, where i ∈ ω (fan-in 0)
– MOTOR (fan-in 1, fan-out 0)
– AND (fan-in 2)
– OR (fan-in 2)
– DELAY (fan-in 1)
– REVERB (fan-in 1)
The fan-in and fan-out conditions in parentheses are restrictions on E. Each
(a, b) ∈ E has an associated weight w(a, b) ∈ [0, 1].
Nodes labeled SENSORi model sensors of modality i. SENSORi could, e.g.,
model a receptor cell with ion channels sensitive to cold temperature, mechanical
pressure, or acidity. Nodes labeled MOTOR model muscle-controlling motor
neurons. Nodes labeled AND and OR model nerve cells with high and low
thresholds, respectively. Nodes labeled DELAY model nerve cells that re-transmit
action potentials with a delay. Nodes labeled REVERB model nerve
cells or nerve-cell clusters that stay active (i.e., reverberate) for some time after
they have been excited. Figure 1 provides example networks. Note that some
nodes that appear in figures throughout this paper have labels that do not appear
in Definition 1. They represent sensors or more complex networks computing the
concept indicated by the label.
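A direct transcription of this structure into code might look as follows (our sketch; only the graph structure and the fan-in restrictions are captured, while acyclicity checks, activity propagation and the timing semantics of DELAY and REVERB are omitted):

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

FAN_IN = {"SENSOR": 0, "MOTOR": 1, "AND": 2, "OR": 2, "DELAY": 1, "REVERB": 1}

@dataclass
class Node:
    label: str                      # one of the labels of Definition 1
    modality: Optional[int] = None  # the index i of SENSOR_i, None otherwise

@dataclass
class Network:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], float] = field(default_factory=dict)  # (a, b) -> weight in [0, 1]

    def add_edge(self, a, b, weight=1.0):
        # enforce the fan-in restrictions of Definition 1 (acyclicity is not checked here)
        fan_in = sum(1 for (_, tgt) in self.edges if tgt == b)
        assert fan_in < FAN_IN[self.nodes[b].label], "fan-in restriction violated"
        self.edges[(a, b)] = weight

# the anemone tentacle of Fig. 1(a): a touch sensor driving a MOTOR node
net = Network()
net.nodes["touch"] = Node("SENSOR", modality=0)
net.nodes["retract"] = Node("MOTOR")
net.add_edge("touch", "retract", weight=1.0)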
3 Network Computation
Definition 2 (Stimulus). Let G = (V, E) be a network and let S(V) consist
of the sensors of V, i.e. those nodes that are labeled SENSORi for some i.
A stimulus for G is a function σ : S(V) → {0, 1}.
Fig. 1. Examples of transparent networks. (a) The tentacle of an anemone that retracts
upon being touched. (b) The letter H immediately followed by the letter I. (c) Lightning
followed by thunder (within ten time steps of the system).
4 Network Development
Next, we define the network-development mechanism that generates a sequence
of networks G0 , G1 , . . . from input stream σ0 , σ1 , . . . and initial network G0 .
The initial graph G0 is called the genotype; all graphs Gn+1 are phenotypes.
For each n, Gn+1 is obtained either by extending Gn or trimming Gn . As in
natural nervous systems, activity continues to flow in the networks while they
are being modified. The definitions of activity propagation can be taken directly
from fixed graphs and applied to graph sequences. First, we must introduce some
basic concepts pertaining to networks.
Positive reward signals model reward; negative reward signals model punishment.
Fig. 4. Unimodal spatial construction: formation of a memory structure for the taste
of a certain apple. (a) The sensors for low bitterness, low sourness, and high sweetness
are activated. (b) Two of the top active nodes are randomly selected and joined. (c)
The only top active nodes are joined.
Fig. 6. Multimodal spatial construction: when the top node was formed, the two nodes
representing apple taste and the phonetic sequence [æpl] were active and the level
of arousal was sufficiently high. At present, only apple taste is active, giving rise to
imagination in the form of the word [æpl].
With the terminology in place, we are ready to define the network development
algorithm: see Algorithm 1, where flip(p) is the result of flipping a
weighted coin that produces outcome 1 with probability p.
Figures 4 and 6 offer examples of network development processes generated
by Algorithm 1. Figure 4 shows the formation of a memory of apple taste. Figure
5 shows the formation of a memory of the written word ”HI”. A memory of the
spoken word [æpl], shown in Figure 2 (a), can be formed analogously, but it
requires one repetition of the sequence [æpl]. Figure 6, finally, shows how the
apple taste and apple word networks are joined.
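Algorithm 1 itself is not reproduced in this excerpt; purely to illustrate the kind of step that the captions of Figures 4–6 describe, the following sketch joins two currently top active nodes under a new AND node with probability flip(p), where p is assumed to grow with emotional intensity (all names and the probability schedule are our assumptions, not the authors' algorithm):

import random

def flip(p):
    """Weighted coin: outcome 1 with probability p."""
    return random.random() < p

def development_step(top_active, arousal, new_name="merge"):
    """Possibly join two top active nodes under a new AND node (illustrative only)."""
    if len(top_active) < 2 or not flip(min(1.0, arousal)):
        return None                                   # no structural change at this step
    a, b = random.sample(top_active, 2)               # cf. Fig. 4(b): two top nodes chosen at random
    return {"label": "AND", "name": new_name, "inputs": (a, b)}

print(development_step(["low bitterness", "low sourness", "high sweetness"], arousal=0.9))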
5 Results
Algorithm 1 was implemented in Python 2.7 using the graphic package Graphviz
for visualization. All of the development processes described in this paper were
obtained using this program and straightforward input streams.
Figures 1–6 illustrate how networks are formed by the algorithm. In this case
the algorithm develops exactly the desired memory structures with no undesir-
able structures as side effects. The algorithm gravitates toward memories that
are emotionally intense, frequently repeated, or both.
6 Conclusion
Our study indicates that artificial emotions are well suited for guiding the devel-
opment of dynamic networks by regulating the quality and quantity of memories
formed and removed. The presented network model and network development
mechanism are relatively simple and were mainly devised for presenting the idea
of emotional concept development. Both can clearly be improved and elabo-
rated in several directions. We conclude that artificial emotions can be fruitful,
not only for guiding behavior, but also for controlling concept development.
References
1. Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement
learning to aerobatic helicopter flight. Advances in Neural Information Processing
Systems 19, 1 (2007)
2. Bach, J.: A framework for emergent emotions, based on motivation and cognitive
modulators. International Journal of Synthetic Emotions (IJSE) 3(1), 43–63 (2012)
3. Bach, J.: MicroPsi 2: The Next Generation of the MicroPsi Framework. In:
Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS, vol. 7716, pp. 11–20.
Springer, Heidelberg (2012)
4. Bechara, A., Damasio, H., Damasio, A.R.: Role of the amygdala in decision-making.
Annals of the New York Academy of Sciences 985(1), 356–369 (2003)
5. Begleiter, R., El-Yaniv, R., Yona, G.: On prediction using variable order markov
models. Journal of Artificial Intelligence Research, 385–421 (2004)
6. Bengio, Y.: Learning deep architectures for ai. Foundations and trends in Machine
Learning 2(1), 1–127 (2009)
7. Blum, A.L., Langley, P.: Selection of relevant features and examples in machine
learning. Artificial Intelligence 97(1), 245–271 (1997)
8. Colton, S., Bundy, A., Walsh, T.: Automatic concept formation in pure mathe-
matics (1999)
9. Gould, J.L., Gould, C.G., et al.: The honey bee. Scientific American Library (1988)
10. Johansen, J.P., Diaz-Mataix, L., Hamanaka, H., Ozawa, T., Ycu, E., Koivumaa, J.,
Kumar, A., Hou, M., Deisseroth, K., Boyden, E.S., et al.: Hebbian and neuromod-
ulatory mechanisms interact to trigger associative memory formation. Proceedings
of the National Academy of Sciences 111(51), E5584–E5592 (2014)
11. Lebovitz, M.: Experiments with incremental concept formation. Machine Learning
2, 103–138 (1987)
12. LeDoux, J.: Emotion circuits in the brain (2003)
13. LeDoux, J.E.: Emotional memory systems in the brain. Behavioural Brain Research
58(1), 69–79 (1993)
14. Marsella, S., Gratch, J., Petta, P.: Computational models of emotion. A Blueprint
for Affective Computing-A sourcebook and Manual, 21–46 (2010)
15. McCallum, R.A.: Instance-based utile distinctions for reinforcement learning with
hidden state. In: ICML, pp. 387–395 (1995)
16. Pickett, M., Oates, T.: The Cruncher: Automatic Concept Formation Using Min-
imum Description Length. In: Zucker, J.-D., Saitta, L. (eds.) SARA 2005. LNCS
(LNAI), vol. 3607, pp. 282–289. Springer, Heidelberg (2005)
17. Richter-Levin, G., Akirav, I.: Emotional tagging of memory formation – the search
for neural mechanisms. Brain Research Reviews 43(3), 247–256 (2003)
18. Rosenbloom, P.S.: The sigma cognitive architecture and system. AISB Quarterly
136, 4–13 (2013)
19. Schmidt, M., Niculescu-Mizil, A., Murphy, K., et al.: Learning graphical model
structure using l1-regularization paths. In: AAAI. vol. 7, pp. 1278–1283 (2007)
20. Strannegård, C., von Haugwitz, R., Wessberg, J., Balkenius, C.: A Cognitive
Architecture Based on Dual Process Theory. In: Kühnberger, K.-U., Rudolph, S.,
Wang, P. (eds.) AGI 2013. LNCS, vol. 7999, pp. 140–149. Springer, Heidelberg
(2013)
21. Tenenbaum, J.B., Kemp, C., Griffiths, T.L., Goodman, N.D.: How to grow a mind:
Statistics, structure, and abstraction. Science 331(6022), 1279–1285 (2011)
22. Tziortziotis, N., Dimitrakakis, C., Blekas, K.: Cover tree bayesian reinforcement
learning. The Journal of Machine Learning Research 15(1), 2313–2335 (2014)
23. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A monte-carlo aixi approx-
imation. Journal of Artificial Intelligence Research 40(1), 95–142 (2011)
24. Witthöft, W.: Absolute Anzahl und Verteilung der Zellen im Hirn der Honigbiene.
Zeitschrift für Morphologie der Tiere 61(1), 160–184 (1967)
The Cyber-Physical System Approach Towards
Artificial General Intelligence: The Problem
of Verification
1 Introduction
Cyber-physical systems (CPSs) are in the forefront of algorithmic, software,
and hardware developments. They are goal oriented. In the typical setting they
are distributed, have physical components, and can include e.g., sensory, com-
putational and robotic units. Given their complexity, testing may become the
bottleneck, especially for safety- and time-critical applications. In case of any
unexpected event or anomaly in the behavior, fast workflow management may
become a necessity and might involve changes of the plan and thus communi-
cation of new subtasks, new roles, and new methods of communication, among
other things. We say that a simple instruction or a more complex subtask makes
sense in a given context if the responsible actors can execute it given the
information provided. Successful completion of an instruction or a subtask verifies
a portion of a larger plan. The larger the plan and the more complex the
system, the more serious the anomalies that may occur. In turn, a stochastic
formulation is required.
We shall put forth the factored event-learning framework (fELF), a special
form of reinforcement learning (RL), which has polynomial-time learning characteristics
and in which the maximal number of concurrent and dependent factors limits
the exponent of the state space (Sect. 2). We illustrate fELF via a toy mockup
explosive device (ED) removal task (Sect. 3). Up to the number of variables, the
solution is 'hard to find'. In the discussion section (Sect. 4) we will argue that
this problem is 'easy to verify' by following the steps in time as prescribed by
the solution. Such solutions are worth communicating. We conjecture that IQ
tests are of a similar nature. Conclusions are drawn in Sect. 5.
2 Theoretical Background
We propose the MDP framework for CPSs. We utilize the generalized MDP
(gMDP) formulation. Its ε-gMDP extension concerns ε-precise quantities and
can exploit robust controllers if they meet the ε-precise condition. We review
the event-learning framework (ELF) [8,16] that breaks tasks into subtasks, can
admit ε-precise robust controllers and can hide some of the variables. An ELF
extended with robust controllers is an ε-gMDP. The factored formulation of MDP
gives rise to polynomial time optimization. Taken together, a factored generalized
ELF with a robust controller is an ε-gMDP with polynomial time optimization.
Execution requires the communication of instructions to the subcomponents
making verification linear in time for deterministic systems.
where X, A, R are defined as above; ⊗ : (X × A × X → R) → (X × A → R)
is an 'expected value-type' operator and ⊕ : (X × A → R) → (X → R) is a
'maximization-type' operator. We want to find the value function V*, where
V*(x) = (⊕ ⊗ (R(x, a, y) + γV*(y))), for all x ∈ X,
or, in short form, V* = ⊕ ⊗ (R + γV*). The optimal value function can be
interpreted as the total reward received by an agent behaving optimally in a
non-deterministic environment. The operator ⊗ describes the effect of the
environment. The operator ⊕ describes the action-selection of an optimal agent.
When 0 ≤ γ < 1, and both ⊕ and ⊗ are non-expansions, the optimal solution
V* of the equations exists and it is unique.
Generalized ε-MDP (ε-gMDP)
a prescribed ε > 0 and is defined by
assumes
the tupleX, A, R, { t }, { t }, with t : (X × A × X → R) → (X × A →
R) and t : (X × A → R)
→ (X → R), t = 1, 2, 3, .
. .,
if there
exists a
generalized MDP X, A, R, , such that lim supt→∞ t t − ≤ ε.
ε-MDPs have been first introduced in [7].
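To make the operator notation concrete, here is a minimal sketch, not from the paper, in which the MDP is a randomly generated toy model: it iterates V ← ⊗⊕(R + γV) with the standard choices of the operators, namely ⊕ taking the expectation over next states and ⊗ maximizing over actions. With 0 ≤ γ < 1 these operators are non-expansions and the iteration converges to the unique V*.

```python
import numpy as np

# Tiny illustrative MDP (not the paper's model): 3 states, 2 actions.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_states))   # R[x, a, y]

def oplus(Q_xay, P):
    """'Expected value-type' operator: (X x A x X -> R) -> (X x A -> R)."""
    return np.einsum('xay,xay->xa', P, Q_xay)

def otimes(Q_xa):
    """'Maximization-type' operator: (X x A -> R) -> (X -> R)."""
    return Q_xa.max(axis=1)

V = np.zeros(n_states)
for _ in range(1000):
    V_new = otimes(oplus(R + gamma * V[None, None, :], P))  # V <- (x)(+) (R + gamma V)
    done = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if done:
        break
print("V* ≈", V)
```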
Event learning turns the MDP into a hierarchical problem via the event-value function E : X × X → R [16]. Pairs of states (x, y) and (x, y^d) are called events and desired events, respectively: for a given initial state x, y^d denotes the desired next state. The formalism remains the same, but any event can be seen as an MDP subtask: the desired event e^d = (x, y^d) can be a subtask to be optimized. E(x, y^d) is the value of trying to get from the actual state x to the next state y^d. Note that the state y actually reached may differ from the desired state y^d.
Assume that a state space X and a velocity field v^d : X → Ẋ are given. At time t, the system is in state x_t with velocity v_t. We are looking for a control action that modifies the actual velocity to v^d(x_t) with maximal probability; the mapping Φ(x_t, v^d_t) that provides this control action is called the inverse dynamics and it can be approximated. Under certain conditions, one can bound the tracking error to the desired level (see [16] and the references therein).
If time is discrete, as it is here, then prescribing the desired velocity v^d is equivalent to prescribing the desired successor state y^d. The controller can be directly inserted into an ELF by setting π^A_t(x_t, y^d_t, a) = 1 if a = u_t(x_t, y^d_t) and 0 otherwise (Fig. 1).
Fig. 1. MDP models. (a): MDP, (b): One step. Input: index of the action, output:
experienced next state, (c): ELF, (d): One step. Input: desired output and the output
can be the ε-precise version of the desired output.
In the generalized ε-MDP, X denotes the set of states and an action corresponds to selecting a new desired state; the set of actions A is thus also equal to X. The reward function R is R(x, y^d, y) and it gives the reward for arriving at y from x when the desired state was y^d. Now, (⊗_t E)(x) = max_{y^d} E(x, y^d), independently of t, and (⊕_t E)(x, y^d) = Σ_y p_t(y | x, y^d) E(x, y^d, y), where p_t(y | x, y^d) = Σ_u π^A_t(x, y^d, u) P(x, u, y). Finally, we define the operators ⊗ and ⊕ as (⊗E)(x) = max_{y^d} E(x, y^d) and (⊕E)(x, y^d) = Σ_y Σ_u π^A(x, y^d, u) P(x, u, y) E(x, y^d, y). In turn, if robust controllers are introduced into an ELF, then we still have an ε-gMDP problem with errors that can be bounded.
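As an illustration of how a robust controller enters this formalism, the sketch below (the controller and the transition model are hypothetical stand-ins, not the paper's robots) builds the deterministic policy π^A(x, y^d, a) = 1 iff a = u(x, y^d), the induced kernel p(y | x, y^d) = Σ_u π^A(x, y^d, u) P(x, u, y), and the event-value operators ⊕ and ⊗ defined above.

```python
import numpy as np

n_states, n_actions = 4, 3
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, u, y]

def controller(x, y_d):
    """Hypothetical robust controller: picks the action for reaching y_d from x."""
    return (x + y_d) % n_actions

# Deterministic policy: pi_A[x, y_d, u] = 1 iff u = controller(x, y_d).
pi_A = np.zeros((n_states, n_states, n_actions))
for x in range(n_states):
    for y_d in range(n_states):
        pi_A[x, y_d, controller(x, y_d)] = 1.0

# Induced event transition kernel p[x, y_d, y] = sum_u pi_A[x, y_d, u] P[x, u, y].
p = np.einsum('xdu,xuy->xdy', pi_A, P)

def oplus_event(E_xdy):
    """(+)E: expected event value, (x, y_d) -> sum_y p(y | x, y_d) E(x, y_d, y)."""
    return np.einsum('xdy,xdy->xd', p, E_xdy)

def otimes_event(E_xd):
    """(x)E: maximize over desired states y_d."""
    return E_xd.max(axis=1)

E = rng.uniform(size=(n_states, n_states, n_states))  # illustrative E(x, y_d, y)
print(otimes_event(oplus_event(E)))
```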
as the initial values of the MDP, then the number of time steps when the agent makes non-near-optimal moves, i.e., when E^fOIM(x_t, y^d_t) < E^×(x_t, y^d_t) − ε, is bounded by
O( R_max² m⁴ N_f |A| / (ε⁴ (1 − γ)⁴) · log³(1/δ) · log²(m N_f |A| / ε) ).
cleared’, ‘ED collected’, ‘terrain cleared’, ‘track is left’, ‘ED removed’, among a
few others. Some tasks are concurrent. The low-complexity RL task in [14] is similar, and thus direct policy optimization is also possible in case the Markov property is questionable. The fMDP description follows [6]: transitions are limited to the possible ones.
Explosion time has a distribution. The fELF makes decisions at discrete time
steps according to the time elapsed, the subtasks executed, the ongoing subtasks,
and the time-discretized distributions.
3.1 Results
According to the results (Fig. 3), there are three typical groups in the time variable: execution time is shorter than 2 min, longer than 2 min 20 s, or between these two values. We used these thresholds to discretize the execution time.
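A minimal sketch of this kind of discretization, assuming synthetic execution-time samples; only the 2 min and 2 min 20 s thresholds are taken from the text:

```python
import numpy as np

# Synthetic execution times in seconds (illustrative only).
rng = np.random.default_rng(2)
times = np.concatenate([
    rng.normal(100, 10, 30),   # fast executions
    rng.normal(130, 5, 30),    # intermediate executions
    rng.normal(160, 15, 30),   # slow executions
])

# Thresholds reported in the text: 2 min = 120 s and 2 min 20 s = 140 s.
bins = np.digitize(times, [120.0, 140.0])   # 0: <2 min, 1: in between, 2: >2 min 20 s
counts = np.bincount(bins, minlength=3)
probs = counts / counts.sum()
print(dict(zip(["<2min", "2min-2min20s", ">2min20s"], probs.round(2))))
```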
The size of the state space in the fMDP depends on the number of factored variables at decision points. This number can be decreased if the controllers are precise. For example, the NXT robot is sufficiently precise and its direction uncertainties are neglected. NXT can clear away obstacles with certain probabilities, but it remains uncertain whether it succeeded in moving an obstacle out of the way of the WP robot or not. The motion of the WP robot is straight, but its direction is somewhat imprecise. We left this imprecision untreated, which made the success of each obstacle clear-away subtask uncertain. Experimentally measured uncertainties and the computed direction uncertainties are used in decision making. The number of obstacles is chosen randomly from 2, 3 and 4, and the obstacles are placed quasi-randomly over the terrain.
Fig. 3. Examples of estimated distributions. Green S: success. Red F: failure. For start positions, see Fig. 2(a). Failures, in order: obstacle 1 is not cleared away, NXT-WP crash, obstacles 3 and 4 are not cleared away.
‘easy’. Tasks can be hard or easy if they scale exponentially or polynomially with
the number of variables, respectively. Out of the four cases, problems belonging to the hard-to-solve but easy-to-verify category are particularly worth communicating. Such solutions can provide large savings in time and effort for teammates.
Our example has both procedural and episodic components. Any event is an episode and it can be saved in episodic memory for data mining, anomaly detection, model construction, and for learning to predict and control the event. The method of dealing with an ongoing event is the procedure; it is made of actions and sub-events. The 'ED removal story' is an 'ED removal event' brought off by the 'ED removal procedure'. This event may be concurrent with other events and it is probably embedded into a larger one. The event, as described here, is independent of the other ongoing concurrent events, which, in principle, could disturb it. However, such a disturbance is also an event and it is limited in space and time. New concepts, new sensors, and additional control tools can be introduced to overcome disturbances of the events, provided that the details of the event are knowable, time is available, and the related costs and savings justify the effort.
From the point of view of a larger system, 'ED removal' could be one of its capabilities. Capabilities, i.e., the different events that can be invoked by the agent, correspond to desired states in a fELF and they constitute the variables of decision making. The number of events that can be invoked in a given state enters the exponent of the state space size. The size of the state space can be decreased by learning and optimizing new capabilities made of smaller ones. The number of variables can be decreased by introducing robust controllers. For example, the measurement of the weight of the load can be neglected by adding a robust controller that increases the range of the capability; see, e.g., the example presented in [16]. Communication towards the decision-making unit can be limited to the experienced state after execution of a sub-task and to an instruction towards the unit that has the capability to execute the next step. Such an instruction contains the desired state and possibly (some of) the steps towards the desired state, i.e., (part of) the 'solution'.
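One possible way to capture this limited communication scheme is with two small message types; the field names below are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Instruction:
    """Sent by the decision-making unit to the unit executing the next step."""
    actor: str                     # unit that has the required capability
    desired_state: str             # y^d, the state to be reached
    solution_steps: List[str] = field(default_factory=list)  # optional (part of) the 'solution'

@dataclass
class Report:
    """Returned after execution of the sub-task."""
    actor: str
    experienced_state: str          # the state actually reached
    success: Optional[bool] = None  # whether it matches the desired state within tolerance

# Illustrative round trip
msg = Instruction(actor="NXT", desired_state="obstacle 1 cleared")
reply = Report(actor="NXT", experienced_state="obstacle 1 cleared", success=True)
```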
In turn, a fELF with robust controllers efficiently decreases both the number of variables and the amount of data to be communicated. From the point of view of verification, deterministic solutions are easy to verify if a model of the environment is available. For stochastic problems, stochasticity indicates limited knowledge about a knowable universe and may call for further exploration and learning. If more knowledge cannot be acquired in due course, or if the collection of such information is costly, then solutions and verifications may come at a high cost, since risks can be overestimated. Model-based experimental methods of risk estimation are the focus of ongoing research [1].
5 Conclusions
We have used an illustrative CPS mockup experiment in the factored event
learning framework (fELF). The problem involved recognition, planning, deci-
sion making, work sharing, and risk estimation. We included distributions of execution times and success rates, either estimated computationally or measured experimentally.
We have argued that a fELF with a robust controller decreases combinatorial explosion. For deterministic CPS problems, verification is polynomial in the number of states [2]. If we can afford non-tight bounds and additional resources, then experimental verification can be fast, provided a model of the environment is available [1].
It has been noted that the problem of verification is alleviated by subtask construction, provided that the subtasks can be executed with high fidelity. Robust controllers suit such demands and can save task execution even in the case of environmental disturbances. Any subtask can be viewed as a fELF problem and, as such, it can be the subject of optimization. In the same vein, optimized fELF solutions can be embedded into larger tasks. In turn, fELF gives rise to a partially ordered hierarchical RL in a natural fashion.
We note that time-critical cyber-physical systems require easy-to-verify solutions. Such solutions are of high importance for interacting intelligences, since
they offer combinatorial gains for teammates. Furthermore, communication can
be limited to meta-level instructions about the states to be reached and meta-
level information about the states that have been reached upon the execution of
the instructions. CPS verification assumes approximately non-interacting sub-
events that can run concurrently or may follow each other.
We conclude that CPS tasks concern fluid intelligence and that, for large distributed systems, model-based real-time verification is required and the time of verification is critical. Finding and learning potentially concurrent but barely interacting, i.e., independently and robustly executable, sub-tasks derived from the task space itself offers both exponential gains in the state space and flexibility in multi-tasking. Evolution demonstrates the feasibility of such constructs [5] and engineered solutions may follow similar routes. However, from the point of view of artificial general intelligence this is an unsolved problem. This problem is closely related to task-oriented episodic and procedural memories and it deserves further investigation.
Acknowledgments. Thanks are due to Richárd Bellon, Dávid Hornyák, Mike Olasz,
and Róbert Rill for running the experiments. Research was supported by the European
Union and co-financed by the European Social Fund (grant no. TÁMOP 4.2.1./B-
09/1/KMR-2010-0003) and by the EIT ICTLabs grant on CPS for Smart Factories.
References
1. Altmeyer, S., Cucu-Grosjean, L., Davis, R.I.: Static probabilistic timing analysis
for real-time systems using random replacement caches. Real-Time Systems 51(1),
77–123 (2015)
2. Angluin, D.: A note on the number of queries needed to identify regular languages.
Information and Control 51(1), 76–87 (1981)
3. Boutilier, C., Dearden, R., Goldszmidt, M., et al.: Exploiting structure in policy
construction. IJCAI 14, 1104–1113 (1995)
4. Carroll, J.B.: The higher-stratum structure of cognitive abilities. In: The Scientific Study of General Intelligence, pp. 5–21. Pergamon (2003)
5. Graziano, M.: The organization of behavioral repertoire in motor cortex. Annu.
Rev. Neurosci. 29, 105–134 (2006)
6. Gyenes, V., Bontovics, Á., Lőrincz, A.: Factored temporal difference learning in
the New Ties environment. Acta Cybern. 18(4), 651–668 (2008)
7. Kalmár, Z., Szepesvári, C., Lőrincz, A.: Module-based reinforcement learning:
Experiments with a real robot. Machine Learning 31, 55–85 (1998)
8. Lőrincz, A., Pólik, I., Szita, I.: Event-learning and robust policy heuristics. Cogni-
tive Systems Research 4(4), 319–337 (2003)
9. Orlosky, J., Toyama, T., Sonntag, D., Sárkány, A., Lőrincz, A.: On-body multi-
input indoor localization for dynamic emergency scenarios. In: IEEE Int. Conf. on
Pervasive Comp. Comm. Workshop, pp. 320–325. IEEE (2014)
10. Puterman, M.: Markov decision processes. John Wiley & Sons, New York (1994)
11. Ribeiro, L., Rocha, A., Veiga, A., Barata, J.: Collaborative routing of products
using a self-organizing mechatronic agent framework - a simulation study. Comp.
Ind. 68, 27–39 (2015)
12. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Networks
61, 85–117 (2015)
13. Szepesvári, C., Littman, M.L.: Generalized Markov decision processes. In: Pro-
ceedings of International Conference of Machine Learning 1996, Bari (1996)
14. Szita, I., Lőrincz, A.: Learning to play using low-complexity rule-based policies. J.
Artif. Int. Res. 30, 659–684 (2007)
15. Szita, I., Lőrincz, A.: Optimistic initialization and greediness lead to polynomial
time learning in factored MDPs. In: Int. Conf. Mach. Learn., pp. 1001–1008. Omni-
press (2009)
16. Szita, I., Takács, B., Lőrincz, A.: Epsilon-MDPs. J. Mach. Learn. Res. 3, 145–174
(2003)
Analysis of Types of Self-Improving Software
Roman V. Yampolskiy()
1 Introduction
Since the early days of computer science, visionaries in the field anticipated the creation of a self-improving intelligent system, frequently as an easier pathway to the creation of true artificial intelligence¹. As early as 1950 Alan Turing wrote: “Instead of trying to
produce a programme to simulate the adult mind, why not rather try to produce one
which simulates the child’s? If this were then subjected to an appropriate course of
education one would obtain the adult brain. Presumably the child-brain is something
like a notebook as one buys from the stationers. Rather little mechanism, and lots of
blank sheets... Our hope is that there is so little mechanism in the child-brain that
something like it can be easily programmed. The amount of work in the education we
can assume, as a first approximation, to be much the same as for the human child” [1].
Turing’s approach to the creation of artificial (super)intelligence was echoed by I.J. Good, Marvin Minsky and John von Neumann, all three of whom published on it (interestingly in the same year, 1966): Good - “Let an ultraintelligent machine be
defined as a machine that can far surpass all the intellectual activities of any man
however clever. Since the design of machines is one of these intellectual activities, an
ultraintelligent machine could design even better machines; there would then unques-
tionably be an ‘intelligence explosion,’ and the intelligence of man would be left far
behind. Thus the first ultraintelligent machine is the last invention that man need ever
make” [2]. Minsky - “Once we have devised programs with a genuine capacity for
self-improvement a rapid evolutionary process will begin. As the machine improves
both itself and its model of itself, we shall begin to see all the phenomena associated
¹ This paper is based on material excerpted, with permission, from the book Artificial Superintelligence: a Futuristic Approach © 2015 CRC Press.
with the terms “consciousness,” “intuition” and “intelligence” itself. It is hard to say
how close we are to this threshold, but once it is crossed the world will not be the
same” [3]. Von Neumann - “There is thus this completely decisive property of com-
plexity, that there exists a critical size below which the process of synthesis is dege-
nerative, but above which the phenomenon of synthesis, if properly arranged, can
become explosive, in other words, where syntheses of automata can proceed in such a
manner that each automaton will produce other automata which are more complex
and of higher potentialities than itself” [4]. Similar types of arguments are still being made today by modern researchers, and the area of RSI research continues to grow in popularity [5-7], though some [8] have argued that the recursive self-improvement process requires hyperhuman capability to “get the ball rolling”, a kind of “Catch 22”.
2.1 Self-Modification
Self-modification does not produce improvement and is typically employed for code obfuscation, to protect software from being reverse engineered or to disguise self-replicating computer viruses from detection software. While a number of obfuscation techniques are known to exist [9], e.g., self-modifying code [10], polymorphic code, metamorphic code, and diversion code [11], none of them are intended to modify the underlying algorithm. The sole purpose of such approaches is to modify how the source code looks to those trying to understand the software in question and what it does [12].
2.2 Self-Improvement
Self-improvement or self-adaptation [13] is a desirable property of many types of software products [14] and typically allows for some optimization or customization of the product to the environment and users it is deployed with. Common examples of such software include evolutionary algorithms such as Genetic Algorithms [15-20] or Genetic Programming, which optimize software parameters with respect to some well-understood fitness function and perhaps work over some highly modular programming language to assure that all modifications result in software which can be compiled and evaluated. The system may try to optimize its components by creating internal tournaments between candidate solutions. Omohundro proposed the concept of efficiency drives in self-improving software [21]. Because of one such drive, the balance drive, self-improving systems will tend to balance the allocation of resources between their different subsystems. If the system is not balanced, overall performance
of the system could be increased by shifting resources from subsystems with small marginal improvement to those with larger marginal increase [21]. While the performance of the software may be improved as a result of such optimization, the overall algorithm is unlikely to be modified into a fundamentally more capable one.
Additionally, the law of diminishing returns quickly sets in, and after an initial significant improvement phase, characterized by the discovery of “low-hanging fruit”, future improvements are likely to be less frequent and less significant, producing a bell curve of valuable changes. Metareasoning, metalearning, learning to learn, and lifelong learning are terms which are often used in the machine learning literature to indicate self-modifying learning algorithms or the process of selecting an algorithm which will perform best in a particular problem domain [22]. Yudkowsky calls such a process non-recursive optimization – a situation in which one component of the system does the optimization and another component is getting optimized [23].
In the field of complex dynamic systems, a.k.a. chaos theory, positive feedback systems are well known to always end up in what is known as an attractor – a region within the system’s state space that the system cannot escape from [24]. A good example of such attractor convergence is the process of metacompilation or supercompilation [25], in which a program designed to take source code written by a human programmer and optimize it for speed is applied to its own source code. It will likely produce a more efficient compiler on the first application, perhaps by 20 %, on the second application by 3 %, and after a few more recursive iterations converge to a fixed point of zero improvement [24].
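A toy numeric illustration of this convergence to a fixed point of zero improvement; the 20 % and 3 % figures are the ones quoted above, while the geometric decay of the returns is an assumption made purely for illustration:

```python
# Illustrative only: relative speed-up per self-application, decaying geometrically.
speed = 1.0
gain = 0.20          # ~20% on the first application (figure quoted in the text)
decay = 0.03 / 0.20  # chosen so the second application yields ~3%

for i in range(1, 11):
    speed *= (1.0 + gain)
    print(f"iteration {i}: cumulative speedup {speed:.4f} (last gain {gain:.4%})")
    gain *= decay    # returns shrink toward the zero-improvement fixed point
```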
It is believed that AI systems will have a number of advantages over human programmers, making it possible for them to succeed where we have so far failed. Such advantages include [26]: longer work spans (no breaks, sleep, vacation, etc.), omniscience (expert-level knowledge in all fields of science, absorbed knowledge of all published works), superior computational resources (brain vs. processor, human memory vs. RAM), communication speed (neurons vs. wires), increased serial depth (ability to perform sequential operations in excess of the roughly 100 the human brain can manage), duplicability (intelligent software can be instantaneously copied), editability (source code, unlike DNA, can be quickly modified), goal coordination (AI copies can work towards a common goal without much overhead), improved rationality (AIs are likely to be free from human cognitive biases) [27], new sensory modalities (native sensory hardware for source code), blending over of deliberative and automatic processes (management of computational resources over multiple tasks), introspective perception and manipulation (ability to analyze low-level hardware, e.g., individual neurons), addition of hardware (ability to add new memory, sensors, etc.), and advanced communication (ability to share underlying cognitive representations for memories and skills) [28].
Chalmers [29] uses logic and mathematical induction to show that if an AI0 system is capable of producing an only slightly more capable AI1 system, generalization of that process leads to superintelligent performance in AIn after n generations. He articulates that his proof assumes the proportionality thesis, which states that increases in intelligence lead to proportionate increases in the capacity to design future generations of AIs.
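One simplistic numeric reading of that induction under the proportionality thesis is compounding growth across generations; the 10 % per-generation gain below is an arbitrary illustrative assumption:

```python
# Under the proportionality thesis, each generation's design capacity grows in
# proportion to its intelligence, so capability compounds across generations.
intelligence = 1.0      # AI_0, normalized
step = 0.10             # assumed proportional gain per generation (illustrative)

for n in range(1, 21):
    intelligence *= (1.0 + step)   # AI_n designs a slightly better AI_{n+1}
print(f"AI_20 is {intelligence:.1f}x AI_0 under unbounded compounding")
```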
Nivel et al. proposed a formalization of RSI systems as autocatalytic sets – collections of entities comprised of elements, each of which can be created by other elements in the set, making it possible for the set to self-maintain and update itself. They also list properties of a system which make it purposeful, goal-oriented and self-organizing, particularly: reflectivity – the ability to analyze and rewrite its own structure; autonomy – being free from influence by the system’s original designers (bounded autonomy is a property of a system with elements which are not subject to self-modification); and endogeny – an autocatalytic ability [30]. Nivel and Thórisson also attempt to operationalize autonomy via the concept of self-programming, which they insist has to be done in an experimental way instead of a theoretical way (via proofs of correctness), since it is the only tractable approach [31].
Yudkowsky writes prolifically about recursively self-improving processes and suggests that the introduction of certain concepts might be beneficial to the discussion; specifically, he proposes the terms Cascades, Cycles and Insight, which he defines as: Cascades – when one development leads to another; Cycles – a repeatable cascade in which one optimization leads to another which in turn benefits the original optimization; Insight – new information which greatly increases one’s optimization ability [32]. Yudkowsky also suggests that the goodness and number of opportunities in the space of solutions be known as the Optimization Slope, while optimization resources and optimization efficiency refer to how much computational resource an agent has access to and how efficiently the agent utilizes said resources. An agent engaging in an optimization process and able to hit non-trivial targets in a large search space [33] is described as having significant optimization power [23].
4 Conclusions
Recursively Self-Improving software is the ultimate form of artificial life, and the creation of life remains one of the great unsolved mysteries in science. More precisely, the
problem of creating RSI software is really the challenge of creating a program capable of writing other programs [64], and so it is an AI-Complete problem, as has been demonstrated by Yampolskiy [65, 66]. AI-Complete problems are by definition the most difficult problems faced by AI researchers, and it is likely that RSI source code would be so complex that it would be difficult or impossible to fully analyze [51]. Also, the problem is likely to be NP-Complete, as even simple metareasoning and metalearning [67] problems have been shown by Conitzer and Sandholm to belong to that class. In particular, they proved that allocation of deliberation time across anytime algorithms running on different problem instances is NP-Complete, and a complementary problem of dynamically allocating information-gathering resources by an agent across multiple actions is NP-Hard, even if evaluating each particular action is computationally simple. Finally, they showed that the problem of deliberately choosing a limited number of deliberation or information-gathering actions to disambiguate the state of the world is PSPACE-Hard in general [68].
This paper is a part of a two paper set presented at AGI2015 with the complemen-
tary paper being: “On the Limits of Recursively Self-Improving AGI” [69].
References
1. Turing, A.: Computing Machinery and Intelligence. Mind 59(236), 433–460 (1950)
2. Good, I.J.: Speculations Concerning the First Ultraintelligent Machine. Advances in Com-
puters 6, 31–88 (1966)
3. Minsky, M.: Artificial Intelligence. Scientific American 215(3), 257 (1966)
4. Burks, A.W., Von Neumann, J.: Theory of Self-Reproducing Automata. University of
Illinois Press (1966)
5. Pearce, D.: The biointelligence explosion. In: Singularity Hypotheses, pp. 199–238. Sprin-
ger (2012)
6. Omohundro, S.M.: The nature of self-improving artificial intelligence. In: Singularity
Summit, San Francisco, CA (2007)
7. Waser, M.R.: Bootstrapping a structured self-improving & safe autopoietic self. In: Annual
International Conference on Biologically Inspired Cognitive Architectures, Boston,
Massachusetts, November 9, 2014
8. Hall, J.S.: Engineering utopia. Frontiers in Artificial Intelligence and Applications 171,
460 (2008)
9. Mavrogiannopoulos, N., Kisserli, N., Preneel, B.: A taxonomy of self-modifying code for
obfuscation. Computers & Security 30(8), 679–691 (2011)
10. Anckaert, B., Madou, M., De Bosschere, K.: A model for self-modifying code. In:
Camenisch, J.L., Collberg, C.S., Johnson, N.F., Sallee, P. (eds.) IH 2006. LNCS,
vol. 4437, pp. 232–248. Springer, Heidelberg (2007)
11. Petrean, L.: Polymorphic and Metamorphic Code Applications in Portable Executable
Files Protection. Acta Technica Napocensis, 51(1) (2010)
12. Bonfante, G., Marion, J.-Y., Reynaud-Plantey, D.: A computability perspective on self-
modifying programs. In: Seventh IEEE International Conference on Software Engineering
and Formal Methods, pp. 231–239. IEEE (2009)
13. Cheng, B.H., et al.: Software engineering for self-adaptive systems: a research roadmap. In:
Cheng, B.H., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for
Self-Adaptive Systems. LNCS, vol. 5525, pp. 1–26. Springer, Heidelberg (2009)
14. Ailon, N., et al.: Self-improving algorithms. SIAM Journal on Computing 40(2), 350–375
(2011)
15. Yampolskiy, R., et al.: Printer model integrating genetic algorithm for improvement of
halftone patterns. In: Western New York Image Processing Workshop (WNYIPW). IEEE
Signal Processing Society, Rochester, NY (2004)
16. Yampolskiy, R.V., Ashby, L., Hassan, L.: Wisdom of Artificial Crowds—A Metaheuristic
Algorithm for Optimization. Journal of Intelligent Learning Systems and Applications
4(2), 98–107 (2012)
17. Yampolskiy, R.V., Ahmed, E.L.B.: Wisdom of artificial crowds algorithm for solving
NP-hard problems. International Journal of Bio-Inspired Computation (IJBIC) 3(6),
358–369
18. Ashby, L.H., Yampolskiy, R.V.: Genetic algorithm and wisdom of artificial crowds
algorithm applied to light up. In: 16th International Conference on Computer Games: AI,
Animation, Mobile, Interactive Multimedia, Educational & Serious Games, Louisville,
KY, USA, pp. 27–32, July 27–30, 2011
19. Khalifa, A.B., Yampolskiy, R.V.: GA with Wisdom of Artificial Crowds for Solving Mas-
termind Satisfiability Problem. International Journal of Intelligent Games & Simulation
6(2), 6 (2011)
20. Port, A.C., Yampolskiy, R.V.: Using a GA and Wisdom of Artificial Crowds to solve soli-
taire battleship puzzles. In: 17th International Conference on Computer Games
(CGAMES), pp. 25–29. IEEE, Louisville (2012)
21. Omohundro, S.: Rational artificial intelligence for the greater good. In: Singularity Hypo-
theses, pp. 161–179. Springer (2012)
22. Anderson, M.L., Oates, T.: A review of recent research in metareasoning and metalearn-
ing. AI Magazine 28(1), 12 (2007)
23. Yudkowsky, E.: Intelligence explosion microeconomics. In: MIRI Technical Report.
www.intelligence.org/files/IEM.pdf
24. Heylighen, F.: Brain in a vat cannot break out. Journal of Consciousness Studies 19(1–2),
1–2 (2012)
25. Turchin, V.F.: The concept of a supercompiler. ACM Transactions on Programming
Languages and Systems (TOPLAS) 8(3), 292–325 (1986)
26. Sotala, K.: Advantages of artificial intelligences, uploads, and digital minds. International
Journal of Machine Consciousness 4(01), 275–291 (2012)
27. Muehlhauser, L., Salamon, A.: Intelligence explosion: evidence and import. In: Singularity
Hypotheses, pp. 15–42. Springer (2012)
28. Yudkowsky, E.: Levels of organization in general intelligence. In: Artificial General Intel-
ligence, pp. 389–501. Springer (2007)
29. Chalmers, D.: The Singularity: A Philosophical Analysis. Journal of Consciousness Stu-
dies 17, 7–65 (2010)
30. Nivel, E., et al.: Bounded Recursive Self-Improvement. arXiv preprint arXiv:1312.6764
(2013)
31. Nivel, E., Thórisson, K.R.: Self-programming: operationalizing autonomy. In: Proceedings
of the 2nd Conf. on Artificial General Intelligence (2008)
32. Yudkowsky, E., Hanson, R.: The Hanson-Yudkowsky AI-foom debate. In: MIRI
Technical Report (2008). http://intelligence.org/files/AIFoomDebate.pdf
33. Yampolskiy, R.V.: The Universe of Minds. arXiv preprint arXiv:1410.0369 (2014)
34. Hall, J.S.: Self-improving AI: An analysis. Minds and Machines 17(3), 249–259 (2007)
35. Yampolskiy, R.V.: Efficiency Theory: a Unifying Theory for Information, Computation
and Intelligence. Journal of Discrete Mathematical Sciences & Cryptography 16(4–5),
259–277 (2013)
36. Gagliolo, M.: Universal search. Scholarpedia 2(11), 2575 (2007)
37. Levin, L.: Universal Search Problems. Problems of Information Transmission 9(3), 265–
266 (1973)
38. Steunebrink, B., Schmidhuber, J.: A Family of Gödel Machine implementations. In: Fourth
Conference on Artificial General Intelligence (AGI-11), Mountain View, California (2011)
39. Schmidhuber, J.: Gödel machines: fully self-referential optimal universal self-improvers.
In: Artificial General Intelligence, pp. 199–226. Springer (2007)
40. Schmidhuber, J.: Gödel machines: towards a technical justification of consciousness. In:
Adaptive Agents and Multi-Agent Systems II, pp. 1–23. Springer (2005)
41. Schmidhuber, J.: Gödel machines: self-referential universal problem solvers making prov-
ably optimal self-improvements. In: Artificial General Intelligence (2005)
42. Schmidhuber, J.: Ultimate cognition à la Gödel. Cognitive Computation 1(2), 177–193
(2009)
43. Schmidhuber, J.: Completely self-referential optimal reinforcement learners. In: Duch, W.,
Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 223–233.
Springer, Heidelberg (2005)
44. Schmidhuber, J.: Optimal ordered problem solver. Machine Learning 54(3), 211–254
(2004)
45. Schmidhuber, J., Zhao, J., Wiering, M.: Shifting inductive bias with success-story algo-
rithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28(1),
105–130 (1997)
46. Schmidhuber, J.: A general method for incremental self-improvement and multiagent
learning. Evolutionary Computation: Theory and Applications, 81–123 (1999)
47. Schmidhuber, J.: Metalearning with the Success-Story Algorithm (1997).
http://people.idsia.ch/~juergen/ssa/sld001.htm
48. Schmidhuber, J.: A neural network that embeds its own meta-levels. In: IEEE International
Conference on Neural Networks, pp. 407–412. IEEE (1993)
49. Younger, A.S., Hochreiter, S., Conwell, P.R.: Meta-learning with backpropagation. In: In-
ternational Joint Conference on Neural Networks (IJCNN 2001). IEEE (2001)
50. Hochreiter, S., Younger, A., Conwell, P.: Learning to learn using gradient descent. In: Ar-
tificial Neural Networks—ICANN 2001, pp. 87–94 (2001)
51. Osterweil, L.J., Clarke, L.A.: Continuous self-evaluation for the self-improvement of soft-
ware. In: Robertson, P., Shrobe, H.E., Laddaga, R. (eds.) IWSAS 2000. LNCS, vol. 1936,
pp. 27–39. Springer, Heidelberg (2001)
52. Beck, M.B., Rouchka, E.C., Yampolskiy, R.V.: Finding data in DNA: computer forensic
investigations of living organisms. In: Rogers, M., Seigfried-Spellar, K.C. (eds.) ICDF2C
2012. LNICST, vol. 114, pp. 204–219. Springer, Heidelberg (2013)
53. Beck, M., Yampolskiy, R.: DNA as a medium for hiding data. BMC Bioinformatics
13(Suppl. 12), A23 (2012)
54. Yampolskiy, R.V.: Leakproofing Singularity - Artificial Intelligence Confinement Prob-
lem. Journal of Consciousness Studies (JCS) 19(1–2), 194–214 (2012)
55. Majot, A.M., Yampolskiy, R.V.: AI safety engineering through introduction of self-
reference into felicific calculus via artificial pain and pleasure. In: 2014 IEEE International
Symposium on Ethics in Science, Technology and Engineering. IEEE (2014)
56. Yampolskiy, R., Fox, J.: Safety Engineering for Artificial General Intelligence, pp. 1–10.
Topoi (2012)
57. Yampolskiy, R.V., Fox, J.: Artificial general intelligence and the human mental model. In:
Singularity Hypotheses: A Scientific and Philosophical Assessment, p. 129 (2013)
58. Sotala, K., Yampolskiy, R.V.: Responses to catastrophic AGI risk: A survey. Physica
Scripta. 90, December 2015
59. Yampolskiy, R.V.: What to do with the singularity paradox? In: Müller, V.C. (ed.) Philos-
ophy and Theory of Artificial Intelligence. SAPERE, vol. 5, pp. 397–413. Springer,
Heidelberg (2012)
60. Yampolskiy, R., Gavrilova, M.: Artimetrics: Biometrics for Artificial Entities. IEEE
Robotics and Automation Magazine (RAM) 19(4), 48–58 (2012)
61. Yampolskiy, R., et al.: Experiments in Artimetrics: Avatar Face Recognition. Transactions
on Computational Science XVI, 77–94 (2012)
62. Ali, N., Schaeffer, D., Yampolskiy, R.V.: Linguistic profiling and behavioral drift in chat
bots. In: Midwest Artificial Intelligence and Cognitive Science Conference, p. 27 (2012)
63. Gavrilova, M., Yampolskiy, R.: State-of-the-Art in Robot Authentication [From the Guest
Editors]. Robotics & Automation Magazine, IEEE 17(4), 23–24 (2010)
64. Hall, J.S.: VARIAC: an Autogenous Cognitive Architecture. Frontiers in Artificial Intelli-
gence and Applications 171, 176 (2008)
65. Yampolskiy, R.V.: Turing test as a defining feature of ai-completeness. In: Yang, X.-S.
(ed.) Artificial Intelligence, Evolutionary Computing and Metaheuristics. SCI, vol. 427,
pp. 3–17. Springer, Heidelberg (2013)
66. Yampolskiy, R.V.: AI-Complete, AI-Hard, or AI-Easy–Classification of problems in AI.
In: The 23rd Midwest Artificial Intelligence and Cognitive Science Conference, Cincinnati,
OH, USA (2012)
67. Schaul, T., Schmidhuber, J.: Metalearning. Scholarpedia 5(6), 4650 (2010)
68. Conitzer, V., Sandholm, T.: Definition and complexity of some basic metareasoning prob-
lems. In: Proceedings of the Eighteenth International Joint Conference on Artificial
Intelligence (IJCAI), Acapulco, Mexico, pp. 1099–1106 (2003)
69. Yampolskiy, R.V.: On the limits of recursively self-improving AGI. In: The Eighth
Conference on Artificial General Intelligence, Berlin, Germany, July 22–25, 2015
On the Limits of Recursively Self-Improving AGI
Roman V. Yampolskiy()
1 Introduction
Intuitively, most of us have some understanding of what it means for a software system to be self-improving; however, we believe it is important to precisely define such notions and to systematically investigate different types of self-improving software¹. First we need to define the notion of improvement. We can talk about improved efficiency – solving the same problems faster or with less need for computational resources (such as memory). We can also measure improvement in error rates or in finding closer approximations to optimal solutions, as long as our algorithm is functionally equivalent from generation to generation. Efficiency improvements can be classified as either producing a trivial improvement, as between different algorithms in the same complexity class (e.g., NP), or a fundamental improvement, as between different complexity classes (e.g., P vs. NP) [1]. It is also very important to remember that complexity class notation (Big-O) may hide significant constant factors which, while theoretically ignorable, may change the relative order of efficiency in practical applications of algorithms.
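As a quick, purely illustrative check of this point (the constant factors below are arbitrary), an O(n²) routine with a small constant can be cheaper in practice than an O(n) routine with a large constant until the input becomes very large:

```python
# A 'worse' complexity class can win for practical input sizes when constants differ.
def cost_linear(n, c=10_000):   # O(n) with a large hidden constant
    return c * n

def cost_quadratic(n, c=1):     # O(n^2) with a small hidden constant
    return c * n * n

for n in (100, 1_000, 10_000, 100_000):
    better = "linear" if cost_linear(n) < cost_quadratic(n) else "quadratic"
    print(f"n={n:>7}: cheaper in practice -> {better}")
```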
This type of analysis works well for algorithms designed to accomplish a particular task, but it does not work well for general-purpose intelligent software, as an improvement in one area may go together with decreased performance in another domain. This makes it hard to claim that the updated version of the software is indeed an improvement. Mainly, the major improvement we want from self-improving intelligent software is a higher degree of intelligence, which can be approximated via machine-friendly IQ tests [2] with a significant g-factor correlation.
¹ This paper is based on material excerpted, with permission, from the book Artificial Superintelligence: a Futuristic Approach © 2015 CRC Press.
classes of problems will always remain only approximately solvable, and any improvements in solutions will come from additional hardware resources, not higher intelligence.
Wiedermann argues that cognitive systems form an infinite hierarchy and that, from a computational point of view, human-level intelligence is upper-bounded by the Σ₂ class of the Arithmetic Hierarchy [16]. Because many real-world problems are computationally infeasible for any non-trivial inputs, even an AI which achieves human-level performance is unlikely to progress towards higher levels of the cognitive hierarchy. So while machines with super-Turing computational power are theoretically possible, in practice they are not implementable, as the non-computable information needed for their function is just that – not computable. Consequently, Wiedermann states that while machines of the future will be able to solve problems solvable by humans much faster and more reliably, they will still be limited by the computational limits found in the upper levels of the Arithmetic Hierarchy [16, 17].
Mahoney attempts to formalize what it means for a program to have a goal G and to self-improve with respect to being able to reach said goal under a constraint of time, t [18]. Mahoney defines a goal as a function G: N → R mapping natural numbers N to real numbers R. Given a universal Turing machine L, Mahoney defines P(t) to mean the positive natural number encoded by the output of the program P with input t running on L after t time steps, or 0 if P has not halted after t steps. Mahoney’s representation says that P has goal G at time t if and only if there exists t' > t such that G(P(t')) > G(P(t)) and, for all t' > t, G(P(t')) ≥ G(P(t)). If P has a goal G, then G(P(t)) is a monotonically increasing function of t with no maximum for t > C. Q improves on P with respect to goal G if and only if all of the following conditions are true: P and Q have goal G; ∃t, G(Q(t)) > G(P(t)); and ¬∃t', t' > t, G(P(t')) > G(Q(t')) [18]. Mahoney then defines an improving sequence with respect to G as an infinite sequence of programs P1, P2, P3, … such that, for ∀i, i > 0, Pi+1 improves Pi with respect to G. Without loss of generality, Mahoney extends the definition to include the value −1 as an acceptable input, so P(−1) outputs appropriately encoded software. He finally defines P1 as an RSI program with respect to G iff Pi(−1) = Pi+1 for all i > 0 and the sequence Pi, i = 1, 2, 3, … is an improving sequence with respect to goal G [18]. Mahoney also analyzes the complexity of RSI software and presents a proof demonstrating that the algorithmic complexity of Pn (the n-th iteration of an RSI program) is not greater than O(log n), implying that a very limited amount of knowledge gain would be possible in practice despite the theoretical possibility of RSI systems [18]. Yudkowsky also considers the possibility of receiving only logarithmic returns on cognitive reinvestment: log(n) + log(log(n)) + … in each recursive cycle [19].
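Mahoney's definitions are concrete enough to render as code; the sketch below checks the "P has goal G at time t" condition over a finite horizon (the horizon cutoff and the example programs are illustrative assumptions, since the original definitions quantify over all t):

```python
from typing import Callable

Goal = Callable[[int], float]     # G: N -> R
Program = Callable[[int], int]    # P(t): encoded output after t steps (0 if not halted)

def has_goal(P: Program, G: Goal, t: int, horizon: int) -> bool:
    """P has goal G at time t: some later output strictly improves G, and none degrades it.
    Checked only up to `horizon`, a finite approximation of 'for all t' > t'."""
    later = range(t + 1, horizon)
    return (any(G(P(t2)) > G(P(t)) for t2 in later)
            and all(G(P(t2)) >= G(P(t)) for t2 in later))

# Illustrative programs: outputs grow with the available time t.
P_slow = lambda t: t            # encodes the number t
P_fast = lambda t: 2 * t        # reaches larger outputs sooner
G = lambda n: float(n)          # goal: prefer larger encoded numbers

print(has_goal(P_slow, G, t=1, horizon=100))   # True
print(has_goal(P_fast, G, t=1, horizon=100))   # True
```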
Other limitations may be unique to the proposed self-improvement approach. For example, a Levin-type search through the program space will face problems related to Rice’s theorem [20], which states that for any arbitrarily chosen program it is impossible to test whether it has any non-trivial property, such as being very intelligent. This testing is of course necessary to evaluate redesigned code. Also, universal search over the space of mind designs will not be computationally possible due to the No Free Lunch theorems [21], as we have no information to reduce the size of the search space [22]. Other difficulties related to testing remain even if we are not talking about
arbitrarily chosen programs, but about those we have designed with a specific goal in mind and which consequently avoid problems with Rice’s theorem. One such difficulty is determining whether something is an improvement. We can call this obstacle the “multidimensionality of optimization”.
No change is strictly an improvement; it is always a tradeoff between gain in some areas and loss in others. For example, how do we evaluate and compare two software systems, one of which is better at chess and the other at poker? Assuming the goal is increased intelligence over the distribution of all potential environments, the system would have to figure out how to test intelligence at levels above its own, a problem which remains unsolved. In general, the science of testing for intelligence above the level achievable by naturally occurring humans (IQ < 200) is in its infancy. De Garis raises the problem of evaluating the quality of changes made to the top-level structures responsible for determining the RSI’s functioning, structures which are not judged by any higher-level modules and so present a fundamental difficulty in assessing their performance [23].
Other obstacles to RSI have also been suggested in the literature. Löb’s theorem states that a mathematical system cannot assert its own soundness without becoming inconsistent [24], meaning that a sufficiently expressive formal system cannot know that everything it proves to be true is actually so [24]. Such an ability is necessary to verify that modified versions of the program are still consistent with its original goal of getting smarter. Another obstacle, called the procrastination paradox, will also prevent the system from making modifications to its code, since the system will find itself in a state in which a change made immediately is as desirable and likely as the same change made later [25, 26]. Since postponing the change carries no negative implications and may actually be safer, this may result in an infinite delay of the actual implementation of provably desirable changes.
Similarly, Bolander raises some problems inherent in logical reasoning with self-reference, namely self-contradictory reasoning, exemplified by the Knower Paradox of the form “This sentence is false” [27]. Orseau and Ring introduce what they call the “Simpleton Gambit”, a situation in which an agent will choose to modify itself towards its own detriment if presented with a high enough reward to do so [28]. Yampolskiy reviews a number of related problems in rational self-improving optimizers above a certain capacity, and concludes that, despite the opinion of many, such machines will choose to “wirehead” [29]. Chalmers [30] suggests a number of previously unanalyzed potential obstacles on the path to RSI software, with the correlation obstacle being one of them. He describes it as the possibility that no interesting properties we would like to amplify will correspond to the ability to design better software.
Yampolskiy is also concerned with the accumulation of errors in software undergoing an RSI process, which is conceptually similar to the accumulation of mutations in the evolutionary process experienced by biological agents. Errors (bugs) which are not detrimental to the system’s performance are very hard to detect and may accumulate from generation to generation, building on each other until a critical mass of such errors leads to erroneous functioning of the system, mistakes in evaluating the quality of future generations of the software, or a complete breakdown [31].
The self-reference aspect of a self-improving system itself also presents some serious challenges. It may be the case that the minimum complexity necessary to become RSI is higher than what the system itself is able to understand. We see such situations frequently at lower levels of intelligence; for example, a squirrel doesn’t have the mental capacity to understand how a squirrel’s brain operates. Paradoxically, as the system becomes more complex it may take exponentially more intelligence to understand itself, and so a system which starts out capable of complete self-analysis may lose that ability as it self-improves. Informally, we can call this the Munchausen obstacle: the inability of a system to lift itself by its own bootstraps. An additional problem may be that the system in question is computationally irreducible [32] and so cannot simulate running its own source code. An agent cannot predict what it will think without thinking it first. A system needs 100 % of its memory to model itself, which leaves no memory to record the output of the simulation. Any external memory to which the system may write becomes part of the system and so also has to be modeled. Essentially, the system faces an infinite regress of self-models from which it cannot escape. Alternatively, if we take a physics perspective on the issue, we can see intelligence as a computational resource (along with time and space), and so producing more of it will not be possible for the same reason why we cannot make a perpetual motion device: it would violate fundamental laws of nature related to the conservation of energy. Similarly, it has been argued that a Turing Machine cannot output a machine of greater algorithmic complexity [14].
We can even attempt to formally prove the impossibility of an intentional RSI process via proof by contradiction: let us define an RSI R1 as a program not capable of algorithmically solving a problem of difficulty X, say Xi. If R1 modifies its source code and is afterwards capable of solving Xi, this violates our original assumption that R1 is not capable of solving Xi, since any introduced modification could be a part of the solution process; so we have a contradiction of our original assumption, and R1 cannot produce any modification which would allow it to solve Xi, which was to be shown. Informally, if an agent can produce a more intelligent agent, it would already be as capable as that new agent. Even some of our intuitive assumptions about RSI are incorrect. It seems that it should be easier to solve a problem if we already have a solution to a smaller instance of the problem [33], but in a formalized world of problems belonging to the same complexity class, the re-optimization problem is proven to be as difficult as optimization itself [34-37].
3 Analysis
A number of fundamental problems remain open in the area of RSI. We still don’t know the minimum intelligence necessary for commencing the RSI process, but we can speculate that it would be on par with human intelligence, which we associate with universal or general intelligence [38], though in principle a sub-human-level system capable of self-improvement can’t be excluded [30]. One may argue that even human-level capability is not enough, because we already have programmers (people, or their intellectual equivalents formalized as functions [39] or Human Oracles [40, 41]) who have access to their own source code (DNA), but who fail to understand how DNA
(nature) works to create their intelligence. This doesn’t even include the additional complexity of trying to improve on existing DNA code, or the complicating factors presented by the impact of the learning environment (nurture) on the development of human intelligence. Worse yet, it is not obvious how far above human ability an AI needs to be to begin overcoming the “complexity barrier” associated with self-understanding. Today’s AIs can do many things people are incapable of doing, but they are not yet capable of RSI behavior.
We also don’t know the minimum size of the program (called a Seed AI [42]) necessary to get the ball rolling. Perhaps if it turns out that such a “minimal genome” is very small, a brute force [43] approach might succeed in discovering it. We can assume that our Seed AI is the smartest Artificial General Intelligence known to exist [44] in the world, as otherwise we can simply designate the other AI as the seed. It is also not obvious how the source code size of an RSI will change as it goes through the improvement process, in other words, what the relationship is between intelligence and the minimum source code size necessary to support it. In order to answer such questions it may be useful to further formalize the notion of RSI, perhaps by representing such software as a Turing Machine [45] with particular inputs and outputs. If that could be successfully accomplished, a new area of computational complexity analysis may become possible, in which we study algorithms with dynamically changing complexity (Big-O) and address questions about how many code modifications are necessary to achieve a certain level of performance from the algorithm.
This of course raises the question of the speed of the RSI process: are we expecting it to take seconds, minutes, days, weeks, years or more (hard takeoff vs. soft takeoff) for the RSI system to begin hitting the limits of what is possible with respect to the physical limits of computation [46]? Even in suitably constructed hardware (a human baby) it takes decades of data input (education) to get to human-level performance (an adult). It is also not obvious whether the rate of change in intelligence would be higher for a more advanced RSI, because it is more capable, or for a “newbie” RSI, because it has more low-hanging fruit to collect. We would also have to figure out whether we are looking at improvement in absolute terms or as a percentage of the system’s current intelligence score.
Yudkowsky attempts to analyze the most promising returns on cognitive reinvestment as he considers increasing the size, speed or ability of RSI systems. He also looks at different possible rates of return and arrives at three progressively steeper trajectories for RSI improvement, which he terms “fizzle”, “combust” and “explode”, aka “AI go FOOM” [19]. Hall [47] similarly analyzes rates of return on cognitive investment and derives a curve equivalent to double the Moore’s Law rate. Hall also suggests that an AI would be better off trading money it earns performing useful work for improved hardware or software, rather than attempting to improve itself directly, since it would not be competitive against more powerful optimization agents such as the Intel corporation.
Fascinatingly, by analyzing properties which correlate with intelligence, Chalmers [30] is able to generalize self-improvement optimization to properties other than intelligence. We can agree that RSI software, as we describe it in this work, is getting better at designing software, not just at being generally intelligent. Similarly, other properties associated with design capacity can be increased along with the capacity to design software, for example the capacity to design systems with a sense of humor, and so in addition to an intelligence explosion we may face an explosion of funniness.
5 Conclusions
Intelligence is a computational resource and, as with other physical resources (mass, speed), its behavior is probably not going to be just a typical linear extrapolation of what we are used to if observed at high extremes (IQ > 200). It may also be subject to fundamental limits, such as the speed-of-light limit on travel, or to fundamental limits we do not yet understand or know about (unknown unknowns). In this work we reviewed a number of computational upper limits towards which any successful RSI system will asymptotically strive to grow; we can note that despite the existence of such upper
bounds we are currently probably very far from reaching them and so still have plenty of room for improvement at the top. Consequently, any RSI achieving such a significant level of enhancement, despite not creating an infinite process, will still seem like it is producing superintelligence with respect to our current state [56].
The debate regarding the possibility of RSI will continue. Some will argue that while it is possible to increase processor speed, the amount of available memory or sensor resolution, the fundamental ability to solve problems can’t be intentionally and continuously improved by the system itself. Additionally, critics may suggest that intelligence is upper-bounded and only differs in speed and the available information to process [57]. In fact, they can point to such a maximal intelligence, be it a theoretical one, known as AIXI: an agent which, given infinite computational resources, will make purely rational decisions in any situation.
Others will say that since intelligence is the ability to find patterns in data, intelligence has no upper bound, as the number of variables comprising a pattern can always be greater and so present a more complex problem against which intelligence can be measured. It is easy to see that even if the problems we encounter in our daily life do have some maximum difficulty, this is certainly not the case with theoretical examples we can derive from pure mathematics. It seems likely that the debate will not be settled until a fundamental insurmountable obstacle to the RSI process is found, or a proof by existence is demonstrated. Of course, the question of permitting machines to undergo RSI transformation is a separate and equally challenging problem.
References
1. Yampolskiy, R.V.: Construction of an NP Problem with an Exponential Lower Bound. arXiv preprint arXiv:1111.0305 (2011)
2. Yonck, R.: Toward a Standard Metric of Machine Intelligence. World Future Review 4(2),
61–70 (2012)
3. Bremermann, H.J.: Quantum noise and information. In: Proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability (1967)
4. Bekenstein, J.D.: Information in the holographic universe. Scientific American 289(2),
58–65 (2003)
5. Lloyd, S.: Ultimate Physical Limits to Computation. Nature 406, 1047–1054 (2000)
6. Sandberg, A.: The physics of information processing superobjects: daily life among the
Jupiter brains. Journal of Evolution and Technology 5(1), 1–34 (1999)
7. Aaronson, S.: Guest column: NP-complete problems and physical reality. ACM Sigact
News 36(1), 30–52 (2005)
8. Shannon, C.E.: A Mathematical Theory of Communication. Bell Systems Technical
Journal 27(3), 379–423 (1948)
9. Krauss, L.M., Starkman, G.D.: Universal limits on computation (2004). arXiv preprint
astro-ph/0404510
10. Fox, D.: The limits of intelligence. Scientific American 305(1), 36–43 (2011)
11. Einstein, A.: Does the inertia of a body depend upon its energy-content? Annalen der
Physik 18, 639–641 (1905)
12. Wheeler, J.A.: Information, Physics, Quantum: The Search for Links. Univ. of Texas
(1990)
13. Schaeffer, J., et al.: Checkers is Solved. Science 317(5844), 1518–1522 (2007)
14. Mahoney, M.: Is there a model for RSI?. In: SL4, June 20, 2008. http://www.sl4.org/
archive/0806/19028.html
15. Turing, A.: On computable numbers, with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 2(42), 230–265 (1936)
16. Wiedermann, J.: A Computability Argument Against Superintelligence. Cognitive Compu-
tation 4(3), 236–245 (2012)
17. Wiedermann, J.: Is There Something Beyond AI? Frequently Emerging, but Seldom
Answered Questions about Artificial Super-Intelligence, p. 76. Artificial Dreams, Beyond AI
18. Mahoney, M.: A Model for Recursively Self Improving Programs (2010).
http://mattmahoney.net/rsi.pdf
19. Yudkowsky, E.: Intelligence Explosion Microeconomics. In: MIRI Technical Report.
www.intelligence.org/files/IEM.pdf
20. Rice, H.G.: Classes of recursively enumerable sets and their decision problems. Transac-
tions of the American Mathematical Society 74(2), 358–366 (1953)
21. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Transac-
tions on Evolutionary Computation 1(1), 67–82 (1997)
22. Melkikh, A.V.: The No Free Lunch Theorem and hypothesis of instinctive animal beha-
vior. Artificial Intelligence Research 3(4), p43 (2014)
23. de Garis, H.: The 21st Century Artilect: Moral Dilemmas Concerning the Ultra Intelligent
Machine. Revue Internationale de Philosophie 44(172), 131–138 (1990)
24. Yudkowsky, E., Herreshoff, M.: Tiling agents for self-modifying AI, and the Löbian
obstacle. In: MIRI Technical Report (2013)
25. Fallenstein, B., Soares, N.: Problems of self-reference in self-improving space-time
embedded intelligence. In: MIRI Technical Report (2014)
26. Yudkowsky, E.: The Procrastination Paradox (Brief technical note). In: MIRI Technical
Report (2014). https://intelligence.org/files/ProcrastinationParadox.pdf
27. Bolander, T.: Logical theories for agent introspection. Comp. Science 70(5), 2002 (2003)
28. Orseau, L., Ring, M.: Self-modification and mortality in artificial agents. In: 4th Interna-
tional Conference on Artificial General Intelligence, pp. 1–10. Mountain View, CA (2011)
29. Yampolskiy, R.V.: Utility Function Security in Artificially Intelligent Agents. Journal of
Experimental and Theoretical Artificial Intelligence (JETAI), 1–17 (2014)
30. Chalmers, D.: The Singularity: A Philosophical Analysis. Journal of Consciousness
Studies 17, 7–65 (2010)
31. Yampolskiy, R.V.: Artificial intelligence safety engineering: Why machine ethics is a
wrong approach. In: Philosophy and Theory of Artificial Intelligence, pp. 389–396,
Springer (2013)
32. Wolfram, S.: A New Kind of Science. Wolfram Media, Inc., May 14, 2002
33. Yampolskiy, R.V.: Computing Partial Solutions to Difficult AI Problems. In: Midwest
Artificial Intelligence and Cognitive Science Conference, p. 90 (2012)
34. Böckenhauer, H.-J., Hromkovič, J., Mömke, T., Widmayer, P.: On the hardness of
reoptimization. In: Geffert, V., Karhumäki, J., Bertoni, A., Preneel, B., Návrat, P.,
Bieliková, M. (eds.) SOFSEM 2008. LNCS, vol. 4910, pp. 50–65. Springer, Heidelberg
(2008)
35. Ausiello, G., Escoffier, B., Monnot, J., Paschos, V.T.: Reoptimization of minimum and
maximum traveling salesman’s tours. In: Arge, L., Freivalds, R. (eds.) SWAT 2006.
LNCS, vol. 4059, pp. 196–207. Springer, Heidelberg (2006)
36. Archetti, C., Bertazzi, L., Speranza, M.G.: Reoptimizing the traveling salesman problem.
Networks 42(3), 154–159 (2003)
37. Ausiello, G., Bonifaci, V., Escoffier, B.: Complexity and approximation in reoptimization.
Imperial College Press/World Scientific (2011)
38. Loosemore, R., Goertzel, B.: Why an intelligence explosion is probable. In: Singularity
Hypotheses, pp. 83–98. Springer (2012)
39. Shahaf, D., Amir, E.: Towards a theory of AI completeness. In: 8th International
Symposium on Logical Formalizations of Commonsense Reasoning. California, March
26–28, 2007
40. Yampolskiy, R.V.: Turing test as a defining feature of AI-completeness. In: Yang, X.-S.
(ed.) Artificial Intelligence, Evolutionary Computing and Metaheuristics. SCI, vol. 427,
pp. 3–17. Springer, Heidelberg (2013)
41. Yampolskiy, R.V.: AI-complete, AI-hard, or AI-easy–classification of problems in AI. In:
The 23rd Midwest Artificial Intelligence and Cognitive Science Conference, OH, USA
(2012)
42. Yudkowsky, E.S.: General Intelligence and Seed AI (2001). http://singinst.org/ourresearch/
publications/GISAI/
43. Yampolskiy, R.V.: Efficiency Theory: a Unifying Theory for Information, Computation
and Intelligence. J. of Discrete Math. Sciences & Cryptography 16(4–5), 259–277 (2013)
44. Yampolskiy, R.V.: AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an
Artificially Intelligent System. ISRN Artificial Intelligence 271878 (2011)
45. Turing, A.M.: On Computable Numbers, with an Application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 42, 230–265 (1936)
46. Bostrom, N.: Superintelligence: Paths, dangers, strategies. Oxford University Press (2014)
47. Hall, J.S.: Engineering utopia. Frontiers in AI and Applications 171, 460 (2008)
48. Hutter, M.: Universal algorithmic intelligence: A mathematical top→down approach. In:
Artificial general intelligence, pp. 227–290. Springer (2007)
49. Kolmogorov, A.N.: Three Approaches to the Quantitative Definition of Information.
Problems Inform. Transmission 1(1), 1–7 (1965)
50. Yampolskiy, R.V.: The Universe of Minds (2014). arXiv:1410.0369
51. Yudkowsky, E.: Levels of organization in general intelligence. In: Artificial general
intelligence, pp. 389–501. Springer (2007)
52. Bostrom, N.: What is a Singleton? Linguistic and Philosophical Invest. 5(2), 48–54 (2006)
53. Yudkowsky, E.: Timeless decision theory. The Singularity Institute, San Francisco (2010)
54. LessWrong: Acausal Trade, September 29, 2014. http://wiki.lesswrong.com/wiki/
Acausal_trade
55. Yudkowsky, E.S.: Coherent Extrapolated Volition. Singularity Institute for Artificial Intel-
ligence, May 2004. http://singinst.org/upload/CEV.html
56. Yudkowsky, E.: Recursive Self-Improvement. In: Less Wrong, December 1, 2008.
http://lesswrong.com/lw/we/recursive_selfimprovement/, September 29, 2014
57. Hutter, M.: Can Intelligence Explode? J. of Consciousness Studies 19(1–2), 1–2 (2012)
58. Yampolskiy, R.V.: Analysis of types of self-improving software. In: The Eighth
Conference on Artificial General Intelligence, Berlin, Germany, July 22–25, 2015
Gödel Agents in a Scalable Synchronous Agent Framework
We call this agent framework locally synchronous, because the time structures of the agent and the environment are independent, as if they existed in different universes, but are locally interconnected via percepts and actions. For a detailed discussion of local synchrony, which lies between global synchrony and asynchrony, see [3]. The locally synchronous framework was also used by R. Solomonoff in his seminal articles on universal induction [12,13]. Full Solomonoff induction is incomputable, but in [14] it is outlined how effective and universal induction is possible when the agent and the environment are embedded into a synchronous time structure. This is one example of the surprising implications resulting from seemingly small changes to a conceptual framework, stressing the point that some results are not as absolute as they might appear, but depend crucially on the details of the chosen framework.
Here we want to modify the locally synchronous framework in two ways, calling the new framework globally synchronous or just synchronous. First, we replace Turing machines with Moore machines (see below); second, we do not assume that the agent or the environment is suspended while the other is computing. Instead, the Moore agent and the Moore environment interact in a simultaneous fashion: their transitions are synchronized, the output of one machine is the input of the other, and output is generated and read in each cycle (see figure 1). A Moore agent can conduct complex calculations using its internal states and multiple cycles, but during these calculations the last output of the Moore agent (or whatever the Moore agent produces as preliminary output while the complex calculation is running) is used as input for the environment. Thus the Moore agent has to act in real time, but on the other hand the environment is
scanned in real time, too, excluding the possibility that the environment takes more and more time to generate the next percept. In fact, in the locally synchronous framework, the agent does not know whether the current percept was generated within a second or within one billion years.
Moore machines are finite state machines which read an input symbol and generate an output symbol in each cycle. They do not terminate, but translate a stream of input symbols into a stream of output symbols; accordingly, they are also called finite state transducers. Moore machines are named after E. F. Moore, who introduced the concept in 1956 [8].
A Moore machine is a 6-tuple (S, S0, Σ, Λ, T, G), where:
– S is a finite set of states,
– S0 ∈ S is the initial state,
– Σ is a finite input alphabet,
– Λ is a finite output alphabet,
– T : S × Σ → S is the transition function, and
– G : S → Λ is the output function.
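To make the synchronous interaction concrete, the following minimal Python sketch (our own illustration, not taken from the paper; the class, the dictionary encoding of T and G, and the toy machines are assumptions) couples two Moore machines so that the output of each is read as the input of the other in every cycle:

    class MooreMachine:
        def __init__(self, s0, transitions, outputs):
            self.state = s0            # current state, initially S0
            self.T = transitions       # T: (state, input symbol) -> next state
            self.G = outputs           # G: state -> output symbol

        def output(self):
            return self.G[self.state]  # output depends only on the current state

        def step(self, symbol):
            self.state = self.T[(self.state, symbol)]

    def run_synchronously(agent, env, cycles):
        """In each cycle both machines emit their current output, then both transition
        on the other's output -- neither machine is suspended while the other computes."""
        history = []
        for _ in range(cycles):
            a_out, e_out = agent.output(), env.output()
            agent.step(e_out)          # percept delivered to the agent
            env.step(a_out)            # action applied to the environment
            history.append((a_out, e_out))
        return history

    # Toy example: an agent that echoes the last percept and an environment that
    # flips its state whenever it reads "a".
    agent = MooreMachine(0, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, {0: "a", 1: "b"})
    env = MooreMachine(0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, {0: "a", 1: "b"})
    print(run_synchronously(agent, env, 5))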
2 Gödel Agents
An arena is a triple (A, E, S), where A is a family of agents, E is a family of environments, and S : A × E → R is a score function assigning every pair of agent A and environment E a real number measuring the performance of agent A in environment E. First, we assume that the agent and environment families are finite; the cases where the agent family, the environment family, or both become infinite are discussed in section 4. In the finite case, the following notions are well-defined:
Definition 1. For all environments E ∈ E, the score S^pre(E) = max_{A∈A} S(A, E) is called the pre-established score of E. An agent A ∈ A is called a pre-established agent of E if S(A, E) = S^pre(E).
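As a concrete reading of Definition 1, the following sketch (our own illustration; the score-table encoding and the names are assumptions) computes, for a finite arena given as a score table, the pre-established score of each environment together with its pre-established agents:

    def pre_established(agents, environments, S):
        """For each environment E, return (S^pre(E), list of pre-established agents of E)."""
        result = {}
        for E in environments:
            scores = {A: S[(A, E)] for A in agents}
            s_pre = max(scores.values())     # pre-established score of E
            result[E] = (s_pre, [A for A, s in scores.items() if s == s_pre])
        return result

    # Toy arena with two agents and two environments.
    agents = ["A1", "A2"]
    environments = ["E1", "E2"]
    S = {("A1", "E1"): 3.0, ("A2", "E1"): 5.0,
         ("A1", "E2"): 4.0, ("A2", "E2"): 1.0}
    print(pre_established(agents, environments, S))   # {'E1': (5.0, ['A2']), 'E2': (4.0, ['A1'])}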
design decisions. One such parameter is the “horizon”, a finite lifespan or maximal planning interval that often has to be defined for an agent in order to obtain well-defined reward values for agent policies. But especially in open environments existing for an indefinite timespan, this is an ad hoc parameter containing contingent aspects which may prevent the agent from behaving optimally. To stress this point, we quote M. Hutter ([5], p. 18):
“The only significant arbitrariness in the AIXI model lies in the choice of the
lifespan m.”
where AIXI is a learning agent aiming to be as general as possible.
In order to eliminate this parameter and to tackle the horizon problem from
a foundational point of view, we will look into the notion of an infinite game,
and, in a probabilistic context, into ruin theory.
An infinite game is a game that has no predetermined end and can potentially go on forever. For at least one of the players this is exactly the goal: to stay in the game till the end of time. A good illustration of this abstract concept is the Angel and Devils Game, introduced by J. H. Conway in 1982 [2]. The game is played by two players called the angel and the devil. The playground is Z × Z, an infinite 2D lattice. The angel is assigned a power k (a natural number, 1 or higher), which is fixed before the game starts. At the beginning, the angel is
located at the origin. On each turn, the angel has to jump to an empty square
which has at most a distance of k units from the angel’s current square in the
infinity norm. The devil, on its turn, can delete any single square not containing
the angel. The angel can jump over deleted squares, but cannot move to them.
The devil wins if the angel cannot move anymore. The angel wins by moving, i.e.,
surviving, indefinitely. In [2] it was proved that an angel of power 1 can always
be trapped by the devil, but it took 25 years to show that an angel of power 2
has a winning strategy [6], i.e., an angel of power 2 using the right strategy can
survive forever.
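The angel's movement rule is simple to state precisely. The following sketch (our own illustration; the set-based board encoding and the function name are assumptions) enumerates the legal moves of an angel of power k, i.e., all non-deleted squares within infinity-norm (Chebyshev) distance k of the angel's current square:

    def legal_angel_moves(position, k, deleted):
        """All squares the angel of power k may jump to from its current position.
        The angel may jump over deleted squares, so only the destination matters."""
        x, y = position
        moves = []
        for dx in range(-k, k + 1):
            for dy in range(-k, k + 1):
                if dx == 0 and dy == 0:
                    continue                  # the angel has to move to another square
                target = (x + dx, y + dy)
                if target not in deleted:     # it cannot land on a deleted square
                    moves.append(target)
        return moves

    # The devil wins exactly when this list becomes empty.
    print(len(legal_angel_moves((0, 0), k=2, deleted={(1, 1), (0, 2)})))   # 22 of 24 candidate squares remain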
This game nicely illustrates that the angel does not have a definite or finite goal it wants to reach, but rather aspires to avoid certain states of the world. This seemingly
innocuous transition from a positive goal, an attractor, to a negative goal, a
repellor, solves the horizon problem from a foundational point of view, avoiding
the introduction of arbitrary parameters. Now actions do not have to be scored
with regard to the positive goals they can reach within a certain time frame, but
according to the probability they entail for avoiding the repellor states forever.
A classical probabilistic example illustrating the concept of an infinite horizon comes from ruin theory. Ruin theory was developed as a mathematical model for the problem an insurance company typically faces: there is an incoming flow of premiums and an outgoing flow of claims [1]. Assuming that the flow of premiums is constant and that the times and sizes of claims are exponentially distributed, the net capital position of an insurance company can be modeled as a biased random walk. Ruin is defined as a negative net capital position. The perhaps surprising fact is that there are parameter values for which the ruin probability stays below 1 even for an infinite time horizon, i.e., indefinite survival has a positive probability.
Fig. 2. In the insurance example, different actions (investing in either stocks or cash) lead to different capital position outcomes (survival or ruin), while the same premiums are received and the same claims occur in both scenarios. The safer cash investment
scenario initially fares better but in the end ruin occurs (in month 273), while the
riskier stock investment scenario is able to accumulate enough reserves over a longer
horizon to survive. The simulation uses 302 monthly periods from 1990 to 2015, an
initial capital of 1, a constant premium of 0.01 per month, exponentially distributed
claim sizes (λ = 5) occurring with a probability of 0.10 per period and investment in
either a stock performance index (DAX) or interest-free cash.
For the above model of exponentially distributed claims and interclaim times, there is an analytical formula for the ruin probability ψ with infinite time horizon [9]:

ψ(u) = (μ/(cλ)) · exp((μ/c − λ) u),
where u > 0 is the initial capital, c > 0 is the premium received continuously per unit time, the interclaim times Ti are distributed according to Exp(μ), μ > 0, and the claim sizes Yi according to Exp(λ), λ > 0. For example, if the initial capital is u = 1, the premium rate is c = 0.2, the expected interclaim time is E(Ti) = 2 (μ = 0.5), and the expected claim size is E(Yi) = 0.2 (λ = 5), we get an infinite-horizon ruin probability of ψ ≈ 0.04, i.e., in this case the probability of staying in business forever, the liveness, is 1 − ψ ≈ 96%.
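As a quick numerical check of this example (our own sketch; the function and argument names are illustrative):

    import math

    def ruin_probability(u, c, mu, lam):
        """Infinite-horizon ruin probability for premium rate c, interclaim times ~ Exp(mu)
        and claim sizes ~ Exp(lam), following the formula above."""
        return (mu / (c * lam)) * math.exp((mu / c - lam) * u)

    psi = ruin_probability(u=1.0, c=0.2, mu=0.5, lam=5.0)
    print(round(psi, 3), round(1 - psi, 3))   # prints 0.041 0.959, i.e. psi ~ 0.04 and liveness ~ 96%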
In a more general example, one can imagine that the insurance company can invest its capital in stocks. In figure 2, beginning from the same initial capital, two scenarios for the development of the net capital are shown: a conservative one, where all the capital is kept as cash, and an aggressive one, where all the capital is invested in stocks. In this case, the risky strategy prevails over the less risky one, but the best strategy is probably a smart mix of cash and stocks that is reallocated periodically, i.e., the investment strategy of the insurance company would decide between ruin and indefinite survival.
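A simulation of this kind can be sketched as follows (our own illustration: the DAX return series used in figure 2 is not reproduced here, so synthetic monthly returns serve as a stand-in, and all names and parameter values are assumptions; the sketch will therefore not reproduce the exact outcomes shown in the figure):

    import random

    def simulate_net_capital(returns, claims, capital=1.0, premium=0.01):
        """Monthly net capital path for a given return series and claim series
        (claims[i] is 0.0 in months without a claim).
        Returns the month of ruin, or None if the company survives all months."""
        for month, (r, claim) in enumerate(zip(returns, claims), start=1):
            capital *= 1.0 + r      # investment return for this month (0 for cash)
            capital += premium      # constant premium income
            capital -= claim        # claim payment, if any
            if capital < 0:
                return month        # ruin: negative net capital position
        return None

    random.seed(0)
    months = 302
    # Same premiums and same claims in both scenarios: with probability 0.10 per month
    # a claim of size Exp(lambda = 5) occurs.
    claims = [random.expovariate(5.0) if random.random() < 0.10 else 0.0 for _ in range(months)]
    cash_returns = [0.0] * months                                        # interest-free cash
    stock_returns = [random.gauss(0.007, 0.06) for _ in range(months)]   # synthetic stand-in for the DAX

    print("cash ruined in month:", simulate_net_capital(cash_returns, claims))
    print("stocks ruined in month:", simulate_net_capital(stock_returns, claims))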
Both examples, the angel problem and the insurance problem, show how to
avoid the horizon problem by switching the definition of goal from reaching a
world state to avoiding a world state. In this sense, the accumulation of reward
is only relevant as long as it helps to stay away from the repellor.
The above discussion of negative goals, or repellors, should serve as an illustration of a principled solution to the horizon problem and should inspire the search for new goal systems for agents. We do not claim that all agent policies should strive to avoid repellors. Negative goals should be seen as complementing, not replacing, positive goals.
The above discussion addresses only the existence of Gödel agents in certain
situations, not how to construct or approximate them. At least we now know
that there is something worthwhile to search for.
[Figure 3: plot of the estimated Gödel loss (y-axis, roughly 5.4 to 5.9) against the number of agent states (x-axis, 2 to 8).]
Fig. 3. Increasing agent complexity leads to lower Gödel losses, as seen in these preliminary results from a simulation performed with 500 fixed Moore environments (having 5 states, 4 inputs, and 6 outputs, with random transition tables and random outputs), a fixed score function (using random scores in [0, 10] depending on the environment state; the final score is given as the average score per simulation step), and 100000 random Moore agents drawn per agent state number, all evaluated for 100 steps per environment-agent pair.
For example, we want to know how the Gödel loss varies if we increase the
number of states in the agent family. Is there a “bang per state” effect and how
large is it? In figure 3 the estimated Gödel losses for a fixed environment family,
a fixed score function, and an increasing number of agent states are displayed. We can see a “bang per state” effect, but, as in many saturation phenomena, it eventually gets smaller with every added state. Of course, these phenomena have to be investigated much more extensively, both theoretically and empirically, but
that is exactly what we hope for: that the proposed framework is the starting
point for the detailed exploration of the landscape of arenas, adaptability, and
self-improvement.
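The simulation behind figure 3 can be sketched roughly as follows (our own illustration based only on the caption: the random-table encoding, the reduced sample sizes, and the final aggregation step are assumptions, and since the exact definition of the Gödel loss is not reproduced in this excerpt, the sketch only reports the best average score found per agent-state count):

    import random

    def random_moore(num_states, num_inputs, num_outputs, rng):
        """Random Moore machine as (transition table, output table); state 0 is initial."""
        T = {(s, a): rng.randrange(num_states)
             for s in range(num_states) for a in range(num_inputs)}
        G = {s: rng.randrange(num_outputs) for s in range(num_states)}
        return T, G

    def run(agent, env, score, steps=100):
        """Synchronous interaction of agent and environment; returns the average score per step."""
        (Ta, Ga), (Te, Ge) = agent, env
        sa, se, total = 0, 0, 0.0
        for _ in range(steps):
            a_out, e_out = Ga[sa], Ge[se]
            sa, se = Ta[(sa, e_out)], Te[(se, a_out)]
            total += score[se]                 # the score depends on the environment state
        return total / steps

    rng = random.Random(0)
    # 50 environments with 5 states, 4 inputs, 6 outputs (the paper uses 500).
    envs = [random_moore(5, 4, 6, rng) for _ in range(50)]
    scores = [{s: rng.uniform(0, 10) for s in range(5)} for _ in envs]   # fixed random score function

    for n_states in range(2, 9):
        # Random agents with 6 inputs and 4 outputs, matching the environments (the paper draws 100000).
        agents = [random_moore(n_states, 6, 4, rng) for _ in range(200)]
        best = [max(run(A, E, sc) for A in agents) for E, sc in zip(envs, scores)]
        print(n_states, round(sum(best) / len(best), 3))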
References
1. Asmussen, S.: Ruin Probabilities. World Scientific (2000)
2. Berlekamp, E.R., Conway, J.H., Guy, R.K.: Winning Ways for your Mathematical
Plays. Academic Press (1982)
3. Cremers, A.B., Hibbard, T.H.: A programming notation for locally synchronized
algorithms. In: Bertolazzi, P., Luccio, F. (eds.) VLSI: Algorithms and Architec-
tures, pp. 341–376. Elsevier (1985)
4. Garber, D.: Body, Substance, Monad. Oxford University Press (2009)
5. Hutter, M.: Universal Artificial Intelligence. Springer (2005)
6. Kloster, O.: A solution to the angel problem. Theoretical Computer Science 389(1–
2), 152–161 (2007)
7. Legg, S., Hutter, M.: Universal intelligence: A definition of machine intelligence.
Minds & Machines 17(4), 391–444 (2007)
8. Moore, E.F.: Gedanken-experiments on sequential machines. Automata Studies
34, 129–153 (1956)
9. Rongming, W., Haifeng, L.: On the ruin probability under a class of risk processes.
Astin Bulletin 32(1), 81–90 (2002)
10. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Pren-
tice Hall (2009)
11. Schmidhuber, J.: Gödel machines: Fully self-referential optimal universal self-
improvers. In: Goertzel, B., Pennachin, C. (eds.) Artificial General Intelligence.
Springer (2007)
12. Solomonoff, R.: A formal theory of inductive inference, part I. Information and
Control 7(1), 1–22 (1964)
13. Solomonoff, R.: A formal theory of inductive inference, part II. Information and
Control 7(2), 224–254 (1964)
14. Zimmermann, J., Cremers, A.B.: Making Solomonoff Induction Effective. In:
Cooper, S.B., Dawar, A., Löwe, B. (eds.) CiE 2012. LNCS, vol. 7318, pp. 745–
754. Springer, Heidelberg (2012)