How To Build A Brain
Chris Eliasmith
Preface
I suspect that when most of us want to know “how the brain works”, we are somewhat disappointed with an answer that describes how individual neurons function,
or an answer that picks out the parts of the brain that have increased activity while
reading words, or an answer that describes what chemicals are in less abundance
when someone is depressed. These are all parts of an answer, no doubt, but the
parts need to be put together.
Instead, I suspect that the kind of answer we would like is the kind that early pioneers in cognitive science were trying to construct: answers like the one provided by Allen Newell (1990) in his book “Unified Theories of Cognition.” Newell was interested in understanding how the “wheels turn” and the
“gears grind” during cognitive behavior. He was interested in the details of brain
function during the whole variety of tasks that we might call “cognitive”. Newell’s answer, while interesting and important, ultimately failed to convince many. Perhaps this is why most researchers in the behavioral sciences
– which I take to include psychology, neuroscience, psychophysics, cognitive sci-
ence, linguistics, and parts of engineering, computer science, statistics, and so on
– have since focused on characterizing the mechanisms of relatively small parts
of the brain. And notably, many of these characterizations do not fit well with the
answer Newell was attempting to propose.
Regardless, I think we may now be in a position to again try to provide the
kind of answer Newell was after, and I think we may be able to integrate many
of the pieces of the puzzle coming from these myriad different disciplines. In the
grandest (and hence least plausible) sense, that is the purpose of this book. It is an
attempt to provide basic principles and an outline of an architecture for building
the wheels and gears needed to drive cognition, in a way that integrates much of
the evidence from the behavioral sciences.
The architecture I propose and develop here is called the “Semantic Pointer
Architecture” (SPA), for reasons that will become clear in chapter 3. I believe a
central departure of the SPA from past attempts to articulate theories of cognition,
is that it does not begin with cognition, it ends there. That is, it does not start with
the suggestion that cognitive systems are like computers, or that they are best de-
scribed as abstract nodes connected together in a network. Instead, it begins with
biology. That is why it is a theory of how to build a brain. It is in virtue of starting
with biology that the SPA is able to draw on evidence from (and make predic-
tions about) disciplines whose results are fundamentally the product of biological
processes. To the best of my knowledge, there are currently no systematic meth-
ods for relating biological data to our high-level understanding of the complex
dynamics and sophisticated representational properties of cognitive systems.
Even in the likely scenario that the SPA is found wanting, it should at least
make clear that starting with biology mandates that the specific computational and
representational properties of neural systems be taken into account when devel-
oping an understanding of higher-order cognition. Not doing so makes it difficult,
if not impossible, to relate the two (biology and cognition) at the end of the day.
So, the architecture I propose adopts cognitively relevant representations, com-
putations, and dynamics that are natural to implement in large-scale, biologically
realistic neural networks. In short, the SPA is centrally inspired by understanding
cognition as a biological process – or what I refer to as “biological cognition”.
Perhaps unsurprisingly, before I can describe this architecture, I need to pro-
vide an outline of what is expected of a cognitive theory and a systematic method
for understanding large-scale biologically realistic systems: these are the purposes
of chapters 1 and 2 respectively. In chapters 3-6 I describe the four central aspects
of the SPA that allow it to capture biological cognition. In chapter 7 I provide
a summary of the architecture, and a description of its application to construct-
ing a single model that is able to perform a multitude of perceptual, motor, and
cognitive tasks without user intervention. In the remainder of the book, I argue
for a set of criteria for evaluating cognitive theories (chapter 8), I describe current
state-of-the-art approaches to cognitive modeling, evaluate them with respect to
those criteria, and compare and contrast them with the SPA (chapter 9), and I high-
light future directions of the SPA and discuss several conceptual consequences of
adopting this approach to understanding cognitive function.
Despite proposing a specific architecture, this book is also usable indepen-
dently of those commitments. That is, there is a practical sense to the title of this
book: I provide tutorials for constructing single neuron-level, spiking models at
the end of each chapter. For those who wish, brain building can be more than
just an abstract, theoretical pursuit. These tutorials employ the neural simulator
Nengo, that has been developed by my lab. The first tutorial describes how to set
up and run this graphical simulator. For those wishing to use this book as a text for
a course, please visit http://compneuro.uwaterloo.ca/cnrglab/ for further information.
Chapter 1
The Science of Cognition
Figure 1.1: The iCub. The iCub is an example of a significant recent advance
in robotics. See http://www.robotcub.org/index.php/robotcub/gallery/videos for
videos. © The RobotCub Consortium. Image by Lorenzo Natale, reproduced
with permission.
cally speaking, important discoveries have been made by the funded researchers.
However, these discoveries tend not to be of the kind that tell us how to better con-
struct integrated, truly cognitive systems. Rather, they are more often discoveries
in the specific disciplines that are taking part in these “Integrated Projects.”
For instance, sophisticated new robotic platforms have been developed. One
example is the iCub (figure 1.1), which is approximately the size of a two-year-
old child, and has over 56 degrees of freedom.1 The iCub has been adopted for
use by researchers in motor control, emotion recognition and synthesis, and active
perception. But, of course, iCub is not a cognitive system. It may be a useful
testbed for a cognitive system, no doubt. It may be a wonder of high-tech robotics
engineering, indeed. But, it is not a cognitive system.
So the pointed question still stands: “What have we actually accomplished in
the last 50 years in cognitive systems research?” That is, what do we now know
about how cognitive systems work, that we did not know 50 years ago? Pes-
simistically, it might be argued that we do not know too much more than we knew
50 years ago. After all, by 1963 Newell and Simon had described in detail the
program GPS (General Problem Solver). This program, which was an extension
of work that they had started in 1959, is the first in a line of explanations of human
1 A degree of freedom is an independent motion of an object. So, moving the wrist up and
down is one degree of freedom, as is rotating the wrist, or moving a finger up and down.
such a system when it interacts with the difficult-to-predict dynamics of the world,
and look to perception to provide guidance for that control. If-then rules are sel-
dom used. Differential equations, statistics, and signal processing are the methods
of choice. Unfortunately, it has remained unclear how to use those same methods
for characterizing high-level cognitive behavior – like language, complex plan-
ning, and deductive reasoning – behaviors which classical approaches have the
most success at explaining.
In short, there is a broad gap in our understanding of real, cognitive systems:
on the one hand there are the approaches centered on dynamic, real-world percep-
tion and action; on the other hand there are the approaches centered on higher-
level cognition. Unfortunately, these approaches are not the same. Nevertheless,
it is obvious that perception/action and high-level cognition are not two indepen-
dent parts of one system. Instead, these two aspects are, in some fundamental
way, integrated in cognitive animals such as ourselves. Indeed, a major theme of
this book is that biology underwrites this integration. But for now, I am only con-
cerned to point out that classical architectures are not obviously appropriate for
understanding all aspects of real cognitive systems. This, then, is why we cannot
simply say, in answer to the question of what has been accomplished in the last 50
years, that we have identified the (classical) cognitive architecture.
However, this does not mean that identifying such architectures is without
merit. On the contrary, one undeniably fruitful consequence of the volumes of
work that surrounds the discussion of classical architectures is the identification
of criteria for what counts as a cognitive system. That is, when proponents of clas-
sicism were derided for ignoring cognitive dynamics, one of their most powerful
responses was to note that their critics had no truly cognitive systems to replace
theirs with. This resulted in a much clearer specification of what a cognitive sys-
tem was. So, I suspect that there would be agreement amongst most of the experts
gathered in Brussels as to what has been accomplished lo these 50 years. Indeed,
the accomplishments are not in the expected form of an obvious technology, a
breakthrough method, or an agreed upon theory. Instead, the major accomplish-
ments have been in clarifying what the questions are, in determining what counts
as a cognitive system, and in figuring out how we are most likely to succeed in
explaining such systems (or, perhaps more accurately, how we are not likely to
succeed).
If true, this is no mean feat. Indeed, it is even more true in science than elsewhere that, as economics Nobel laureate Paul A. Samuelson observed, “good questions outrank easy answers.” If we have indeed identified clearer criteria for distinguishing cognitive from non-cognitive systems, and if we really
have a good sense of what methods will allow us to understand how and why sys-
tems can successfully meet those criteria, we have accomplished a lot. Ultimately,
only time will tell if we are on the right track. Nevertheless, I believe there is an
overall sense in the field that we have a better idea of what we are looking for in
an explanation of a cognitive system than we did 50 years ago – even if we do not
yet know what that explanation is. Often, progress in science is more about iden-
tifying specific questions that have uncertain answers than it is about proposing
specific answers to uncertain questions.
The goal of this first chapter, then, is to identify these cognitive criteria and
articulate some questions arising from them. These appear in sections 1.3 and
1.4 respectively. First, however, it is worth a brief side trip into the history of the
behavioral sciences to situate the concepts and methods that have given rise to
these criteria.
ming language, expresses the rules that the system follows (Fodor, 1975). Fodor
claims that this metaphor is essential for providing a useful account of how the
mind works. Production systems, which I have already introduced, have become
the preferred implementation of this metaphor.
A second major approach is “connectionism” (aka the Parallel Distributed Pro-
cessing (PDP) approach or New-fangled Artificial Intelligence (NFAI)). In short,
connectionists explain cognitive phenomena by constructing models that consist
of large networks of typically identical nodes that are connected together in various ways. Each node performs a simple input/output mapping. However, when
grouped together in sufficiently large networks, the activity of these nodes is in-
terpreted as implementing rules, analyzing patterns, or performing any of several
other cognitively-relevant behaviors. Connectionists, like the symbolicists, rely
on a metaphor for providing explanations of cognitive behaviors. This metaphor,
however, is much more subtle than the symbolicist one; these researchers pre-
sume that the functioning of the mind is like the functioning of the brain. The
subtlety of the “mind as brain” metaphor lies in the fact that connectionists also
hold that the mind is the brain. However, when providing cognitive descriptions,
it is the metaphor that matters, not the identity. In deference to the metaphor,
the founders of this approach call it “brain-style” processing, and claim to be dis-
cussing “abstract networks” (Rumelhart and McClelland, 1986). In other words,
their models are not supposed to be direct implementations of neural processing,
and hence cannot be directly compared to the kinds of data we gather from real
brains. This is not surprising since the computational and representational prop-
erties of the nodes in connectionist networks bear little resemblance to neurons in
real biological neural networks.2
The final major approach to cognitive systems in contemporary cognitive sci-
ence is “dynamicism”, and is often closely allied with “embedded” or “embod-
ied” approaches to cognition. Proponents of dynamicism also rely heavily on a
metaphor for understanding cognitive systems. Most explicitly, van Gelder em-
ploys the Watt Governor as a metaphor for how we should characterize the mind
(van Gelder, 1995). In general, dynamicist metaphors rely on comparing cogni-
tive systems to other, continuously coupled, nonlinear, dynamical systems (e.g.
the weather). Van Gelder’s metaphor is useful because it has played a central role
in his arguments for the novelty of dynamicist approaches, and the metaphor is a
simple one.
The Watt Governor is a mechanism for controlling (i.e. “governing”) the speed
2 As discussed in chapter 10 of (Bechtel and Abrahamsen, 2001).
of an engine shaft under varying loads. It is named after James Watt because he
used it extensively to control steam engines, though the same mechanism was
in use before Watt to control wind and water mills. Figure 1.2 depicts a Watt
Governor. It consists of two weights attached to movable arms that are connected to a vertical shaft. The vertical shaft is driven by an engine, so as the engine spins
the vertical shaft more quickly, centripetal forces cause the weights to be raised.
The Governor is also connected to a throttle controlling the engine. Presuming the
engine is being used to do work (e.g., driving a saw to cut wood), it is subject to
varying loads (e.g., the presence or absence of a board to cut). With a constant load
on the engine, the vertical shaft spins at a constant speed and the weights maintain
a given position. If the load decreases, the shaft speed increases, causing the
weights to move higher, decreasing the throttle until the desired speed is again
reached. Increasing the load has the opposite effect.
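To make the dynamical character of this system concrete, the governor’s behavior is usually written as a differential equation for the angle of the weighted arms. One standard formulation (the one van Gelder discusses; the symbols below follow that common presentation and are included only as an illustrative sketch) is

d²θ/dt² = (nω)² cos θ sin θ − (g/l) sin θ − r (dθ/dt)

where θ is the arm angle, ω the engine speed, n a gearing constant, g the gravitational constant, l the length of the arms, and r a friction coefficient. The arm angle in turn sets the throttle opening, which feeds back to change ω, so the governor and engine form a single, continuously coupled system rather than a sequence of discrete steps.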
It is through his analysis of the best way to characterize this dynamic system
that van Gelder argues for understanding cognitive systems as non-representational,
low-dimensional, dynamic systems. Like the Watt Governor, van Gelder main-
tains, cognitive systems are essentially dynamic and can only be properly un-
derstood by characterizing their state changes through time. The “mind as Watt
Governor” metaphor suggests that trying to impose any kind of discreteness, ei-
ther temporal or representational, will lead to a mischaracterization of cognitive
systems.
This same sort of analysis – one which highlights the importance of dynamics
– highlights the essential coupling of cognitive systems to their environment (van
Gelder and Port, 1995). Dynamic constraints are clearly imposed by the environ-
ment on the success of our behavior (we must see and avoid the cheetah before
it eats us). If our high-level behaviors are built on our low-level competencies,
then it is not surprising that identifying this important role of the environment has
led several researchers to emphasize the fact that real cognitive systems are em-
bedded within a specific environment, with specific dynamics. Furthermore, they
have argued, the nature of that environment can have significant impacts on what
cognitive behaviors are realized (Clark, 1997; Hutchins, 1995). Because many
of the methods and assumptions of dynamicism and embedded approaches are
shared, in this discussion I group both under the heading of “dynamicism.”
Notably, each of symbolicism, connectionism, and dynamicism relies on metaphor
not only for explanatory purposes, but also for developing the conceptual founda-
tions of their preferred approach to cognitive systems. For symbolicists, the prop-
erties of Turing machines become shared with minds because they are the same
kind of computational system. For connectionists, the character of representa-
Figure 1.2: The Watt Governor. Two weights rotate around a vertical shaft driven
by an engine. The centripetal force created by the rotation of the vertical shaft
causes the weights to be pushed outwards and raises a collar which is attached
to the weighted arms. The collar is linked to the engine throttle and as it rises it
decreases the throttle, reducing the speed of the input shaft. The system settles
into a state where the throttle is open just enough to maintain the position of the
weights. Increasing or decreasing the load on the engine disturbs this equilibrium
and causes the governor to readjust the throttle until the system stabilizes again.
activities of individual nodes to be representations. These are still treated unlike symbolic representations.
4 For a detailed description of the analogies, predictions, and experiments, see (Eliasmith and
Thagard, 1997).
insides of that black box was heavily influenced by concurrent successes in build-
ing and programming computers to perform complex tasks. Thus, many early
cognitive scientists saw, when they opened the lid of the box, a computer. As ex-
plored in detail by Jerry Fodor, “[c]omputers show us how to connect semantical
[meaning-related properties] with causal properties for symbols” (Fodor, 1987, p.
18), thus computers have what it takes to be minds. Once cognitive scientists be-
gan to think of minds as computers, a number of new theoretical tools became
available for characterizing cognition. For instance, the computer’s theoretical
counterpart, the Turing machine, suggested novel philosophical theses, includ-
ing “functionalism” (the notion that only the mathematical function computed by
a system was relevant for its being a mind or not) and “multiple realizability”
(the notion that a mind could be implemented (i.e. realized), in pretty much any
substrate – water, silicon, interstellar gas, etc. – as long as it computed the ap-
propriate function). More practically, the typical architecture of computers, the
von Neumann architecture (which maps directly to the architecture of a produc-
tion system), was thought by many to be relevant for understanding our cognitive
architecture.
Eventually, adoption of the von Neumann architecture for understanding minds
was seen by many as poorly motivated. Consequently, the early 1980s saw a sig-
nificant increase in interest in the connectionist research program. As mentioned
previously, rather than adopting the architecture of a digital computer, these re-
searchers felt that an architecture more like that seen in the brain would provide
a better model for cognition. It was also demonstrated that a connectionist archi-
tecture could be as computationally powerful as any symbolicist architecture. But
despite the similar computational power of the approaches, the specific problems
at which each approach excelled were quite different. Connectionists, unlike their
symbolicist counterparts, were very successful at building models that could learn
and generalize over the statistical structure of their input. Thus, they could begin
to explain many phenomena not easily captured by symbolicists, such as object
recognition, reading, concept learning, and other behaviors crucial to cognition.
For some, however, connectionists had clearly not escaped the influence of
the “mind as computer” metaphor; after all, connectionists still spoke of represen-
tations, and thought of the mind as a kind of computer. Dynamicists, in contrast,
suggested that if we want to know which functions a system can actually perform
in the real world, we must know how to characterize the system’s dynamics. Since
cognitive systems evolved in specific, dynamic environments, we should expect
evolved control systems, i.e., brains, to be more like the Watt Governor – dynamic,
continuous, coupled directly to what they control – than like a discrete-state Turing machine.
work is done. First, for instance, I need to use the historical background just
presented to identify what I earlier claimed was the major advance of the last
50 years: identifying criteria for evaluating the goodness of cognitive theories.
Perhaps more importantly, I need to specify the methods that can underwrite such
a departure – that is the objective of chapters 2-7.
1.3 Where we are
It will strike many as strange to suggest that there is anything like agreement on
criteria for identifying good cognitive theories. After all, this area of research has
been dominated by, shall we say, “vigorous debate” between proponents of each of
the three approaches. For instance, it is true that there are instances of symbolicists
calling connectionism “quite dreary and recidivist” (Fodor, 1995). Nevertheless, I
believe that there is some agreement on what the target of explanation is. By this
I do not mean to suggest that there is an agreed-upon definition of cognition. But
rather that “cognition” is to most researchers what “pornography” was to Justice
Potter Stewart: “I shall not today attempt further to define [pornography]; and
perhaps I could never succeed in intelligibly doing so. But I know it when I see
it.”
Most behavioral researchers seem to know cognition when they see it, as well.
There are many eloquent descriptions of various kinds of behavior in the litera-
ture, which most readers – regardless of their commitments to symbolicism, con-
nectionism, or dynamicism – recognize as cognitively relevant behavior. It is not
as if symbolicists think that constructing an analogy is a cognitive behavior and
dynamicists disagree. This is why I have suggested that we may be able to identify
agreed-upon criteria for evaluating cognitive theories. To be somewhat provoca-
tive, I will call these the “Quintessential Cognitive Criteria”, or QCC for short.
I should note that this section is only a first pass and summary of considerations
for the proposed QCC. I return to a more detailed discussion of the QCC before
explicitly employing them in chapter 8.
So what are the QCC? Let us turn to what researchers have said about what
makes a system cognitive. Here are examples from proponents of each view:
together. For instance, they suggest that cognitive systems cannot represent “John
loves Mary” with out thereby being able to represent “Mary loves John”. Finally,
compositionality is the suggestion that the meaning of complex representations
is a direct “composition” (i.e. adding together) of the meanings of the basic rep-
resentations. Any good theory, they claim, must explain these basic features of
cognition.
More recently, Jackendoff has dedicated his book to identifying challenges for
a cognitive neuroscience of cognition (Jackendoff, 2002). In it he suggests that
there are four main challenges to address when explaining cognition. Specifically,
Jackendoff’s challenges are: 1) the massiveness of the binding problem (that very
many basic representations must be bound to construct a complex representation);
2) the problem of 2 (how multiple instances of one representational token can
be distinguished); 3) the problem of variables (how can roles (e.g. “subject”) in
a complex representation be generically represented); and 4) how to incorporate
long-term and working memory into cognition. Some of these challenges are
closely related to those of Fodor and Pylyshyn, and so are integrated with them as
appropriate in the QCC (see table 1.1).
The Fodor, Pylyshyn, and Jackendoff criteria come from a classical, symboli-
cist perspective. In a more connectionist-oriented discussion, Don Norman sum-
marizes several papers he wrote with Bobrow in the mid-70s, in which they ar-
gue for the essential properties of human information processing (Norman, 1986).
Based on their consideration of behavioral data, they argue that human cogni-
tion is: robust (appropriately insensitive to missing or noisy data, and damage
to its parts), flexible, and relies on “content-addressable” memory. Compared to
symbolicist considerations, the emphasis in these criteria has moved from repre-
sentational constraints to more behavioral constraints, driven by the “messiness”
of psychological data.
Dynamicists can be seen to continue this trend towards complexity in their dis-
cussions of cognition. Take, for instance, Gregor Schöner’s discussion in his arti-
cle “Dynamical systems approaches to cognition” (Schöner, 2008). In his opening
paragraphs, he provides examples of the sophisticated action and perception that
occurs during painting, and playing in a playground. He concludes that “cognition
takes place when organisms with bodies and sensory systems are situated in struc-
tured environments, to which they bring their individual behavioral history and to
which they quickly adjust” (p. 101). Again, we see the importance of flexibility
and robustness, with the addition of an emphasis on the role of the environment.
All three perspectives, importantly, take the target of their explanation to be
a large, complex system. As a result, all three are suspicious of “toy” models that
ioral sciences would agree can be used to evaluate cognitive theories – regardless
of their own theoretical predispositions. As a result, table 1.1 summarizes the
QCC we can, at least on the face of it, extract from this discussion. As a reminder,
I do not expect the mere identification of these criteria to be convincing. A more
detailed discussion of each is presented in chapter 8. It is, nevertheless, useful to
keep the QCC in mind as we consider a new proposal.
of these pieces of information can influence what is the best motor control plan,
and so must be taken into account when constructing such a plan. Because the
sources of such information can remain the same (e.g. visual information comes
from the visual system), while the destination of the information may change (e.g.,
from arm control to finger control), a means of routing that information must be in
place. More cognitively speaking, if I simply tell you that the most relevant infor-
mation for performing a task is going to switch from something you are hearing
to something you will be seeing, you can instantly reconfigure the information
flow through your brain to take advantage of that knowledge. Somehow, you are
re-routing the information you use for planning to come from the visual system
instead of the auditory system. We do this effortlessly, we do this quickly, and we
do this constantly.
The fourth question focuses our attention on another important source of cog-
nitive flexibility: our ability to use past information to improve our performance
on a future task. The time scale over which information might be relevant to a
task ranges from seconds to many years. Consequently, it is not surprising that the
brain has developed mechanisms to store information that also range over these
time scales. Memory and learning are behavioral descriptions of the impressive
abilities of these mechanisms. Considerations of memory and learning directly
address several of the performance concerns identified in the QCC. Consequently,
any characterization of a cognitive system has to provide some explanation for
how relevant information is propagated through time, and how the system can
adapt to its past experience.
So, I use these four questions to direct my description of a cognitive archi-
tecture. Unsurprisingly, there are already many answers to these questions. Most
often there are multiple, conflicting answers that have resulted in a variety of
contemporary debates regarding cognitive function. For example, with respect to
syntax, there is a contemporary debate about whether cognitive representations are best understood as symbolic or subsymbolic. It might be natural, then,
to try to characterize a new architecture by describing which side of such debates
it most directly supports. However, Paul Thagard (2012) has recently suggested
that the architecture I describe – the Semantic Pointer Architecture (SPA) – can be
taken to resolve many of these debates by providing a synthesis of the opposing
views. For example, instead of thinking that an approach must be symbolic or
subsymbolic, or computational or dynamical, or psychological or neural, the SPA
is all of these. The seven debates he highlights are shown in table 1.2.
Of course, much work must be done to make a convincing case that the SPA successfully resolves some of these long-standing debates.
I would like to propose adopting this as a motto for how best to answer our ques-
tions about cognition. It suggests that creating a cognitive system would provide
one of the most convincing demonstrations that we truly understand such a system.
From such a foundation, we would be in an excellent position to evaluate compet-
ing descriptions, and ultimately evaluate answers to our questions about cognition.
I will not argue for this point in great depth (though others have (Dretske, 1994)).
But this, in a nutshell, is the goal I’m striving towards. In the next chapter, I be-
gin with a description of neural mechanisms, so we can work our way towards an
understanding of biological cognition.
Of course, in this case, as in the case of many other complex physical sys-
tems, “creating” the system amounts to creating simulations of the underlying
mechanisms in detail. As a result, I also introduce a tool for creating neural simu-
lations that embodies the principles described throughout the book. This tool is a
graphical simulation package called Nengo (http://www.nengo.ca/), which can
be freely downloaded and distributed. At the end of each chapter, I provide a tu-
torial for creating and testing a simulation that relates to a central idea introduced
in that chapter. It is possible to understand all aspects of this book without using
Nengo, and hence to skip the tutorials. But, I believe a much deeper understand-
ing is possible if you “play” with simulations to see how complex representations
can map to neural spike patterns, and how sophisticated transformations of the
information in neural signals can be accomplished by realistic neuronal networks.
After all, knowing how to build something and building it yourself are two very
different things.
Installation
Nengo works on Mac, Windows, and Linux machines and can be installed by
downloading the software from http://nengo.ca/.
• To install, unzip the downloaded file where you want to run the application
from.
An empty Nengo world appears. This is the main interface for graphical model
construction.
Building a model
The first model we will build is very simple: just a single neuron.
• A column of icons should occupy the left side of the screen. This is the
Template Bar. If the template bar is not displayed, click the icon showing a
pair of scissors over a triangle which is located at the top-right corner of the
screen.
• From the template bar, click the icon above the label ‘Network’ and drag
the network into the workspace. Alternatively, right-click anywhere on the
background and choose New Network.
• Set the Name of the network to ‘A single neuron’ and click OK.
All models in Nengo are built inside ‘networks’. Inside a network you can put
more networks, and you can connect networks to each other. You can also put
other objects inside of networks, such as neural populations (which have indi-
vidual neurons inside of them). For this model, you first need to create a neural
population.
• Click the Ensemble icon in the template bar and drag this object into the
network you created. In the dialog box that appears, you can set Name to
‘neuron’, Number of nodes to 1, Dimensions to 1, Node Factory to ‘LIF
Neuron’, and Radius to 1. Click OK.
• To see the neuron you created, double-click on the ‘population’ you just created.
Figure 1.3: A single neuron Nengo model. This is a screen capture from Nengo
that shows the finished model described in this section.
This shows a single neuron called ‘node0’. To get details about this neu-
ron, you can right-click it and select Configure and look through the parameters,
though they won’t mean much without a good understanding of single cell mod-
els. In order to simulate input to this neuron, you need to add another object to
the model that generates that input.
• Close the NEFEnsemble Viewer (with the single node in it) by clicking the
‘X’ in the upper right of that window.
• Drag the Input component from the template bar into the Network Viewer
window. Set Name to ‘input’, and Output Dimensions to 1. Click Set Func-
tions. In the drop down select Constant Function. Click Set. Set Value to
0.5. Click OK for all the open dialogs.
A network item that provides a constant current to inject into the neuron has now
been created. It has a single output labeled ‘origin’. To put the current into the
neuron, you need to create a ‘termination’ (i.e., input) on the neuron.
• Drag a Termination component from the template bar onto the ‘neuron’
population. Set Name to ‘input’, Weights Input Dim to 1, and tauPSC to
0.02. Click Set Weights, double-click the value and set it to 1. Click OK.
A new element will appear on the left side of the population. This is a decoded
termination which takes an input and injects it into the population. The tauPSC
specifies the ‘synaptic time constant’, which I will discuss in the next chapter. It
is measured in seconds, so 0.02 is equivalent to 20 milliseconds.
• Click and drag the ‘origin’ on the input function you created to the ‘input’
on the neuron population you created.
Congratulations, you’ve constructed your first neural model. Your screen should
look like Figure 1.3.
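For those who prefer scripting to clicking, essentially the same model can be sketched with Nengo’s Python API. Class names and defaults differ across Nengo versions (this sketch follows the later Python interface rather than the graphical editor used above), so treat it as an illustration of the model’s structure rather than a required step in the tutorial.

import nengo

# A scripted sketch of the 'A single neuron' model built graphically above.
model = nengo.Network(label="A single neuron")
with model:
    # Constant input of 0.5, playing the role of the 'input' component.
    stim = nengo.Node(0.5)

    # An ensemble containing one LIF neuron representing a single dimension,
    # analogous to the 'neuron' population created from the template bar.
    neuron = nengo.Ensemble(n_neurons=1, dimensions=1, neuron_type=nengo.LIF())

    # Connect the input to the neuron with a 20 ms synaptic time constant,
    # corresponding to the termination created with tauPSC = 0.02.
    nengo.Connection(stim, neuron, synapse=0.02)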
Running a model
You want to make sure your model works, so you need to use the parts of Nengo
that let you ‘run’ the model. There are two ways to run models in Nengo. The first
is a ‘non-interactive mode’, which lets you put ‘probes’ into parts of the network.
You then run the network for some length of time, and those probes gather data
(such as when neurons are active, what their internal currents and voltages are,
etc.). You can then view that information later using the built-in data viewer
or with another program such as Matlab®. For future reference, to access this
method of running, you can right-click on the background of the network and use
the ‘Run’ command under the ‘Simulator’ heading.
The other way to run models is to start ‘interactive plots’, which is more hands-
on, so I will use it in the examples in this book. An example of the output gener-
ated by interactive plots for this model is shown in Figure 1.4.
• Click on the Interactive Plots icon (the double sine wave in the top right
corner).
This pulls up a new window with a simplified version of the model you made. It
shows the ‘input’ and ‘neuron’ network elements that you created earlier. They
are the elements that you will interact with while running the simulation. This
window also has a ‘play’ button and other controls that let you run your model.
First, you need to tell the viewer which information generated by the model you
would like to see.
A spike raster shows the times that the neurons in the population fire an action po-
tential spike (signifying a rapid change in their membrane voltage), which is largely
how neurons communicate with one another. You can drag the background, and
any of the objects around within this view to arrange them in a useful manner.
This graph shows the effect that this neuron will have on the current flowing into a neuron that receives the output spikes shown in the raster. Each small angular
pulse can be thought of as a postsynaptic current or PSC that would be induced in
the receiving cell.
This shows a controller for manipulating the input, and the value of your control
input over time.
The simulation is now running. Because the neuron that you generated was ran-
domly chosen, it may or may not be active with the given input. Either way, you
should grab the slider control and move it up and down to see the effects of in-
creasing or decreasing input. Your neuron will either fire faster with more input
(an ‘on’ neuron) or it will fire faster with less input (an ‘off’ neuron). Figure 1.4
shows an on neuron with a fairly high firing threshold (just under 0.69 units of
input). All neurons have an input threshold below which (or above which for off
neurons) they will not fire any spikes.
You can test the effects of the input and see if you have an on or off neuron,
and where the threshold is. You can change the neuron by pausing the simulation,
and returning to the main Nengo window. To randomly pick another neuron, do
the following in the main window:
• Click on the gray rightward-pointing arrow beside ‘neurons (int)’. It will point down and a ‘1’ will appear.
• Double-click on the ‘1’ (it will highlight in blue), and hit Enter. Click Done.
Figure 1.4: Running the single neuron model. This is a screen capture from the
interactive plot mode of Nengo running a single neuron.
You can now return to the interactive plots and run your new neuron by hit-
ting play. Different neurons have different firing thresholds. As well, some are more responsive to the input than others. Such neurons are said to
have higher sensitivity, or ‘gain’. You can also try variations on this tutorial
by using different neuron models. Simply create another population with a sin-
gle neuron and choose something other than ‘LIF neuron’ from the drop down
menu. There are a wide variety of neuron parameters that can be manipulated.
In addition, it is possible to add your own neuron models to the simulator, in-
ject background noise, and have non-spiking neurons. These topics are beyond
the scope of this tutorial, but relevant information can be found on our website
http://compneuro.uwaterloo.ca/cnrglab/.
Congratulations, you have now built and run your first biologically plausible
neural simulation using Nengo. You can save and reload these simulations using
the File menu.
Part I
Chapter 2
An Introduction to Brain Building
Before turning to my main purpose of answering the four questions regarding semantics, syntax, control, and memory and learning, I need to provide an introduction to the main method I will be relying on to provide answers to these questions. As
I argued in the last chapter, I believe the best answers to such questions will be
biologically-based. In the sections that follow, I provide an overview of the rel-
evant biology, and introduce the Neural Engineering Framework (NEF), which
provides basic methods for constructing large-scale neurally realistic models of
the brain. My purpose here is to lay a foundation for what is to come: a neural
architecture for biological cognition. Additional details on the NEF can be found
in Eliasmith and Anderson (2003).
Brains are absolutely fantastic devices. For one, they are incredibly efficient.
Brains consume only about 20 W of power – the equivalent of a compact fluo-
rescent light bulb. To put this power efficiency in some perspective, one of the
world’s most powerful supercomputers “roadrunner” at Los Alamos labs in the
United States, which, as far as we know, is unable to match the computational
power of the mammalian brain, consumes 2.35 MW (about 100,000 times more).
Brains are also relatively small compared to the size of our bodies. A typical brain
weighs between 1 and 2 kg and comprises only 2% of our body weight. Never-
theless, brains account for about 25% of the energy used by the body. This is
especially surprising when you consider the serious energy demands of muscles,
which must do actual physical work. Presumably, since the brain is such a power
hog, it is doing something important for our survival or it would not have been
preserved by evolution. Presumably, that “something important” is somewhat
obvious: brains control the four Fs (feeding, fleeing, fighting, and, yes, reproduc-
tion); brains provide animals with behavioral flexibility that is unmatched by our
most sophisticated machines; and brains are constantly adapting to the uncertain,
noisy, and rapidly changing world in which they find themselves embedded.
We can think of this incredibly efficient device as something like a soft pillow
crammed inside our skulls. While the texture of brains has often been compared
to that of a thick pudding, it is more accurate to think of the brain as being a large
sheet, equivalent in size to about four sheets of writing paper, and about 3 mm
thick. In almost all mammals, the sheet has six distinct layers which are composed
of three main elements: 1) the cell bodies of neurons; 2) the very long thin pro-
cesses used for communication; and 3) glial cells, which are a very prevalent but
poorly understood companion to neurons. In each square millimeter of human
cortex there are crammed about 170,000 neurons. So, there are about 25 billion
neurons in human cortex. Overall, however, there are approximately 100 billion
neurons in the human brain. The additional neurons come from “subcortical” ar-
eas, which include cerebellum, basal ganglia, thalamus, and the brainstem, among
others. To get a perspective on the special nature of human brains, it is worth not-
ing that monkey brains are approximately the size of one sheet of paper, and rats
have brains the size of a Post-it note.
In general, it is believed that what provides brains with their impressive com-
putational abilities is the organization of the connections among individual neu-
rons. These connections allow cells to collect, process, and transmit information.
In fact, neurons are specialized precisely for communication. In most respects,
neurons are exactly like other cells in our bodies: they have a cell membrane, a
nucleus, and they have similar metabolic processes. What makes neurons stand
out under a microscope, are the many branching processes which project outwards
from the somewhat bulbous, main cell body. These processes are there to enable
short and long distance communication with other neural cells. This is a hint:
if we want to understand how brains work, we need to have some sense of how
neurons communicate in order to compute.
Figure 2.1 outlines the main components and activities underlying cellular
communication. The cellular processes that carry information to the cell body are
called dendrites. The dendrites carry signals, in the form of an ionic current, to
the main cell body. If sufficient current enters the cell body at a given point in
time, a series of nonlinear events will be triggered that result in an action poten-
tial, or voltage “spike,” that will proceed down the output process of the neuron,
which is called an axon. Neural spikes are very brief events, lasting for only a few
milliseconds (see figure 2.2), which travel in a wave-like fashion down the axon
until they reach the end of the axon, called the bouton, where they cause the re-
lease of tiny packets of chemicals called neurotransmitters. Axons end near the
dendrites of subsequent neurons. Hence, neurotransmitters are released into the
small space between axons and dendrites, which is called the synaptic cleft. The
neurotransmitters very quickly cross the cleft and bind to special proteins in the
cell membrane of the next neuron. This binding causes small gates, or channels,
in the dendrite of the next neuron to open, which allows charged ions to flow into
the dendrite. These ions result in a current signal in the receiving dendrite, called
the postsynaptic current (PSC; see figure 2.2), which flows to the cell body of a
subsequent neuron, and the process continues as it began.
A slightly simpler description of this process, but one which retains the central
relevant features, is as follows:
1. Signals flow down the dendrites of a neuron into its cell body.
2. If the overall input to the cell body from the dendrites crosses a threshold,
the neuron generates a stereotypical spike that travels down the axon.
3. When the spike gets to the end of the axon, it causes chemicals to be re-
leased that travel to the connected dendrite and, like a key into a lock, cause
the opening of channels that produce a postsynaptic current (PSC) in the
receiving dendrite.
4. This current is the signal in step 1, which then flows to the cell body of the next neuron (a bare-bones simulation sketch of these steps follows this list).
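To make these four steps concrete, the simplest widely used caricature of them is the leaky integrate-and-fire (LIF) neuron, which is also the default neuron model in the Nengo tutorials in this book. The following is a bare-bones sketch, not the implementation Nengo actually uses, and the parameter values are only illustrative:

import numpy as np

def lif_spikes(current, dt=0.001, tau_rc=0.02, tau_ref=0.002, v_th=1.0):
    """Simulate a leaky integrate-and-fire neuron driven by an input current.

    current: sequence of input current values, one per time step of length dt.
    Returns an array with 1.0 at the time steps where a spike occurred.
    """
    v = 0.0           # membrane voltage (step 1: dendritic current charges the cell)
    refractory = 0.0  # time remaining in the post-spike refractory period
    spikes = np.zeros(len(current))
    for i, J in enumerate(current):
        if refractory > 0:
            refractory -= dt          # the cell is briefly unresponsive after a spike
            continue
        v += dt * (J - v) / tau_rc    # leaky integration of the input current
        if v >= v_th:                 # step 2: the threshold is crossed
            spikes[i] = 1.0           # emit a stereotypical spike
            v = 0.0                   # reset the membrane voltage
            refractory = tau_ref
    return spikes

# A constant input current produces a regular spike train; step 3 would then turn
# each of these spikes into a postsynaptic current in the receiving neuron (step 4).
spike_train = lif_spikes(np.full(1000, 1.5))
print("spikes in 1 s of simulated time:", int(spike_train.sum()))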
As simple as this story first sounds, it is made much more complex in real
brains by a number of factors. First, neurons are not all the same. There are
hundreds of different kinds of neurons that have been identified in mammalian
brains. The neurons can range in size from 10⁻⁴ m to 5 m in length. The number
of inputs to a cell can range from about 500 or fewer, to well over 200,000. The
number of outputs, that is, branches of a single axon, cover a similar range. On
average, cortical cells have about 10,000 inputs and 10,000 outputs. Given all
of these connections, it is not surprising to learn that there are approximately 72
km of fiber in the human brain. Finally, there are hundreds of different kinds of
Figure 2.1: Cellular communication. This figure diagrams the main components
and kinds of activities of typical neural cells during communication. The flow of
information begins at the left side of the image, with postsynaptic currents (PSCs)
travelling along dendrites towards the cell body of the neuron. At the axon hillock,
these currents result in a voltage spike being generated if they cause the somatic
voltage to go above the neuron’s threshold. The spikes travel along the axon and
cause the release of neurotransmitters at the end of the axon. The neurotransmit-
ters bind to matching receptors on the dendrites of connected neurons, causing ion
channels to open. Following the opening of ion channels, postsynaptic currents
are induced in the receiving neuron and the process continues.
Figure 2.2: Electrical activity in two connected neurons. The top trace shows
a series of neural spikes recorded intracellularly from a single rat neuron, that
projects this signal to a receiving neuron. The bottom trace shows a series of
postsynaptic potentials (PSPs, caused by PSCs) generated at a single synapse in
the receiving neuron in response to the input action potentials. The dotted lines
indicate the single PSPs that would result from a spike in isolation. The PSP trace
is approximately a sum of the individual PSPs. (Adapted from Holmgren et al.
(2003), figure 4, with permission).
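The near-linear summation shown in this figure can be illustrated numerically: if every incoming spike produces the same stereotyped postsynaptic current, then the total response is approximately the spike train convolved with that single-spike waveform. The decaying-exponential kernel and 20 ms time constant below (the same tauPSC value used in the chapter 1 tutorial) are illustrative simplifications, not fits to the recorded data in the figure:

import numpy as np

dt = 0.001                       # 1 ms time steps
t = np.arange(0.0, 0.2, dt)      # a 200 ms window

# Stereotyped single-spike kernel: an exponentially decaying postsynaptic
# current with a 20 ms synaptic time constant.
tau = 0.02
kernel = np.exp(-t / tau) / tau

# An example spike train with spikes at 20, 50, 60, and 120 ms.
spikes = np.zeros_like(t)
spikes[[20, 50, 60, 120]] = 1.0

# The summed postsynaptic response is (approximately) the convolution of the
# spike train with the kernel: overlapping copies of the kernel simply add.
psc = np.convolve(spikes, kernel)[:len(t)] * dt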
Figure 2.3: Neuron response functions, showing firing frequency (Hz) in response to injected current (nA) for the most common kinds of cells found in cortex.
Experimental neuroscientists who record the activity of single neurons in response to perceptual stimuli shown to an animal will tell you that no two neurons seem to respond in the same manner. Often, the response of a given cell is cap-
tured in terms of its “tuning curve”. Sometimes, these tuning curves will look
much like the response functions shown in figure 2.3 (with a stimulus parameter
instead of current on the x-axis). However, tuning curves can have many different
shapes. An example tuning curve from a cell in primary visual cortex is shown
in figure 2.4. In general, a tuning curve is a graph that shows the frequency of
spiking of a neuron in response to a given input stimulus. In this figure, we can see
that as the orientation of a presented bar changes, the neuron responds more or
less strongly. Its peak response happens at about 0 degrees. Nevertheless, there is
some information about the orientation of the stimulus available from this activity
whenever the neuron responds.
This tuning curve, while typical of cells in primary visual cortex, includes
an additional implicit assumption. That is, this is not the response of the cell
to any oriented bar at any position in the visual field. Rather, it is a response
to an oriented bar in what is called the “receptive field” of the cell. Since this
is a visual cell, the receptive field indicates which part of possible visual input
space the neuron responds to. So, to completely capture the “tuning” of the cell
to visual stimuli, we ideally want to combine the receptive field and tuning curve.
In the remainder of this book, I will use the notion of tuning curve in this more
Figure 2.4: Example tuning curve for a cell from a macaque monkey in primary
visual cortex (data provided by Dan Marcus). The x-axis indicates the angle of
rotation of a bar, and the y-axis indicates the firing rate of the neuron when shown
that stimulus. The graph is a result of many trials at each rotation, and hence has
error bars that indicate the standard deviation of the response. The dashed line
indicates the best fit of a leaky integrate-and-fire (LIF) neuron model to the data
(adapted from Eliasmith & Anderson (2003), with permission).
general sense. That is, what I subsequently define as a tuning curve often includes
information about the receptive field of the cell in question. Returning to the issue
of heterogeneity, the reason that experimental neuroscientists would suggest that
no two cells are the same, is partly because their tuning curves (coupled with their
receptive fields) seem to never perfectly overlap. As a result, while neurons near
to the one shown in figure 2.4 may share similar tuning properties, the peak, width,
and roll-off of the graph will be somewhat different.
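To see concretely how such curves arise, and why neighboring cells differ, it helps to write down the standard rate approximation for the LIF neuron mentioned in figure 2.4: the firing rate depends only on the current the cell receives, and that current depends on the cell’s own gain, bias, and preferred direction. The helper below is defined here purely for illustration (it is not a Nengo function), and the parameter ranges are arbitrary; the point is that intrinsically identical LIF neurons with different gains and biases trace out very different tuning curves:

import numpy as np

def lif_rate(J, tau_rc=0.02, tau_ref=0.002):
    """Steady-state firing rate of an LIF neuron given input current J (threshold at J = 1)."""
    J = np.asarray(J, dtype=float)
    rate = np.zeros_like(J)
    above = J > 1.0
    rate[above] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[above] - 1.0)))
    return rate

rng = np.random.default_rng(seed=0)
x = np.linspace(-1.0, 1.0, 201)   # the stimulus variable the population responds to

tuning_curves = []
for _ in range(5):
    gain = rng.uniform(1.0, 5.0)        # sensitivity of the cell
    bias = rng.uniform(-2.0, 2.0)       # background current; sets the firing threshold
    encoder = rng.choice([-1.0, 1.0])   # +1 gives an 'on' cell, -1 an 'off' cell
    tuning_curves.append(lif_rate(gain * encoder * x + bias))
# Each curve has a different threshold, peak rate, and slope, even though every
# cell here uses exactly the same underlying LIF model.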
We can begin to get a handle on the heterogeneity observed in neural systems
by noticing that there are two main sources for the variability: intrinsic and ex-
trinsic. Intrinsic heterogeneity, such as different responses to the same injected
current, will have to be captured by variability in the models of individual cells.
Extrinsic heterogeneity, such as the variability of tuning curves, can be understood
more as a consequence of where a particular cell sits in the network. That is, the
reason neighboring cells have different tuning curves is not merely because of in-
trinsic heterogeneity, but also because they are receiving slightly different inputs
than their neighbors, or treating the same inputs slightly differently. So, even if all
of the cells were intrinsically homogeneous, their tuning curves might look very
different depending on the processing of the cells before them. This difference be-
tween extrinsic and intrinsic heterogeneity will prove important in understanding
the sources of different kinds of dynamics observed in real neural networks.
Of course, this distinction does not in any way mitigate the fact of heterogene-
ity itself. We must still overcome the observed diversity of neural systems, which
has traditionally been seen as a barrier to theoretical analysis. In the next sections
we will see that despite this complexity, it is possible to suggest and quantify un-
derlying principles which do a good job of describing the functional properties
of neural networks that display broad heterogeneity. In fact, we can come to un-
derstand why this massive heterogeneity that we observe might provide for more
robust computations than those of a system whose individual components were
identical (see section 2.3.1).
biology suffer the same difficulties, as do other sciences such as geology, chem-
istry, and even physics. However, one main advantage that, for example, physics
has in mitigating these challenges is a widely shared technical vocabulary for de-
scribing both problems and solutions in the field. The development and applica-
tion of these quantitative tools have helped physics develop rapidly, leading us to
exciting new discoveries, as well as deep, challenging problems. The subfield of
physics most centrally concerned with the development of such tools is theoretical
physics.
Interestingly, an analogous subfield has been developing in neuroscience over
the last few decades as well, and it is often appropriately called “theoretical neuro-
science” (though perhaps equally as often called “computational neuroscience”).
In fact, the analogy between theoretical neuroscience and theoretical physics is
both strong, and useful for understanding its importance to neuroscience. For in-
stance, both are centrally interested in quantifying the phenomena under study.
This does not mean merely statistically quantifying the data generated by the phe-
nomena, but rather coming up with quantitative descriptions of the deterministic
regularities and mechanisms giving rise to that data. Take, for instance, one of the
greatest advances in theoretical physics, the development of Newton’s three laws
of motion. The second, perhaps most famous, law is that “The alteration of motion
is ever proportional to the motive force impressed; and is made in the direction
of the right line in which that force is impressed” (Newton, 1729, p. 19). In short
form: F = ma. The purpose of this statement is to make a clear, straightforward
hypothesis about motion. I describe similar principles in sections 2.3.1-2.3.3
for neural representation, computation, and dynamics.
A second analogy between these two subfields is that both are interested in
summarizing an enormous amount of data. Newton’s second law is intended to
apply to all forms of motion, be it rectilinear, circular, or what have you. Mea-
suring all such forms of motion, and describing the results statistically would not
be nearly as concise. Similarly, theoretical neuroscientists are attempting to un-
derstand the basic principles behind neural function. Often, they would like their
mathematical descriptions to be as general as possible, although there is some de-
bate regarding whether or not the kind of unification being striven for in physics
should be a goal for theoretical neuroscience.
A third crucial analogy between theoretical physics and theoretical neuro-
science is that the disciplines are speculative. Most such quantitative descriptions
of general mechanisms go beyond the available data. As such, they almost always
suggest more structure than the data warrants, and hence more experiments to per-
form. Looking again to Newton’s second law, it is clear that Newton intended it
to be true for all velocities. However, special relativity has subsequently demon-
strated that the law is measurably violated for large velocities. With speculation
comes risk. Especially for a young field like theoretical neuroscience, the risk of
being wrong is high. But, the risk of becoming lost in the complexity of the data
without such speculation is much higher. The real fruit of a field like theoretical
neuroscience – fruit realized by theoretical physics – is (hopefully) the identifi-
cation of a set of principles that lets massive amounts of data be summarized,
thereby suggesting new questions and opening lines of communication between
different sub-disciplines.
Of course, there are crucial disanalogies between these fields as well. For
instance, we can think of physics as being interested in questions of what there
is, while neuroscience is interested in questions of who we are. As well, neuro-
science is a much younger discipline than physics, and so the methods for making
measurements of the system of interest are still rapidly developing.
Nevertheless, the similarities can help us understand why theoretical neu-
roscience is important for the development of neuroscience. Like theoretical
physics, theoretical neuroscience can help in at least two crucial ways. Specif-
ically, it should:
1. Quantify, and hence make more precise and testable, hypotheses about the
functioning of neural systems;
2. Summarize large amounts of experimental data, and hence serve to unify the
many sources of data from different “neuro-” and behavioral sub-disciplines.
The first of these stems from the commitment to using mathematics to describe
principles of neural function. The second is crucial for trying to deal with the
unavoidable complexities of neural systems. This is a challenge not faced to the
same degree by many physicists.1 Characterizing a system of billions of parts as if
each is identical, and as if the connections between all of them are approximately
the same can lead to very accurate characterizations of physical systems (e.g. the
ideal gas law). The same is not true of neural systems. Large neural systems
where all of the parts are the same, and interact in the same way simply do not
exist.
Thus, the links between the “lowest” and “highest” levels of characterizing
neural systems are complex and unavoidable. It is perhaps in this kind of circum-
stance that quantification of the system of interest plays its most crucial role. If
1 This observation should make it obvious that I am in no way suggesting the behavioral sciences should become physics, a mistake which has been made in the past (Carnap, 1931).
we can state our hypotheses about neural function at a “high” level, and quan-
tify the relationship between levels, then our high-level hypothesis will connect
to low-level details. In fact, ideally, a hypothesis at any level should contact data
at all other levels. It is precisely this kind of unification of experimental data
that is desperately needed in the behavioral sciences to support cross-disciplinary
communication, and eventually a mature theory of biological cognition.
This ideal role for theoretical neuroscience has not yet been realized. Perhaps
this is because there has historically been more of a focus on “low” levels of neural
systems (i.e. single cells or small networks). This is perfectly understandable in
light of the complexity of the system being tackled. Nevertheless, I believe we are
now in a position to begin to move past this state of affairs.
In the context of this book, I will adopt methods from theoretical neuroscience
that I believe currently have the best potential to realize this ideal role. At the
same time I hope the contents of this book will prod theoretical neuroscientists to
consider expanding the viable areas of application of their methods to all of the be-
havioral sciences. In the next section I introduce a series of theoretical principles
developed in the context of traditional theoretical neuroscience. In subsequent
chapters I suggest a way of applying these same principles to large-scale, cogni-
tive modeling. This should help to not only test, but also to refine such principles,
and it will help make our cognitive models subject to data from all of the various
disciplines of the behavioral sciences.
auditory system (Fischer, 2005; Fischer et al., 2007), parts of the rodent navi-
gation system (Conklin and Eliasmith, 2005), escape and swimming control in
zebrafish (Kuo and Eliasmith, 2005), tactile working memory in monkeys (Singh
and Eliasmith, 2006), and simple decision making in humans (Litt et al., 2008)
and rats (ref Laubach???). We have also used these methods to better understand
more general issues about neural function, such as how neural systems might per-
form time derivatives (Tripp and Eliasmith, 2010), how the variability of neural
spike trains and the timing of individual spikes relates to information that can be
extracted from spike patterns (Tripp and Eliasmith, 2007), how we can ensure that
biological constraints such as Dale’s Principle – the principle that a given neuron
typically has either excitatory or inhibitory effects but not both – are respected
by neural models (Parisien et al., 2008). ???MacNeil learning paper ref, when
accepted???.
One reason the NEF has such broad application is that it does not make assumptions about what the brain actually does. Rather, it is a set of three principles
that can help determine how the brain performs a given function. For this reason,
John Miller once suggested that the NEF is a kind of “neural compiler”. If you
have a guess about the high-level function of the system you are interested in, and
you know (or assume) some information about how individual neurons respond
to relevant input, the NEF provides a way of connecting populations of neurons
together to realize that function. This, of course, is exactly what a compiler does
for computer programming languages. The programmer specifies a program in
a high-level language like Java. The Java compiler knows something about the
low-level machine language implemented in a given chip, and it translates that
high-level description into an appropriate low-level one.
Of course, things are not so clean in neurobiology. We do not have a perfect
description of the machine language, and our high-level language may be able
to define functions that we cannot actually implement in neurons. Consequently,
building models with the NEF can be an iterative or bootstrapping process: first
you gather data from the neural system and have a hypothesis about what it does;
then you build a model using the NEF and see if it behaves like the real system;
then, if it does not behave consistently with the data, you alter your hypothesis
or perform experiments to figure out why the two are different. Sometimes a
model will behave in ways that cannot be compared to data because the data does
not exist. In these lucky cases, it is possible to make a prediction and perform
an experiment. Of course this process is not unique to the NEF. Instead, it will
be familiar to any modeller. What the NEF offers is a systematic method for
performing these steps in the context of neurally realistic models.
It is worth emphasizing that, also like a compiler, the NEF does not specify
what the system does. This specification is brought to the characterization of the
system by the modeler. In short, the NEF is about how brains compute, not what
they compute. The bulk of this book is about the “what”, but those considerations
do not begin until chapter 3.
In the remainder of this section, I provide an outline of the three principles of
the NEF. There are, of course, many more details (at least a book’s worth!). So, a
few points are in order. First, the methods here are by no means the sole invention
of our group. We have drawn heavily on other work in theoretical neuroscience.
To keep this description as brief as possible, I refer the reader to other descrip-
tions of the methods that better place the NEF in its broader context (Eliasmith
and Anderson, 2003; Eliasmith, 2005b; Tripp and Eliasmith, 2007). Second, the
original Neural Engineering book is a useful source for far more mathematical de-
tail than I provide here. However, the framework has been evolving, so that book
should be taken as a starting point. Finally, some mathematics can be useful for
interested readers, but I have placed most of it in the appendices to emphasize a
more intuitive grasp of the principles. This is at least partially because the Nengo
neural simulator has been designed to handle the mathematical detail, allowing
the modeller to focus effort on capturing the neural data, and the hypothesis she
or he wishes to test.
The following three principles form the core of the NEF:2
1. Neural representations are defined by the combination of nonlinear encod-
ing (exemplified by neuron tuning curves, and neural spiking) and weighted
linear decoding (over populations of neurons and over time).
2. Transformations of neural representations are functions of the variables rep-
resented by neural populations. Transformations are determined using an
alternately weighted linear decoding.
3. Neural dynamics are characterized by considering neural representations as
state variables of dynamic systems. Thus, the dynamics of neurobiological
systems can be analyzed using control (or dynamics systems) theory.
In addition to these main principles, the following addendum is taken to be im-
portant for analyzing neural systems:
• Neural systems are subject to significant amounts of noise. Therefore, any
analysis of such systems must account for the effects of noise.
2 For a quantitative statement of the three principles, see Eliasmith and Anderson (2003, pp.
230-231).
2.3.1 Representation
A central tenet of the NEF is that we can adapt the information theoretic ac-
count of codes to understanding representation in neural systems. Codes, in en-
gineering, are defined in terms of a complementary encoding and decoding proce-
dure between two alphabets. Morse code, for example, is defined by the one-to-
one relation between letters of the Roman alphabet, and the alphabet composed of
a standard set of dashes and dots. The encoding procedure is the mapping from
the Roman alphabet to the Morse code alphabet and the decoding procedure is its
inverse (i.e., mapping dots and dashes to the Roman alphabet).
In order to characterize representation in a neural system, we can identify
the relevant encoding and decoding procedures and their relevant alphabets. The
encoding procedure is straightforward to identify: it is the mapping of stimuli into
a series of neural spikes. Indeed, encoding is what neuroscientists typically talk
about, and encompasses many of the mechanisms I have discussed in section 2.1.
When we show a brain a stimulus, some neurons or other “fire” (i.e. generate
action potentials). This pattern of firing is often depicted by neuroscientists as
a “spike raster” (see the spike rasters in figure 2.7 for a simple example). Spike
rasters show the time of occurrence of an action potential from a given neuron – in
essence, the only response to the stimulus that it sends to other neurons. The raster
in figure 2.6 shows responses from just two neurons, while figure 2.7 shows the
responses from thirty. Often, rasters will show the responses of the same neuron
over many different trials to demonstrate the variability of the response to the
same stimulus, although I will typically show the responses of many neurons on
one trial.3 The precise nature of this encoding from stimulus to spikes has been
explored in-depth via quantitative models (see Appendix A.1.1).
Unfortunately, neuroscientists often stop here in their characterization of rep-
resentation, but this is insufficient. We also need to identify a decoding procedure,
otherwise there is no way to determine the relevance of the purported encoding
for “downstream” parts of the system. If no information about the stimulus can be
extracted from the spikes of the encoding neurons, then it makes no sense to say
that they represent the stimulus. Representations, at a minimum, must potentially
be able to “stand-in” for the things they represent. As a result, characterizing the
decoding of neural spikes into the physical variable they represent is as important
as characterizing the process of encoding physical variables into neural spikes.
Quite surprisingly, despite typically nonlinear encoding (i.e., mapping a con-
tinuously varying parameter like stimulus intensity into a series of discontinuous
spikes), a good linear decoding can be found.4 To say that the decoding is “linear”
means that the responses of neurons in the population are weighted by a constant
(the decoder) and summed up to give the decoding. There are several established
methods for determining appropriate linear decoders given the neural populations
that respond to certain stimuli (see Appendix A.1.2 for the one I will use in this
book).
There are two aspects to the decoding of the neural response that must be
considered. These are what I call the “population” and “temporal” aspects of
decoding. The “population” decoding accounts for the fact that a single stimulus
variable is typically encoded by many different (i.e. a population of) neurons. The
“temporal” decoding accounts for the fact that neurons respond in time, to a typ-
ically changing stimulus. Ultimately, these two aspects determine one combined
decoding. However, it is conceptually clearer to consider them one at a time.
As depicted in figure 2.5, population decoders are determined by finding the
weighting of each neuron’s tuning curve, so that their sum represents the input
3 Because we can control the trial-to-trial variability of our single neuron models, we often
keep it low to be able to better see the typical response. Nevertheless, variability is often central
to characterizing neural responses, and it can be captured effectively in Nengo using the available
noise parameters (see e.g., ref???laubach paper). And, as mentioned earlier, variability and noise
are central to the formulation of the NEF principles.
4 This is nicely demonstrated for pairs of neurons in (Rieke et al., 1997, pp. 76-87).
signal over some range. That is, each neuron is weighted by how useful it is for
carrying information about the stimulus in the context of the whole population.
If it is very useful, it has a high weight; if not, it has a lower weight. Finding the
exact decoding weights in the NEF is accomplished by solving an optimization
problem to minimize the representational error (see Appendix A.1.2). To be clear,
each neuron’s decoder is equal to the value of its weight, and it is the sum of the
weighted responses that gives the population decoding.
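To make this optimization concrete, here is a minimal sketch in Python, assuming rate-mode LIF neurons and made-up tuning parameters (the encoders, maximum rates, and intercepts are chosen randomly for illustration); the decoders are found by regularized least squares, in the spirit of the approach in Appendix A.1.2 rather than as a reproduction of it.

```python
import numpy as np

def lif_rate(J, tau_rc=0.02, tau_ref=0.002):
    """Steady-state LIF firing rate for input current J (zero below threshold)."""
    rate = np.zeros_like(J)
    above = J > 1
    rate[above] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[above] - 1)))
    return rate

rng = np.random.default_rng(0)
n_neurons = 30
x = np.linspace(-1, 1, 200)[:, None]               # sampled points of the state space
encoders = rng.choice([-1.0, 1.0], n_neurons)      # 'on' and 'off' neurons
max_rates = rng.uniform(100, 200, n_neurons)       # firing rate (Hz) at x = encoder
intercepts = rng.uniform(-0.95, 0.95, n_neurons)   # where each neuron 'turns off'

# Choose gain and bias so each neuron is silent at its intercept and reaches its
# maximum rate at the edge of the represented range.
j_max = 1.0 / (1.0 - np.exp((0.002 - 1.0 / max_rates) / 0.02))
gain = (j_max - 1.0) / (1.0 - intercepts)
bias = 1.0 - gain * intercepts

A = lif_rate(gain * (x * encoders) + bias)         # tuning curves: (samples x neurons)

# Regularized least-squares decoders: one constant weight per neuron.
sigma = 0.1 * A.max()                              # assumed noise level
Gamma = A.T @ A + sigma**2 * len(x) * np.eye(n_neurons)
decoders = np.linalg.solve(Gamma, A.T @ x)

x_hat = A @ decoders                               # the summed, weighted tuning curves
print("RMSE:", float(np.sqrt(np.mean((x - x_hat) ** 2))))
```

Increasing n_neurons in this sketch drives the error down, which is the effect shown across the rows of figure 2.5.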
As can be seen in figure 2.5, as more neurons are added to the population,
the quality of the representation improves. This is because the neuron responses
are nonlinear, but the ideal representation is linear (specifically, the estimate of
the input should exactly equal the input). Once the decoders have been found,
they remain fixed for any input value. So again, it is clear from this figure that
different inputs will be encoded with different accuracies (as can be inferred from
the “ripples” in the thick black lines of figure 2.5). Typically, the input will be
changing over time, so we must also determine how to decode over time.
Figures 2.6 and 2.7 depict temporal decoding for different numbers of neurons.
For the NEF, temporal encoding and decoding are determined by the biophysics
of cellular communication. In short, the postsynaptic current (PSC) discussed in
section 2.1, is used to convert spikes generated by a model of a single cell into
an estimate of the input signal over time (see figure 2.6). Specifically, the spike
patterns of the neurons being driven by the input are decoded by placing a PSC at
each spike time and summing the result. More precisely, the “on” neuron PSCs are
multiplied by +1 and the “off” neuron PSCs are multiplied by -1 (this is a linear
decoding since we are multiplying a constant function (the PSC) by a weight (±1)
and summing).
When we apply this same principle to many neurons, as in figure 2.7, the set
of PSCs induced by a given neuron needs a specific weight, which is precisely the
population decoder discussed earlier. Combining these decoding weights with a
standard PSC model completely defines a linear “population-temporal” decoder.
In this way, many neurons can “cooperate” to give very good representations of
time-varying input signals. Notice that adding additional neurons accomplishes
two things. First, it allows the nonlinearities of the tuning curves to be linearized
as previously discussed with respect to population decoding. Second, it allows
the very “spiky” decoding in figure 2.6 to become smoothed as the PSCs are
more evenly distributed over time. These concepts are illustrated in the tutorial
described in section 2.5.
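Temporal decoding can be sketched in the same spirit. The following is a rough illustration with assumed values (a 5 ms exponential PSC, a sinusoidal input, and hypothetical rate-modulated spike trains for one 'on' and one 'off' neuron), not a reproduction of the models behind figures 2.6 and 2.7.

```python
import numpy as np

dt = 0.001                                   # 1 ms time step
t = np.arange(0.0, 2.0, dt)
tau_syn = 0.005                              # assumed AMPA-like PSC time constant (5 ms)

x = np.sin(2 * np.pi * t)                    # the time-varying input signal
rng = np.random.default_rng(1)
rate_on = 100 * np.clip(x, 0, None)          # the 'on' neuron fires for positive x
rate_off = 100 * np.clip(-x, 0, None)        # the 'off' neuron fires for negative x
spikes_on = (rng.random(t.size) < rate_on * dt).astype(float)
spikes_off = (rng.random(t.size) < rate_off * dt).astype(float)

psc = np.exp(-t / tau_syn)                   # exponential PSC kernel

def filter_spikes(spikes):
    # Summing a PSC placed at each spike time is a convolution with the PSC kernel.
    return np.convolve(spikes, psc)[: t.size]

# Linear temporal decoding: weight the 'on' PSCs by +1 and the 'off' PSCs by -1.
x_hat = (+1.0) * filter_spikes(spikes_on) + (-1.0) * filter_spikes(spikes_off)
# x_hat follows the input up to a scale factor; with only two neurons it is spiky,
# and it smooths out as more weighted spike trains are added (compare figure 2.7).
```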
Having specified the encoding and decoding used to characterize time-varying
representation, we still need to specify the relevant alphabets. While the specific
[Figure 2.5, panels: LIF neuron response curves (firing rate in spikes per second vs. input x value) and weighted response curves with reconstructed output (decoded x value vs. input x value), for populations of 2, 8, and 30 neurons.]
Figure 2.5: Finding decoders for a set of neurons. This set of figures demonstrates
how neuron tuning curves can be individually weighted by decoders to approxi-
mate an input variable, x, over a range of values. The left column shows the
tuning curves of the neurons. The right column shows linearly weighted versions
of these curves in gray and the sum of these weighted curves as a thick black line.
As the number of neurons is increased, the approximation to ideal representation
(the straight dashed line) improves.
Figure 2.6: Encoding and decoding of an input signal by two neurons. The neu-
rons fire at rates specified by their tuning curves, shown here to be symmetric
about zero. With a sinusoidally varying input, these tuning curves cause the neu-
rons to fire as shown by the black vertical lines in the spike raster. The contin-
uous black lines in the middle plot show the postsynaptic currents that would be
generated in the dendrites of a receiving neuron by the spike rasters. In the last
panel, the PSCs are summed to give an estimate (black) of the original input signal
(gray). The estimate is poor with two neurons, but can be made much better with
additional neurons (see figure 2.7).
[Figure 2.7: encoding and decoding of an input signal as in figure 2.6, but with thirty neurons; the weighted sum of the PSCs gives a much better estimate of the time-varying input.]
cases will diverge greatly, we can describe the alphabets generally: neural re-
sponses (encoded alphabet) code physical properties (decoded alphabet). Slightly
more specifically, the encoded alphabet is the set of temporally patterned neural
spikes over populations of neurons. This is reasonably uncontroversial.
However, it is much more difficult to be specific about the nature of the al-
phabet of physical properties. We can begin by looking to the physical sciences
for categories of physical properties that might be encoded by nervous systems.
Indeed, we find that many of the properties that physicists traditionally use to de-
scribe the physical world do seem to be represented in nervous systems: there
are neurons sensitive to displacement, velocity, acceleration, wavelength, temper-
ature, pressure, mass, etc. But, there are many physical properties not discussed
by physicists that also seem to be encoded in nervous systems: such as red, hot,
square, dangerous, edible, object, conspecific, etc.5 It is reasonable to begin with
the hypothesis that these “higher-level” properties are inferred on the basis of rep-
resentations of properties more like those that physicists talk about.6 In other
words, encodings of “edible” depend, in some complex way, on encodings of
“lower-level” physical properties like wavelength, velocity, etc. The NEF itself
does not determine precisely what is involved in such complex relations, although
I do suggest that it provides the necessary tools for describing such relations. I
return to these issues – related to the meaning (or “semantics”) of representations
– throughout much of the book, starting in chapter 3.
For now, we can be content with the claim that whatever is represented, it
can be described as some kind of structure with units. A precise way to describe
structure is to use mathematics. Hence, this is equivalent to saying that the de-
coded alphabet consists in mathematical objects with units. The first principle of
5 I am allowing any property reducible to or abstractable from physical properties to count as
physical. My suspicion is that this captures all properties, but that is a conversation for another
book. Notice, however, that many typical physical properties, like temperature, are both abstracted
from lower-level properties and are part of physics. So, abstracting from physical properties does
not make properties non-physical. In philosophical terms, this is the claim that properties “super-
vening” on physical properties are physical.
6 Clearly, this hypothesis can be wrong. It may turn out that no standard physical properties are
directly encoded by neurons, but in the end this probably does not matter. We can still describe
what they encode in terms of those properties, without being inaccurate (though our terminology
may be slightly cumbersome; i.e. we may speak of “derivatives-of-light-intensity”, and so on). In
short, it makes sense to begin describing the properties encoded by neurons in terms of familiar
physical properties, because so much of our science is characterized in terms of those properties,
we are familiar with how to manipulate those properties, and in the end we need a description that
works (not necessarily the only, or even best possible, description).
recurrent layer
control
feedback
input
memory
input
Figure 2.8: The architecture of a controlled integrator. This network can act like
a loadable and erasable memory. Both the control input and the memory input
are connected to the recurrent layer of neurons and the feedback to the recurrent
layer is the product of both inputs. The NEF allows the neural connection weights
in this network of spiking neurons to be determined, once we have defined the
desired representations, computations, and dynamics of the system.
the NEF, then, provides a general characterization of the encoding and decoding
relationship between mathematical objects with units and patterns of spikes in
populations of neurons.
To make this characterization more concrete, let us turn to considering the
example of the controlled integrator (see figure 2.8).
One of the simplest mathematical objects is a scalar. We can characterize
the horizontal position of an object in the environment as a scalar, whose units are
degrees from mid-line. There are neurons in a part of the monkey’s brain called the
lateral intraparietal (LIP) cortex that are sensitive to this scalar value (Andersen
et al., 1985). Indeed, these parts of the brain seem to act as a kind of memory for
object location, as they are active even after the object has disappeared. I should
be clear that I do not think a simple controlled integrator that remembers only a
scalar value maps well to these specific neurons, but a similar architecture with
a more complex representation has been used to model many properties of LIP
activity (Eliasmith and Anderson, 2003). What is relevant for this example is that,
as a population, neurons in this area encode an object’s position over time.
To summarize, the representation of object position can be understood as a
scalar variable, whose units are degrees from mid-line (decoded alphabet), that
is encoded into a series of neural spikes across a population (encoded alphabet).
Using the quantitative tools mentioned earlier, we can determine the relevant de-
coder (see appendix A.1.2). Once we have such a decoder, we can then estimate
what the actual position of the object is given the neural spiking in this popula-
tion, as in figure 2.7. Thus we can determine precisely how well, or what aspect
of, the original property (in this case the actual position) is represented by the
neural population. We can then use this characterization to understand the role
that the representation plays in the system as a whole.
One crucial aspect of this principle of representation, is that it can be used to
characterize arbitrarily complex representations. The example I have described
here is the representation of a scalar variable. However, this same principle ap-
plies to representations of vectors (such as movement or motion vectors found in
motor cortex, brainstem, cerebellum, and many other areas), representations of
functions (such as stimulus intensity across a spatial continuum as found in au-
ditory systems, and many working memory systems), representations of vector
fields (such as the representation of a vector of intensity, color, depth, etc. at each
spatial location in the visual field as found in visual cortex), and representations of
composable symbol-like objects (as argued for throughout this book). Suggesting
that one of these kinds of representation can be found in a particular brain area,
crucially, does not rule out the characterization of the same areas in terms of other
kinds of representations. This is because we can quantitatively define a “represen-
tational hierarchy” that relates such representations to one another. I return to this
issue in section 2.4.
A second crucial aspect of this principle is that it distinguishes the mathemati-
cal object being represented from the neurons that are representing it. I refer to the
former as the “state space”, and the latter as the “neuron space”. There are many
reasons it is advantageous to distinguish the neuron space from the state space.
Most such advantages should become clear in subsequent chapters, but perhaps most
obviously, distinguishing these spaces naturally accounts for the well-known re-
dundancy found in neural systems. Familiarity with Cartesian plots makes us
think of axes in a space as being perpendicular. However, the common redun-
dancy found in neural systems suggests that their “natural” axes are in fact not
perpendicular, but rather slanted towards one another (this is sometimes called an
“overcomplete” representation). Distinguishing the state space, where the axes
typically are perpendicular, from the neuron space, where they are not, captures
this feature of neurobiological representation.
A third crucial aspect of this principle is that it embraces the heterogeneity
of neural representation. There are no assumptions about the encoding neurons
being similar in any particular respect. In other work, we have shown that such
a representation is nearly as good as an optimally selected representation, both
in terms of capacity and accuracy (Eliasmith and Anderson, 2003, pp. 206-217).
And, such a representation does not need mechanisms for optimal selection of
new neurons (the neural properties are simply randomly chosen, as in figure 2.7).
In practice, this means that the typically heterogeneous tuning curve data from
a particular neural system can be matched in a model (since the principle puts
no constraints on tuning curve distribution), and then optimal decoding can be
subsequently determined.
The tutorial at the end of this chapter demonstrates how to build and interact
with simulations of heterogeneous scalar and vector representations in Nengo.
The central features of the representation principle are further highlighted there
with concrete examples.
2.3.2 Transformation
[Figure 2.9, panels: LIF neuron response curves (firing rate in spikes per second vs. input x value) and weighted response curves with decoded output (decoded x value vs. input x value), for populations of 2, 8, and 30 neurons.]
Figure 2.9: A nonlinear function of the input computed over time. The left column
shows the tuning curves of the neurons which are identical to the curves shown in
figure 2.5. The right column shows linearly weighted versions of these curves in
gray and the sum of these weighted curves as a thick black line. As can be seen,
the choices of linear weights used result in a quadratic function over the range of
-1 to 1. Under this decoding scheme, the network is thus computing x².
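As a hedged illustration, the transformational decoders for this x² example can be found with the same machinery as the representational decoders in the earlier sketch in section 2.3.1 (reusing the names A, Gamma, and x from that code): only the target of the least-squares problem changes.

```python
# Reusing A, Gamma, and x from the earlier population-decoding sketch: the target of
# the least-squares problem is now a function of x rather than x itself.
target = x ** 2                                   # desired transformation f(x) = x^2
decoders_f = np.linalg.solve(Gamma, A.T @ target) # transformational decoders
f_hat = A @ decoders_f                            # the population's estimate of x^2
```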
[Figure, panels: (a) an input signal and its integral plotted over time; (b) input, control, and integral values plotted over time.]
incoming signal and the encoding of the outgoing signal. Practically speaking, this
means that changing a connection weight both changes the transformation being
performed and the tuning curve of the receiving neuron. As is well known from
both connectionism and theoretical neuroscience, this is exactly what happens in
such networks. In essence, the encoding/decoding distinction is not one that neu-
robiological systems need to respect in order to perform their functions, but it is
extremely useful in trying to understand such systems and how they do, in fact,
manage to perform those functions. Consequently, decoders – both transforma-
tional and representational – are theoretical constructs. The only place something
like decoding actually happens is at the final outputs of the nervous system, such
as at the motor periphery.
2.3.3 Dynamics
[Figure 2.11: (a) a standard control theoretic system, in which the input u(t) is weighted by B, summed with the feedback Ax(t), and passed through an integrator to give the state x(t); (b) the equivalent neural system, in which the integrator is replaced by the synaptic filter hsyn(t).]
In many ways, this simple integrator is like a memory. After all, if there is no
input (i.e., u(t) = 0) then the state of the memory will not change. This captures
the essence of an ideal memory – its state stays constant over time with no input.
However, as noted in figure 2.11, neuron dynamics are not well-characterized
by perfect integration. Instead, they have dynamics identified by the function
hsyn(t) in that figure. Specifically, this term captures the postsynaptic current
(PSC) dynamics of the neurons in question, several examples of which are shown
in figure 2.12. As mentioned earlier, these dynamics depend on the kind of neuro-
transmitter being used. As a result, the “translation” from a control theoretic de-
scription to its neural equivalent must take these differences into account. Specif-
ically, that translation must tell us how to change A and B in figure 2.11a into A′ and B′ in figure 2.11b, given the change from integration to hsyn(t).
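For the common case of a first-order (exponential) PSC model, this translation takes a simple, standard form in the NEF; the following is a sketch, assuming an exponentially decaying PSC with time constant τsyn and a linear system dx/dt = Ax + Bu:

$$
A' = \tau_{\mathrm{syn}} A + I, \qquad B' = \tau_{\mathrm{syn}} B
$$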
For example, if we apply this translation to the neural integrator, the feedback matrix A′ is equal to 1 and the input matrix B′ is equal to the time constant of
the PSC. This makes sense because the dynamics of the neural system decay at
a rate captured by the PSC time constant, so the feedback needs to “remind” the
system of its past value at a similar rate. Setting A′ and B′ to these values will
result in behavior just like that described for a perfect integrator in equation 2.2.
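A quick non-spiking numerical check of this claim, with assumed time constants, is sketched below: a first-order synaptic filter whose feedback is weighted by A′ = 1 and whose input is weighted by B′ = τsyn integrates its input.

```python
import numpy as np

dt = 0.001
tau_syn = 0.1                           # assumed PSC time constant
t = np.arange(0.0, 2.0, dt)
u = np.where(t < 0.5, 1.0, 0.0)         # a brief input pulse

x = 0.0
xs = []
for u_t in u:
    drive = 1.0 * x + tau_syn * u_t     # A' = 1, B' = tau_syn
    x += (dt / tau_syn) * (drive - x)   # first-order synaptic dynamics
    xs.append(x)

# x ramps to about 0.5 while the pulse is on, then holds: the integral of u(t).
```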
[Figure 2.12: example postsynaptic current models (e.g., AMPA, τs ≈ 5 ms), plotted as normalized current over time.]
that A is negative. In this case, the opposite happens. The current state will move
exponentially towards 0. This is because the next state will be equal to the current state minus some fraction of its value.
To build a controlled integrator that acts as an erasable memory, we can thus
implement the dynamical system in equation 2.1, with an additional input that
controls A. We can use principle 3 to determine how to map this equation into
one that accounts for neural dynamics. We then need to employ principle 2 to
compute the product of A and x(t) to implement this equation (as in the previous
section). And, we need to employ principle 1 to represent the state variable, the
input variable, and the control variable in spiking neurons. The result of employ-
ing all three principles is shown in figure 2.13. Ultimately, the principles tell us
how to determine the appropriate connection weights between spiking neurons in
the model (recurrent or otherwise), to give the desired dynamical behavior.
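The same kind of sketch extends to the controlled integrator (again non-spiking and with assumed signals, reusing the structure of the integrator sketch above): the feedback now includes the product of the control value and the state.

```python
import numpy as np

dt = 0.001
tau_syn = 0.1                                 # assumed PSC time constant
t = np.arange(0.0, 10.0, dt)
u = np.where(t < 1.0, 1.0, 0.0)               # load the memory during the first second
c = np.where(t < 6.0, 0.0, -1.0)              # after 6 s, drive the memory back to zero

x = 0.0
xs = []
for u_t, c_t in zip(u, c):
    # Neural mapping of dx/dt = c*x + u: feedback weight 1 + tau*c, input weight tau.
    drive = (1.0 + tau_syn * c_t) * x + tau_syn * u_t
    x += (dt / tau_syn) * (drive - x)
    xs.append(x)

# x rises to about 1, holds while the control is 0, and decays toward 0 once the
# control goes negative -- a loadable, erasable memory.
```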
It is perhaps worth emphasizing that while the controlled integrator is simple
– it computes only products and sums, it represents only scalars and a 2D vector,
and it has nearly linear time-invariant dynamics – it employs each of the princi-
ples of the NEF. Furthermore, there is nothing about these “simplicities” that is a
consequence of the principles employed. NEF models can compute complex non-
linear functions, have sophisticated high-dimensional representations, and display
more interesting, nonlinear dynamics.
It is perhaps worth pointing out here that such sophistication, while avail-
able in the NEF, is not necessary for giving us a deeper understanding of neural
function. For example, Ray Singh and I used two simple (non-controlled) neural
integrators, coupled them together, and provided a novel explanation for observed
spiking patterns in monkey cortex during a working memory task (Singh and Elia-
smith, 2006). Specifically, the first integrator integrated a brief input stimulus to
result in a constant output (i.e. a memory of the input), and the second integra-
tor integrated the output of the first, to give a “ramp” (i.e. a measure of how
long since the input occurred). We demonstrated that projecting these two outputs
onto a population of cells representing a 2-dimensional vector gave all of the ob-
served classes of spiking single-cell responses in a working memory experiment
performed on monkey frontal cortex, including responses with unusual dynamics
(Romo et al., 1999a). A recent analysis of a similar, but larger data set verified
that this kind of model – one with linear dynamics in a higher-dimensional space
– captures 95% of the observed responses of the over 800 recorded neurons (6 dimensions, rather than 2, were required for this model; Machens et al., 2010). Another
very similar model (which uses coupled controlled integrators) captures the popu-
lation dynamics, and single cell spiking patterns observed in rat medial prefrontal
cortex similarly well, though on a very different task (Laubach et al., 2010). In
both cases, a surprisingly good explanation of the complex, detailed spiking pat-
terns found in “higher” cortical areas (like the prefrontal cortex) was discovered
despite the simplicity of the model.
In many ways, it is the successes of these simpler models, built using the prin-
ciples of the NEF, that make it reasonable to explore the available generalizations
of those principles. Ultimately, it is this generality, I believe, that puts the NEF in a
unique position to help develop and test novel hypotheses about biological cogni-
tion. Developing and testing one such hypothesis is the purpose of the remainder
of this book, beginning with the next chapter.
[Figure 2.14: a generic NEF subsystem relating the neural description to the state space description. Incoming spikes are filtered by PSCs in the dendrites (temporal decoding) and weighted by synaptic weights, which combine decoding matrices, dynamics matrices, and encoders; the soma generates outgoing spikes, which can be fed back through recurrent connections.]
the neural integrator as an example of the system shown in figure 2.11. There,
the A′ matrix specifies the dynamics matrix which connects outgoing spikes of the population back to itself (the recurrent connection in figure 2.14, equal to 1 for the integrator). The B′ matrix specifies the dynamics matrix that connects the
incoming spikes to this population (equal to τsyn for the integrator). Associated
with both of these dynamics matrices are the decoding and encoding elements,
which in this case are the same and specified by the neuron responses (encoders)
and the optimal linear weights for estimating the input (decoders). Finally, the
dynamics of the system are captured by the PSC model being used (examples are
shown in figure 2.12). Having specified all of these elements allows us to generate
the synaptic connection weights necessary to implement the defined higher-level
dynamical system in a population of spiking neurons.
In this way, the subsystem in figure 2.14 captures the contributions of the prin-
ciples. That is, together the principles determine the synaptic weights. That de-
termination depends on the contributions of each of the principles independently:
the representation principle identifies encoders and representational decoders; the
transformation principle identifies transformational decoders; and the dynamics
principle identifies the dynamics matrices (i.e. the matrices identified in figure
2.11). The synaptic weights themselves are the product of these elements.
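Schematically, the factorization of the weights can be sketched as follows (arbitrary placeholder values, not a complete model): the gain and encoder of each receiving neuron come from principle 1, the decoders of the sending population from principles 1 and 2, and the dynamics matrix from principle 3.

```python
import numpy as np

rng = np.random.default_rng(2)
n_post, n_pre, dims = 40, 30, 1
gains = rng.uniform(1.0, 2.0, n_post)                 # per-neuron gains (principle 1)
encoders = rng.choice([-1.0, 1.0], (n_post, dims))    # encoding vectors (principle 1)
decoders = 0.01 * rng.standard_normal((n_pre, dims))  # placeholder decoders (principles 1 and 2)
A_prime = np.eye(dims)                                # dynamics matrix, e.g. integrator feedback (principle 3)

# Each synaptic weight is the product of the post-neuron's gain and encoder, the
# dynamics matrix, and the pre-neuron's decoder.
weights = gains[:, None] * (encoders @ A_prime @ decoders.T)   # shape: (n_post, n_pre)
```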
Crucially, while this theoretical characterization of the subsystem is generic,
its application is not. To determine decoders, the specific tuning curves of neurons
in the circuit play a role. To determine dynamics, the kinds of neurotransmitters
found in a given circuit are crucial. To determine what kind of spiking occurs, the
choice of a single cell model is important. To determine encoders, both the single
cell model and the tuning curves need to be known. To determine transforma-
tional decoders and dynamics, a high-level hypothesis about the system function
is needed. All of these considerations can vary widely depending on which brain
areas are being considered.
I mentioned several examples of the broad application of the NEF at the begin-
ning of section 2.3. As noted, these applications have focused on detailed neural
models. A benefit of this focus has been that the NEF has been used to account
for and predict a myriad of detailed neural properties, such as changes in single
cell tuning curves under different environmental circumstances (Conklin and Elia-
smith, 2005), the variety of single cell dynamics observed during working memory
(Singh and Eliasmith, 2006) and other delayed activity tasks (ref Laubach???), re-
covery from ablation of single cells, the effects of systematic input perturbation,
and the normal functioning of the neurobiological integrator controlling eye posi-
tion (ref macneill???) among many others. In short, the credentials of the NEF as
a method for generating models with close connections to detailed neural data are
good.
As well, these applications make it clear that the NEF bears little resemblance
to traditional connectionism: NEF neurons spike, they are highly heterogeneous,
they have a variety of different dynamics, they are differentially affected by differ-
ent neurotransmitters – they are not “units” or “nodes”, but neurons. In addition,
the NEF does not rely on learning to design models (although learning can be
included if appropriate). Consequently, the NEF can be used to construct arbi-
trarily “deep” (i.e. with any number of layers), and arbitrarily connected models,
without running afoul of the many difficulties of learning such a model. This is
because NEF networks can be designed from the top down (given a functional hy-
pothesis, how might neurons be organized to realize that function). This top-down
approach has been used in all of the previously mentioned applications of the NEF
to specific systems.
One drawback of these biologically focused applications of the NEF is that the
kinds of functions addressed are simple, making it less than obvious how the NEF
might be relevant for understanding complex behavior. It has been suggested that
a detailed understanding of neural implementation is completely irrelevant for un-
derstanding cognition (Fodor and Pylyshyn, 1988b). I, and many others, disagree,
though I won’t argue the point here. But if we disagree, we must also admit that it
is not obvious what the relation is between these simple neural systems and cog-
nition, and presumably we want to understand the cognitive forest as well as the
neural trees.
Before confronting the task of applying the NEF to biological cognition in
the next chapter, I believe it is conceptually important to clarify a notion I will be
using throughout the remainder of the book. A notion I have been using somewhat
loosely to this point: the notion of “levels”.
2.4 Levels
At one point in the development of science, it was widely thought that the sciences
could be identified with the level of nature which they characterized. Physics was
the most fundamental, followed closely by chemistry, and subsequently biology,
psychology, sociology and so on (Oppenheim and Putnam, 1958; Carnap, 1931;
Hempel, 1966). The suggestion was that lower-level sciences could reductively
describe the higher-level sciences. Societies, after all, are composed of people,
who are made up of biological parts, which are largely driven by chemical pro-
cesses, which can be understood as interactions between the fundamental parts of
nature. This uncomplicated view has unfortunately not withstood the test of time
(Bechtel, 1988, 2008; Craver, 2007). However, the notion that there are “levels” of
some kind has proven too useful to discard. Instead it has become a perennial and
complex problem to determine what, exactly, the relation between the sciences,
and the entities they talk about, is (Fodor, 1974; Wilson; Batterman, 2002).
Jerry Fodor has famously suggested that, since the reduction of one science
to another has failed, the sciences must be independent (Fodor, 1974). Conse-
quently, he has argued that to understand cognitive systems, which lie in the do-
main of psychology, appeal to lower-level sciences is largely useless. The only
thing such disciplines, including neuroscience, can do is to provide an implemen-
tational story that bears out whatever psychological theory has been independently
developed. There are many difficulties with such a view, but the one that stands
out in the present context is that cleaving neuroscience from psychology simply
throws out an enormous amount of empirical data about the system we are at-
tempting explain. If we were certain that we could achieve an adequate, purely
psychological-level description of the system, this strategy might be defensible.
However, we are in no such position.
There are other possible relations between levels other than the extremes of
their being reducible to one another, or their being independent. I suspect that
the reason that both the reducibility and the independence views seem implausi-
ble is because they share a very strong interpretation of the term “level” in na-
ture. Specifically, both assume that such levels are ontological (Oppenheim and
Putnam, 1958). That is, that the levels themselves are intrinsic divisions in the
structure of the natural world. The reason we might assume we can reduce the-
ories about people to theories about particles, is because people seem ultimately
describable as a bunch of particles stuck together in some (perhaps very complex)
way. The reason we might assume theories about people are independent of the-
ories about particles, is because we have given up on reduction, but still believe
that both people and particles are equally real and so believe that both kinds of
theory are equally scientific.
If, instead, we think of levels from an epistemological perspective, that is, as
different descriptions of a single underlying system, neither view seems plausible.
Sometimes we may describe a system as a person without much consideration of
their relationship to individual particles. Other times we may be concerned about
the decomposition of a biological person, leading to what looks like a kind of re-
ductive description. The reasons for preferring one such description over another
can be largely practical: in one case we need to predict the overall behavior of the
whole system; in another we need to explain how changes in a component influ-
ence complex interactions. In either case, we need not take these different per-
spectives on the underlying system as something we need to reify (i.e., make real),
especially at the expense of other perspectives. Rather, we can understand levels
to be kinds of description that are chosen for certain explanatory and predictive
purposes.
Crucially, this does not mean that levels are in any problematic sense “made
up”: they can still be right and wrong (depending on whether they are consis-
tent with empirical evidence), and they can still be about real things (i.e., things
whose existence does not depend on our existence). Instead, “levels as kinds of
description for purposes” are flexible in a manner that acknowledges our limited
intellectual capacities, or maybe simply our limited knowledge. It is sometimes
important to make assumptions or simplifications in order to make predictions
with our theoretical descriptions. In psychology, we may talk about people as if
they were unchanging sets of particles because most of our predictions are suffi-
ciently accurate regardless of that assumption. The psychological level may thus
pick out descriptions that employ a consistent set of assumptions which are effec-
tive at maintaining the explanatory and predictive power of those descriptions.
In short, I have suggested that levels are pragmatically identified sets of de-
scriptions that share assumptions. Perhaps it is useful to call this position “de-
scriptive pragmatism” for short. Importantly, I think descriptive pragmatism does
a reasonable job of capturing a variety of intuitions about “levels” evident in sci-
entific practice. For example, descriptive pragmatism can help us understand why
levels, spatial scale, and complexity seem inter-related. Consider: as we increase
the number of objects in a system, we need to increase the amount of space nec-
essary to encapsulate the system, and the number of possible interactions goes
up rapidly. So, given our constant intellectual capacity, typically, larger spatial
scales allow for more complexity and hence concurrently demand higher-level
descriptions to address the behavior of the entire system. And, more complexity
often means more assumptions need to be made to keep the description simple
enough for it to be practical to us. A lower-level description of the entire sys-
tem will require fewer assumptions, but soon outstrip our intellectual capacity to
represent (which is perhaps why we often turn to simulations for our lower-level
descriptions).
Similarly, this view of levels is consistent with the notion that part/whole re-
lations, especially in mechanisms, often help pick out levels. Most systems com-
posed of parts have parts that interact. If the parts of a system are organized
in such a way that their interaction results in some regular phenomena, they are
called “mechanisms” (Machamer et al., 2000). Mechanisms are often analyzed
by decomposing them into their simpler parts, which may also be mechanisms
(Bechtel, 2005). So, within a given mechanism, we again see a natural correspon-
dence between spatial scale, complexity and levels. Note, however, that even if
two distinct mechanisms can be decomposed into lower level mechanisms, this
does not suggest that the levels within each mechanism can be mapped across
mechanisms (Craver, 2007). This, again, suggests that something like descriptive
pragmatism better captures scientific practice than more traditional ontological
views (Bechtel, 2008), as it allows different practical considerations to be used to
identify decompositions across different mechanisms.
Descriptive pragmatism about levels may initially seem to be a more arbitrary
characterization of levels, but there is no reason not to demand systematic identi-
fication and quantification of the relations between identified levels. The descrip-
tions of inter-level relations we employ can still be mathematical, after all. In fact,
mathematical descriptions can help clarify the assumptions made when identify-
ing levels. So, for example, we might talk in terms of mathematical objects that
represent concepts instead of mathematical objects that capture individual neu-
ron function because successful explanation or prediction of certain behavior can
proceed in terms of concepts without detailed consideration of individual neuron
function. The nature of “without detailed consideration” is specified by assump-
tions that underlie our mathematical theory of concept representation that relates
it to individual neuron function. And, identifying those assumptions is paramount
to identifying the level of description being employed.
I should note that I in no way take this brief discussion to sufficiently specify
a new characterization of levels. There are many, much deeper considerations
of levels that I happily defer to (e.g., Bechtel, 2008; Craver, 2007; Hochstein,
2011). My purpose is to clarify what general characterization of levels lies behind
my use of the term throughout the book, so as to avoid confusion about what I
might mean. The main point to be taken from this discussion is simply that we
need not think of levels as being either reductive (Oppenheim and Putnam, 1958)
or independent (Fodor, 1974). There are accounts, instead, which identify levels
with something more like degrees of descriptive detail.
With this background in mind let me return to specific consideration of the
NEF. As described in section 2.3.1 on the principle of representation, the principle
applies to the representation of all mathematical objects. Since such objects can
be ordered by their complexity, we have a natural and well-defined meaning of a
representational hierarchy: a hierarchy whose levels can be understood as kinds
of descriptions with specific inter-relations. Table 2.1 provides the first levels of
such a hierarchy.
Table 2.1: A representational hierarchy. Each row has the same type of encod-
ing/decoding relations with neurons. Higher rows can be constructed out of linear
combinations of the previous row.
Mathematical Object       Dimension   Example
Scalar (x)                1           Light intensity at x, y
Vector (x)                N           Light intensity, depth, color, etc. at x, y
Function (x(ν))           ∞           Light intensity at all spatial positions
Vector Field (x(r, ν))    N × ∞       Light intensity, depth, color, etc. at all spatial positions
...                       ...         ...
chy itself lies within the state space half of this distinction. The relation between
the neuron and state space levels (i.e., nonlinear encoding and linear decoding) is different from that between levels within the representational hierarchy (i.e., perfect linear encoding and decoding). But both are quantitatively defined, and the
increase of levels with spatial scale, complexity, and part/whole relations still ob-
tains. Consequently, it still makes sense to talk of all of these as levels of the same
system.
The fact that all of these levels can be described quantitatively and in a stan-
dard form suggests that the NEF characterization provides a unified means of de-
scribing neurobiological systems. In addition, it makes clear how we can “move
between” levels, and precisely how these levels are not independent. They are not
independent because empirical data that constrains one description is about the
same system described at a different level. So, when we move between levels, we
must either explicitly assume away the relevance of lower (or higher) level data,
or we rely on the hypothesized relationship between the levels to relate that data
to the new level. Either way, the data influences our characterization of the new
level (since allowable assumptions are part and parcel of identifying a level).
One final subtlety in the NEF characterization of levels in neural systems is
worth noting. Specifically, it is consistent with this characterization that what
we may naturally call a high-level neural representation (e.g., the representation
of phonemes) is a relatively simple mathematical object (e.g., a scalar specify-
ing only one of forty possible values). On the face of it, this characterization of
phonemes as simple mathematical objects seems to contradict my earlier claim
that, in general, higher-level representations are more complex mathematical ob-
jects. However, this observation misses the fact that the complexity of the repre-
sentation has been “shifted” into the units of the object. That is, phonemes them-
selves are very complicated objects that require many sophisticated computations
to extract (they depend not only on complex interactions of frequencies, but au-
ditory and motor contexts). So the fact that the unit of the object is “phoneme”
captures enormous amounts of complexity. Consequently, the representational
hierarchy applies in cases when the units of the component representations stay
constant (as in the example provided in table 2.1). If we change units, we may still
proceed “up” levels even though the mathematical objects themselves get simpler.
Of course, in such a case, the relationship between units at different levels must
also be specified to properly characterize the relationship between these levels.
This relationship is most often a transformational one, and hence not one that the
NEF specifies in general (recall from section 2.3 that the NEF is about how, not
what, neural systems compute).
So, the NEF provides a consistent and general way to talk about levels in the
behavioral sciences. However, only some of the inter-level relations are defined
by the principles of the framework. This seems appropriate given our current state
of ignorance about how best to decompose the highest-level neural systems. In the
next chapter, I begin to describe a specific architecture that addresses what kinds
of functions brains may actually be computing in order to underwrite cognition.
The NEF provides a method for specifying that description at many levels of de-
tail, while the architecture itself helps specify additional inter-level relations not
addressed by the NEF.
Representing a scalar
We begin with a network that represents a very simple mathematical object, a
scalar value. As in the last tutorial, we will use the interactive mode to examine
the behavior of the running model. Let us begin.
• In an empty Nengo world, drag a Network component from the template bar
into the workspace. Set the Name of the network to ‘Scalar Representation’
and click OK.
• Drag an Ensemble into the network. A configuration window will open.
Here the basic features of the ensemble can be configured.
• Set Name to ‘x’, Number of nodes to 100, Dimensions to 1, Node factory to
‘LIF Neuron’, and Radius to 1.
The name is a way of referring to the population of neurons you are creating. The
number of nodes is the number of neurons you would like to have in the popula-
tion. The dimension is the number of different elements in the vector you would
like the population to represent (a 1 dimensional vector is a scalar). The node fac-
tory is the kind of single cell model you would like to use. The default is a simple
leaky integrate-and-fire (LIF) neuron. Finally, the radius determines the size of
the n-dimensional hypersphere that the neurons will be good at representing. A
1-dimensional hypersphere with a radius of 1 is the range from -1 to 1 on the real
number line. A 2-dimensional hypersphere with the same radius is a unit circle.
• Click the Set button. In the panel that appears, you can leave the defaults
(tauRC is 0.02, tauRef is 0.002, Max rate low is 100 high is 200, Intercept
is low -1.0 and high is 1.0).
Clicking on Set allows the parameters for generating neurons in the population to
be configured. Briefly, tauRC is the RC time constant for the neuron membrane,
usually 20ms, tauRef is the absolute refractory period (the period during which a
neuron cannot spike after having spiked), Max rate has a high and low value, in
Hertz, which determines the range of firing rates neurons will have at the extent of
the radius (the maximum firing rate for a specific neuron is chosen randomly from
a uniform distribution between low and high), Intercept is the range of possible
values along the represented axis where a neuron ‘turns off’ (for a given neuron its
intercept is chosen randomly from a uniform distribution between low and high).
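For readers who prefer scripting to the GUI, a minimal sketch of the same ensemble in Nengo's Python interface (the `nengo` package, whose API differs from the GUI workflow described here) might look as follows; the parameter names below belong to the Python package, not to the dialog boxes above.

```python
import nengo

model = nengo.Network(label="Scalar Representation")
with model:
    # 100 LIF neurons representing a 1D value over the range [-1, 1].
    x = nengo.Ensemble(
        n_neurons=100,
        dimensions=1,
        radius=1,
        neuron_type=nengo.LIF(tau_rc=0.02, tau_ref=0.002),
        max_rates=nengo.dists.Uniform(100, 200),    # firing rates at the radius
        intercepts=nengo.dists.Uniform(-1.0, 1.0),  # where each neuron 'turns off'
        label="x",
    )
```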
• Click OK. Click OK again, and the neurons will be created.
If you double-click on the population of neurons, each of the individual cells will
be displayed.
• Right-click on the population of neurons, select Plot→Constant Rate Re-
sponses.
The ‘activities’ graph which is now displayed shows the ‘tuning curves’ of all the
neurons in the population. This shows that there are both ‘on’ and ‘off’ neurons
in the population, that they have different maximum firing rates at x = ±1, and
that there is a range of intercepts between [−1, 1]. These are the heterogeneous
properties of the neurons that will be used to represent a scalar value.
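In the scripting interface, the same tuning curves can be computed and plotted with a utility function; this is a sketch assuming the `nengo` Python package and `matplotlib`.

```python
import matplotlib.pyplot as plt
import nengo
from nengo.utils.ensemble import tuning_curves

model = nengo.Network()
with model:
    x = nengo.Ensemble(n_neurons=100, dimensions=1, radius=1)

with nengo.Simulator(model) as sim:
    # Each neuron's steady-state firing rate over the represented range of x.
    eval_points, activities = tuning_curves(x, sim)

plt.plot(eval_points, activities)
plt.xlabel("represented value x")
plt.ylabel("firing rate (Hz)")
plt.show()
```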
• Right-click on the population of neurons, select Plot→Plot distortion: X.
This plot is an overlay of two different plots. The first, in red and blue, compares
the ideal representation over the range of x (red) to the representation by the pop-
ulation of neurons (blue). If you look closely, you can see that blue does not lie
exactly on top of red, though it is close. To emphasize the difference between the
two plots, the green plot is the distortion (i.e. the difference between the ideal and
the neural representation). Essentially the green plot is the error in the neural rep-
resentation, blown up to see its finer structure. At the top of this graph, in the title
bar, the error is summarized by the MSE (mean squared error) over the range of
x. Importantly, the MSE decreases as 1/N², where N is the number of neurons (so the RMSE is proportional to 1/N); adding more neurons thus gives a better representation of x.
• Right-click on the population and select Configure. Any of these properties
can be changed for the population.
There are too many properties here to discuss them all. But, for example, if you
click on the arrow beside i neurons, double-click the current value. You can now
set it to a different number and the population will be regenerated with the new
number of neurons.
Let us now examine the population running in real time.
• Drag a new Input from the template bar into the Scalar Representation net-
work.
• Set Name to ‘input’, make sure Output Dimensions is 1, and click Set Functions.
• Select Constant Function from the drop-down menu, click Set, and set Value to 0. Click OK on all the open configuration windows.
The function input will appear. We will now connect the function input to the
neural population as in the tutorial in section 1.5. This essentially means that the
output from the function will be injected into the soma of each of the neurons in
the population, driving its activity.
• Drag a Termination component onto the ‘x’ population. Set Name to ‘input’,
Weights Input Dim to 1, and tauPSC to 0.02. Click Set Weights, double-click
the value and set it to 1. Click OK twice.
• Click and drag the ‘origin’ on the input function you created to the ‘input’
on the ‘x’ population.
• Click the Interactive Plots icon (the double sine wave in the top right cor-
ner).
This plot should be familiar from the previous tutorial. It allows us to interactively
change the input, and watch various output signals generated by the neurons.
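A scripted version of the network built so far might look like this (again a sketch assuming the `nengo` Python package); the probe plays the role of the value graph, and the 0.02 s synapse plays the role of the tauPSC setting above.

```python
import matplotlib.pyplot as plt
import nengo

model = nengo.Network(label="Scalar Representation")
with model:
    stim = nengo.Node(0.5)                    # constant input (try other values)
    x = nengo.Ensemble(n_neurons=100, dimensions=1, radius=1)
    nengo.Connection(stim, x, synapse=0.02)   # input injected through a 20 ms filter
    x_probe = nengo.Probe(x, synapse=0.01)    # filtered, decoded estimate of x

with nengo.Simulator(model) as sim:
    sim.run(1.0)

plt.plot(sim.trange(), sim.data[x_probe])
plt.xlabel("time (s)")
plt.ylabel("decoded value")
plt.show()
```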
• Right-click ‘x’ and select value. Right-click ‘x’ and select spike raster.
Right-click ‘input’ and select value. Right-click ‘input’ and select control.
Change the layout to something you prefer by dragging items around. If you
would like the layout to be remembered in case you close and re-open these plots,
click the small triangle in the middle of the bottom of the window (this expands
the view), then click the disk icon under layout (if it is not visible you may have
to widen the window).
• Click the play button. Grab the control and move it up and down. You will
see changes in the neural firing pattern, and the value graphs of the input
and the population.
Note that the value graph of the ‘x’ population is the linearly decoded estimate
of the input, as per the first principle of the NEF. Note also that the spike raster
graph is displaying the encoding of the input signal into spikes. The spiking of only 10% of the neurons is shown by default. To increase this proportion:
• Right-click on the spike raster graph and select a larger percentage of neu-
rons to show.
The population of neurons does a reasonably good (if slightly noisy) job of rep-
resenting the input. However, neurons cannot represent arbitrary values well. To demonstrate this, do the following.
• Right-click the control and select increase range. Do this again. The range
should now be ±4 (to see the range, hover over the slider).
• Centre the controller on zero. The population should represent zero. Slowly
move the controller up, and watch the value graph from the ‘x’ population.
Between 0 and 1, the graph will track the controller motions well. Notice that
many of the neurons are firing very quickly at this point. As you move the con-
troller past 1, the neural representation will no longer linearly follow your move-
ment. All the neurons will become ‘saturated,’ that is, firing at their maximum
possible rate. As you move the controller past 2 and 3, the decoded value will
almost stop moving altogether.
• Move the controller back to zero. Notice that changes around zero cause
relatively large changes in the neural activity compared to changes outside
of the radius (which is 1).
These effects make it clear why the neurons do a much better job of representing
information within the defined radius: changes in the neural activity outside the
radius no longer accurately reflect the changes of the input.
Notice that this population, which is representing a scalar value, does not in
any way store that value. Instead, the activity of the cells acts as a momentary
representation of the current value of the incoming signal. That is, the population
acts together like a variable, which can take on many values over a certain range.
The particular value it takes on is represented by its activity, which is constantly
changing over time. This conception of neural representation is very different
from that found in many traditional connectionist networks which assume that
the activation of a neuron or a population of neurons represents the activation of
a specific ‘concept’. Here, the same population of neurons, differently activated,
can represent different ‘concepts’.8 I return to this issue in sections 9.4 and 10.1.1.
Representing a vector
A single scalar value is a simple neural representation, and hence at the bottom of
the representational hierarchy. Combining two or more scalars into a representa-
tion, and moving up one level in the hierarchy, results in a vector representation.
In this tutorial, I consider the case of two-dimensional vector representation, but
the ideas naturally generalize to any dimension. Many parts of cortex are best
characterized as using vector representations. Most famously, Apostolos Geor-
gopoulos and his colleagues have demonstrated vector representation in motor
cortex (Georgopoulos et al., 1984, 1986, 1993).
In their experiments, a monkey moves its arm in a given direction while the
activity of a neuron is recorded in motor cortex. The response of a single neuron
to forty different movements is shown in figure 2.15. As can be seen from this
figure, the neuron is most active for movements in a particular direction. This
direction is called the ‘preferred direction’ for the neuron. Georgopoulos’ work
has shown that over the population of motor neurons, these preferred directions,
captured by unit vectors pointing in that direction, are evenly distributed around
the unit circle in the plane of movement (Georgopoulos et al., 1993).
To construct a model of this kind of representation, we can do exactly the same
steps as for the scalar representation, but with a two-dimensional representation.
8 This is true even if we carve the populations of neurons up differently. That is, there is clearly
a range of values (perhaps not the whole range in this example) over which exactly the same
neurons are active, but different values are represented.
Figure 2.15: The response of a single neuron in motor cortex to different directions
of movement. M indicates the time of movement onset. T indicates the time that
the target appears. This data shows five trials in each direction for this neuron.
The neuron is most active when the target is between 135 and 180 degrees. The
direction of the peak of this response is called the neuron’s ‘preferred direction’.
In the NEF the preferred direction is represented as a unit (i.e., length one) vector
pointing in that direction. (Reproduced from Georgopoulos et al. (1986) with
permission.)
• In an empty Nengo world, drag a new Network into the workspace. Set the
Name of the network to ‘Vector Representation’ and click OK.
• Create a new Ensemble within the network. Set Name to ‘x’, Number of
nodes to 100, Dimensions to 2, Node factory to ‘LIF Neuron’, and Radius
to 1.
• Click the Set button. In the panel that appears, you can leave the defaults
(tauRC is 0.02, tauRef is 0.002, Max rate low is 100 high is 200, Intercept
is low -1.0 and high is 1.0).
The previous step gives a motor cortex-like distribution of preferred direction vec-
tors.
The noise parameter sets the expected amount of noise that Nengo uses to cal-
culate decoder values. Do not set this parameter to zero as this would make the
ensemble extremely sensitive to noise.
Given these settings, preferred direction vectors will be randomly chosen from an
even distribution around the unit n-dimensional hypersphere (i.e. the unit circle
in two dimensions). If you plot the constant rate responses, the neuron responses
will be plotted along the preferred direction vector of the cell. Consequently, the
plot is generated as if all neurons had the same preferred direction vector.
You can now create an input function, and run the network.
• Drag a new Input from the template bar into the network. Set Name to ‘input’, make sure Output Dimensions is 2, and click Set Functions.
• Select Constant Function from the drop down menu, click Set and set Value
to 0 for both functions. Click OK for each open window.
• Add a Termination to the ‘x’ ensemble. Set Name to ‘input’, Weights Input
Dim to 2, and tauPSC to 0.02. Click Set Weights and set the weights to
an identity matrix (the top-left and bottom-right values should be set to 1,
while the remaining two values should be set to 0).
Notice that you have a matrix of weights to set. This is because the input functions
can be projected to either of the dimensions represented by the neural population.
For simplicity, we have told the first function to go only to the first dimension and
the second function only to the second dimension.
• Click OK twice.
• Click and drag the ‘origin’ on the input function you created to the ‘input’
on the ‘x’ population.
• Right-click ‘x’ and select value. Right-click ‘x’ and select spike raster.
Right-click ‘input’ and select value. Right-click ‘input’ and select control.
• Press play to start the simulation. Move the controls to see the effects on the
spike raster (right-click the raster to show a higher percentage of neurons).
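A scripted sketch of the two-dimensional version (assuming the `nengo` Python package) is shown below; the identity transform plays the role of the weight matrix set above, and the built ensemble's encoders are the neurons' preferred direction vectors.

```python
import numpy as np
import nengo

model = nengo.Network(label="Vector Representation")
with model:
    stim = nengo.Node([0.0, 0.0])                  # two constant inputs
    x = nengo.Ensemble(n_neurons=100, dimensions=2, radius=1)
    nengo.Connection(stim, x, transform=np.eye(2), synapse=0.02)
    x_probe = nengo.Probe(x, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)

# Each row is one neuron's preferred direction (unit) vector.
print(sim.data[x].encoders[:5])
```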
You can attempt to find a neuron’s preferred direction vector, but it will be difficult
because you have to visualize where in the 2D space you are because the value
plot is over time. To make this easier, view the input in 2D space:
• Right-click ‘input’ and select XY plot.
Now you can attempt to determine a neuron’s preferred direction vector. This
should be easier because you can see the position of the input vector in 2D space.
There is a trail to the plot to indicate where it has recently been. The easiest
way to estimate a neuron’s preferred direction vector is to essentially replicate the
Georgopoulos experiments.
• Move the input so both dimensions are approximately zero. Then move one
input to its maximum and minimum values.
If a neuron does not respond (or responds minimally), that is not its preferred
direction. Neurons whose preferred directions are close to the direction you are changing will respond vigorously to the changes you make. Keep in mind that
different neurons have different gains, meaning they may ‘ramp’ up and down at
different rates even with the same preferred direction.
• Right-click the ‘x’ population and select preferred directions to show the
neuron activity plotted along their preferred directions in the 2D space.
This plot multiplies the spike activity of the neuron with its preferred direction
vector. So, the longer lines are the preferred directions of the most active neurons.
• Put one input at an extreme value, and slowly move the other input between
extremes.
It should be clear that something like the average activity of the population of
neurons moves with the input. If you take the mean of this activity (a straight line through the middle of the blob of activity), it will give you an estimate of the input
value. That estimate is something like the linear decoding for a vector space as
defined in the first NEF principle (although it is not optimal, as in the principle).
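The estimate just described is essentially Georgopoulos's "population vector": the activity-weighted sum of preferred direction vectors. A small numpy sketch (with crude rectified-linear tuning standing in for the spiking activity shown in the plots) illustrates the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Preferred direction (unit) vectors e_i, scattered evenly around the circle.
angles = rng.uniform(0, 2 * np.pi, size=100)
encoders = np.column_stack([np.cos(angles), np.sin(angles)])

x = np.array([0.8, 0.3])                      # the input being represented
rates = np.maximum(0, 100 * encoders @ x)     # crude rectified tuning (Hz)

# Population vector: activity-weighted sum of preferred directions.
pop_vector = (rates[:, None] * encoders).sum(axis=0)
pop_vector /= np.linalg.norm(pop_vector)

print("estimated direction:", pop_vector)
print("actual direction:   ", x / np.linalg.norm(x))
```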
• Right-click the ‘x’ population and select XY plot. This shows the actual
decoded value in 2D space.
Another view of the activity of the neurons can be given by looking at the neurons
plotted in a pseudo-cortical sheet.
• Right-click the ‘x’ population and select voltage grid.
This graph shows the subthreshold voltage of the neurons in the population in
gray. Yellow boxes indicate that a spike is fired.
• Right-click on the voltage grid and select improve layout to organize the
neurons so that ones with similar preferred direction vectors will be near
each other, as in motor cortex.
resources (i.e., values) to do so. If there were no uncertainty in any of these 100 values, this would be a simple waste of resources. However, in the much more
realistic situation where there is uncertainty (due to noise of receptors, or noise in
the channels sending the signals, etc.) this redundancy can make specifying that
underlying object much more reliable. And, interestingly, it can make the system
much more flexible in how well it represents different parts of that space. For
example, we could use 10 of those neurons to represent the first dimension, or
we could use 50 neurons to do so. The second option would give a much more
accurate representation of that dimension than the first. Being able to redistribute
these resources to respond to task demands is one of the foundations of learning
(see section 6.4).
More extensive tutorials on neural representation are available on the CNRG
website at http://compneuro.uwaterloo.ca/cnrglab/?q=node/2.
Chapter 3
Biological Cognition – Semantics
In reading what follows, it is important to keep in mind that the ideas captured
here represent the beginning, not the end, of a research program. For this reason,
it is perhaps worth stating what the following architecture – the Semantic Pointer
Architecture (SPA) – is not. First, it is not testable by a single, or small set, of
experiments. Being the foundation of a research program, the SPA gains some
amount of credibility when it gives rise to successful models. Failures of such
models lead to a reconsideration of those specific models and, when systematic, a
reconsideration of their foundations.
Second, the SPA is not a completed theory of mental function: we have not yet
actually built a fully cognitive brain (you will not be surprised to learn). In what
follows I describe models of perception, action, and cognition. And, I describe
these in a way that makes combining them reasonably straightforward (an example of their combination is found in chapter 7). Nevertheless, there are many more behaviors involving these aspects of cognition that I do not discuss.
Third, even theoretically speaking, the coverage of the SPA is uneven. Some
aspects of cognition are more directly addressed than others – the SPA is undeni-
ably a work in progress. Perhaps most obviously, the SPA has little to say about
the development of cognitive systems either evolutionarily or over the course of a
single life span. It is focused instead on characterizing already mature cognitive
systems (i.e., “typical adult human cognizers”). This focus is shared with most,
though not all, current cognitive architectures.
These qualifications aside, there are still compelling reasons to pursue the
SPA. First, while it is not a completed, unified theory of mental function, it is
an attempt to move towards such a theory. Such attempts can be useful in both
their successes and failures: either way, we learn about constructing a theory of
this kind. As discussed earlier, some have suggested that such a theory does not
exist (section 1.2). But the behavioral sciences are far too young to think we have
done anything other than scratch the surface of possible cognitive theories. And,
as I argue in detail in chapters 8 and 9, I believe the SPA moves beyond currently
available alternatives.
A second, related reason to pursue the SPA is its close connection to biolog-
ical considerations. A main goal of this work is to show how we can begin to
take biological detail seriously, even when considering sophisticated, cognitive
behavior. Even if it is obvious that we should use as much available empirical
data as possible to constrain our theories in general, actually doing so requires
well-specified methods that contact available data as directly as possible. In what
follows, I describe specific applications of the SPA which draw on, relate to, and
predict empirical data in new and interesting ways (see e.g., sections 3.5, 4.6, 5.7,
6.2, etc.).
A third reason to pursue the SPA is its generality. I have attempted to demon-
strate the generality of the approach by choosing a broad set of relevant examples
which demonstrate, but do not exhaust, the principles at work. Of course, one
book is not enough to adequately address even a small portion of cognitive behav-
ior. In short, the intent is to provide a method and an architecture that opens the
way for a much wider variety of work than can possibly be captured in a single
book, or for that matter, done in a single lab.
I leave a more detailed discussion of the consequences of adopting this archi-
tecture for chapters 9 and 10. Nevertheless, keeping in mind the main motivations
behind the SPA should help situate the following introduction to it.
In the next four chapters, I describe and demonstrate central aspects of the Se-
mantic Pointer Architecture (SPA): semantics, syntax, control, and memory and
learning. In the chapter after that, I provide an integration of these aspects of cog-
nition in the form of a general theoretical characterization of the Semantic Pointer
Architecture, and an integrated model that addresses many features of biological
cognition at once. I then explicitly compare the SPA to past suggestions for cog-
nitive architectures, and discuss important practical and conceptual differences.
Figure 3.1: A “conceptual golf ball” depicting a 3D state space. Each dimple in
the ball represents a concept, visualized here as a set of specific example symbols
inside a circle. Vectors from the center of the sphere to the dimples on its surface
are used to identify these concepts. In the figure concepts for letters are clus-
tered together while other kinds of symbols (e.g. $, %) are clustered separately;
within the cluster of dimples storing letters, the letters with special accents are
also grouped more closely together. The location of a concept (or specific exam-
ple) on the surface of the conceptual golf ball thus carries information about its
relationship to other concepts (and/or specific examples).
ball”. The surface of the ball represents the conceptual space, and the dimples
in the surface represent concepts. Specific examples of a concept are picked out
by points that lie within a given dimple. Concepts (and examples) that are se-
mantically similar lie in relatively close proximity compared to concepts that are
semantically dissimilar.
Unfortunately, things will quickly get crowded in 3 dimensions, and so neigh-
boring concepts will be easily confused. This is especially true if there is un-
certainty, or noise, in the specification of a given example. Since the presence
of noise should be expected, especially in a real physical system like the brain, the number of distinct concepts that can be represented effectively may be quite small
in low-dimensional spaces. The important contribution of high-dimensionality is
that the amount of surface area available to put concepts in increases incredibly
quickly with dimensionality, as shown in figure 3.2. As a result, many concepts,
Figure 3.2: The scaling of the available conceptual space on the surface of a hy-
persphere with dimensionality. The figure shows the number of unit vectors that
can be placed into a disk and ball under the constraint that there must be a min-
imum angle of ten degrees between any two vectors. The rightmost plot extends
the result from intuitive two and three dimensional shapes to higher dimensional
spheres.
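A crude way to see this scaling is a Monte Carlo sketch: sample random unit vectors and greedily keep only those at least ten degrees from every vector kept so far. The counts will not match the figure exactly (greedy packing is far from optimal, and the candidate budget is arbitrary), but the explosive growth with dimension is already visible.

```python
import numpy as np

rng = np.random.default_rng(0)
min_angle = np.radians(10)

def count_packed_vectors(d, n_candidates=5000):
    """Greedily pack random unit vectors in d dimensions, keeping only those
    at least min_angle away from every previously kept vector."""
    kept = np.empty((0, d))
    for _ in range(n_candidates):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        if kept.shape[0] == 0 or np.arccos(np.clip(kept @ v, -1, 1)).min() >= min_angle:
            kept = np.vstack([kept, v])
    return len(kept)

for d in (2, 3, 5, 8):
    print(d, "dimensions:", count_packed_vectors(d))
```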
any given concept. There is simply too much information related to any particular
conceptual representation to be able to actively represent and manipulate it all at
the same time. This is why the Semantic Pointer Architecture employs the notion
of a “pointer”.
A pointer, in computer science, is a set of numbers that indicate the address
of a piece of information stored somewhere in memory. What is interesting about pointers is that manipulations of the pointers themselves can be performed which result in an indirect use of the information they identify, despite the fact that that information itself is never explicitly accessed. Most such manipulations
are quite simple. For instance, if I am creating a list of data that needs to be
grouped, I can store each piece of data with a pointer to the next data item in the
list. Such a “linked list” provides a flexible method for traversing, removing, and
adding items to a collection of data. Similarly, if I need to pass a data structure to
a function which may use it, I can simply pass the pointer, which is typically much
smaller than the data structure itself. As a result, much less information needs to
be moved around within the system, while still making the relevant data available
for subsequent use.
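A toy linked list makes the point concrete; Python object references stand in for memory addresses here. The list is rearranged entirely by rewiring "pointers", and the payloads are never copied or inspected.

```python
class Node:
    """One item in a singly linked list: a payload plus a reference
    ('pointer') to the next item."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

# Build the list c -> b -> a by repeatedly pointing a new node at the old head.
head = None
for payload in ("a", "b", "c"):
    head = Node(payload, next=head)

# Insert a new item after the head by rewiring two pointers only.
head.next = Node("d", next=head.next)

# Traverse the list by following pointers.
node = head
while node is not None:
    print(node.data)
    node = node.next
```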
One notable feature of pointers in computer science is that a pointer itself
and the information contained at its address are arbitrarily related. As a result,
having the pointer itself often indicates nothing about what sort of information
will be found when the address to which it points is accessed. Since decisions
about how to use information often depend on what the content of that information
is, pointers are often “dereferenced” during program execution. Dereferencing
occurs when the data at the address specified by the pointer is accessed. In a
computer, this is a relatively cheap operation because memory is highly structured,
and the pointer is easy to interpret as an address.
Given these features, pointers are reminiscent of symbols. Symbols, after all,
are supposed to gain their computational utility from the arbitrary relationship
they hold with their contents (Fodor, 1998). And symbols are often thought to act
like labels for more sophisticated data structures (such as schemas, scripts, etc.),
just as pointers act as labels for whatever happens to be at their address.
The semantic pointer hypothesis suggests that neural representations, espe-
cially those implicated in cognitive processing, share central features with this
traditional notion of a pointer. In short, the hypothesis suggests that the brain
manipulates compact, address-like representations to take advantage of the signif-
icant efficiency and flexibility afforded by such representations. Relatedly, such
neural representations may be able to act like symbols in the brain.
However, the arbitrary relationship between a pointer and its contents seems
seat. As a result, much recent empirical work on semantics has focused on under-
standing what kind of semantic processing is needed to perform various cognitive
tasks. In short, the question of interest has become: for which behaviors do we
need “deep” semantic processing, and which can be effectively accounted for by
“shallow” semantic processing?
The distinction between deep and shallow processing can be traced back to
Allan Paivio’s (1986; 1971) Dual-Coding Theory. This theory suggests that per-
ceptual and verbal information are processed in distinct channels. In Paivio’s
theory, linguistic processing is done using a symbolic code, and perceptual pro-
cessing is done using an analog code, which retains the perceptual features of a
stimulus. Paivio (1986) provides a lengthy account of the many sources of em-
pirical evidence in support of this theory, which has been influential in much of
cognitive psychology, including work on working memory, reading, and human-computer interface design.
More recently, this theory has been slightly modified to include the observa-
tion that both channels are not always necessary for explaining human perfor-
mance on certain tasks (Simmons et al., 2008; Glaser, 1992). Specifically, simple
lexical decision tasks do not seem to engage the perceptual pathway. Two re-
cent experiments help demonstrate this conclusion. First, Solomon and Barsalou
(2004) behaviorally demonstrated that careful pairings of target words and prop-
erties can result in significant differences in response times to determining if a
property belongs to a target word. For instance, when subjects were asked to
determine if the second word in a pair was a property of the first word, false pair-
ings that were lexically associated took longer to process. For example, a pair
like “cherry-card” resulted in 100ms quicker responses than a pair like “banana-
monkey”. Second, Kan et al. (2003) observed that fMRI activation in perceptual
systems was only present in the difficult cases for such tasks. Together, this work
suggests that deep processing is not needed when a simple word association strat-
egy is sufficient to complete the task.
Nevertheless, much of the semantic processing we perform on a daily basis
seems to be of the “deep” type. Typical deep semantic processing occurs when
we understand language in a way that would allow us to paraphrase its meaning,
or answer probing questions about its content. It has been shown, for instance,
that when professional athletes, such as hockey players, read stories about their
sport, the portions of their brain that are involved in generating the motor ac-
tions associated with that sport are often active (Barsalou, 2009). This suggests
that deep semantic processing may engage a kind of “simulation” of the circum-
stances described by the linguistic information. Similarly, when people are asked
to think and reason about objects (such as a watermelon), they do not merely ac-
tivate words that are associated with watermelons, but seem to implicitly activate
representations that are typical of watermelon backgrounds, bring up emotional
associations with watermelons, and activate tactile, auditory, and visual represen-
tations of watermelons (Barsalou, 2009).
Consider the Simmons et al. (2008) experiment which was aimed at demon-
strating both the timing and relative functions of deep and shallow processing. In
this experiment participants were each scanned in an fMRI machine twice. In one
session, the experimenters were interested in determining the parts of the brain
used during shallow semantic tasks and during deep semantic tasks. As a result,
participants were asked two questions: first, “For the following word, what other
words come to mind immediately?”; and second, “For the following word, imag-
ine a situation that contains what the word means and then describe it?” The
experimenters found that in response to the first question, language areas (such
as Broca’s area) are most active. In contrast, in response to the second question
participants engaged brain areas that are active during mental imagery, episodic
memory, and situational context tasks (Kosslyn et al., 2000; Buckner and Wheeler,
2001; Barr, 2004). In other words, from this session it was evident that simple lex-
ical association activated language areas, whereas complex meaning processing
activated perceptual areas, as would be expected from the Dual-Coding Theory.
During the other scanning session, participants were asked to list, in their
heads, answers to the question “what characteristics are typically true of X?”,
where X was randomly chosen from the same set of target words as in the first
session. When the two different scanning sessions were compared, the experi-
menters were able to deduce the timing of the activation of these two different ar-
eas. They found that the first half of the “typically true of X” task was dominated
by activation in language areas, whereas the second half of the task was domi-
nated by activation in perceptual areas. Consistent with the earlier behavioural
experiments, this work shows that shallow processing is much more rapid, so it
is not surprising that highly statistically related properties are listed first. Deep
processing takes longer, but provides for a richer characterization of the meaning
of the concept.
This experiment, and many others emphasizing the importance and nature of
deep semantic processing, have been carried out in Larry Barsalou’s lab at Emory
University. For the last two decades, he has suggested that his notion of “percep-
tual symbols” best characterizes the representational substrate of human cogni-
tion. He has suggested that the semantics of such symbols are captured by what
he calls “simulations”. Indeed, the notion of a “simulation” has often been linked
to ideas of deep semantic processing (e.g., Allport, 1985; Damasio, 1989; Pul-
vermüller, 1999; Martin, 2007). Consequently, they would no doubt agree with
Barsalou’s claim that deep semantic processing occurs when “the brain simulates
the perceptual, motor, and mental states active during actual interactions with the
word’s referents” (Simmons et al., 2008, p. 107). Indeed, his data and arguments
are compelling.
However, the important missing component of his theory is how such symbols
and simulations can be implemented and manipulated by the brain. In a discussion
of his work in 1999, one recurring critique was that his notion of “perceptual
symbols” is highly under-defined. For instance, Dennett and Viger pointedly note
“If ever a theory cried out for a computational model, it is here” (1999, p. 613).
More to the point, they conclude their discussion in the following manner:
Indeed, in the conclusion to a recent review of his own and others’ work on the
issue of semantic processing, Barsalou states (Barsalou, 2009, p. 1287):
Perhaps the most pressing issue surrounding this area of work is the
lack of well-specified computational accounts. Our understanding of
simulators, simulations, situated conceptualizations and pattern com-
pletion inference would be much deeper if computational accounts
specified the underlying mechanisms. Increasingly, grounding such
accounts in neural mechanisms is obviously important.
for the SPA is: how can we incorporate both deep and shallow processing? The
central hypothesis of the SPA outlines the answer I will pursue in the next three
sections: that semantic pointers carry partial semantic information. The crucial
steps to take now are to: 1) describe exactly how the partial semantic informa-
tion carried by semantic pointers is generated (and how they capture shallow se-
mantics); and 2) describe how semantic pointers can be used to access the deep
semantics to which they are related (i.e., how to dereference the pointers).
Consequently, methods for reducing the size of the matrix are often employed. These methods are chosen to ensure that the most important statistical relationships captured by the original matrix are emphasized as much as possible. This, then, is a kind of lossy compression applied to the original raw matrix. One of the best-known compression methods is singular-value decomposition, which is used, for
example, in the well-known latent semantic analysis (LSA) approach (Deerwester
et al., 1990b). For explanatory purposes, however, I will simply consider the raw
matrix shown in figure 3.3.
The reason such a matrix can capture semantic information is that we expect
semantically similar words to occur in the same documents. This is because most
of the considered documents in these computational experiments are short, and
on one or a few basic themes – like the books and stories encountered in early
childhood. So, if we compare the row-vectors of semantically similar words, we
expect them to have similar vectors, because those words will appear in many of
the same contexts. Semantically unrelated words will have vectors that are not
very similar. Notice that like semantic pointers, these representations of words
are first and foremost compressed high-dimensional vectors.
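A toy version of this pipeline, with made-up counts and a truncated SVD as the lossy compression step (the computation behind LSA), might look like this in numpy; the cosine similarities between the compressed row-vectors are the "shallow" semantic comparisons at issue here.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are words, columns are documents.
words = ["dog", "puppy", "cat", "engine"]
X = np.array([
    [3.0, 2.0, 0.0, 0.0],   # dog
    [2.0, 3.0, 1.0, 0.0],   # puppy
    [0.0, 1.0, 4.0, 0.0],   # cat
    [0.0, 0.0, 0.0, 5.0],   # engine
])

# Lossy compression: keep only the two largest singular components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = U[:, :2] * s[:2]       # compressed word representations

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vectors[0], word_vectors[1]))   # dog vs. puppy: high
print(cosine(word_vectors[0], word_vectors[3]))   # dog vs. engine: near zero
```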
While simple, this method is surprisingly effective at categorizing documents.
Using a wide variety of text corpora, well above 90% of new documents can be
correctly classified with such representations (Yang and Liu, 1999). More impor-
tantly, this same kind of representation has been used by researchers to capture
psychological properties of language. For example, Paaß et al. (2004) demonstrate
how prototypes2 can be extracted using such a method. The resulting representa-
tions capture standard typicality effects.3 As well, this kind of representation has
been used to write the Test of English as a Foreign Language (TOEFL). The com-
puter program employing these representations scored 64.4%, which compares
favorably to foreign applicants to American universities, who scored 64.5% on
average (Landauer and Dumais, 1997).
However, these representations of words clearly only capture shallow seman-
tics. After all, the word representations generated in this manner bear no rela-
tion to the actual objects that those same words pick out: instead, they model
the statistics of text. Nevertheless, this work shows that semantic representations
which capture basic statistics of word use can effectively support certain kinds of
linguistic processing observed in human subjects. As mentioned earlier, these rep-
resentations are compressed high-dimensional vectors, just like semantic pointers.
That is, they capture precisely the kind of shallow semantics that semantic point-
ers are proposed to capture in the SPA. In short, the SPA suggests that shallow
semantic processing can be performed without using the pointer as a pointer: i.e.,
by relying solely on the semantic content captured by the pointer representation
itself without dereferencing it.
However, consideration of the kinds of psychological experiments discussed
in the last section suggests that there is an important and distinct role for deep
semantic processing. This role is clearly not captured by the kinds of simple
lexical associations used to generate these shallow representations for automatic
text categorization. And, these shallow semantics are, in general, not sufficient to
address the symbol grounding problem identified in section 3.1. There is, after
all, no way to get back to a “richer” representation of a word from these term-
document representations. To address both deep semantics and symbol grounding,
in the next section I turn to a different, though still statistical, method of generating
2 Prototypes of categories are often used to explain the nature of concepts (Smith, 1989). It has
been a matter of some debate how such prototypes can be generated.
3 Typicality effects are used to explain why subjects rate some concept instances as being more
typical than others. These effects are often explained by the number of typical features that such
instances have (the more typical features an instance has, the more typical it will be). Typical
instances are both categorized more quickly and produced more readily by subjects. The prototype
theory of concepts has been successful at capturing many of these effects.
Assume there is some visual data (e.g., an image), y, which is generated by the external
visual world and drives neural activity. If we suppose that the purpose of percep-
tual systems is to construct and use a statistical model of this data, the system must
figure out some function p(y) that describes how likely each state y is, so it can
use that information to disambiguate future data. For instance, if I am in North
America and a medium-sized brown animal is coming in my direction, I can as-
sign probabilities to the various kinds of animal it might be (e.g. groundhog, dog,
and rabbit are high, capybara and wallaby are low). Assigning those probabilities
is an example of using the model, p(y), that I have constructed based on past data.
Since the real world is extremely complex, the ideal statistical model will also
be enormously complex (as it is the probability of all possible data at all times).
As a result, the brain probably approximates this distribution by constructing what
is called a parameterized model. Such a model identifies a small number of pa-
rameters that capture the overall shape of the ideal model. For example, if the
statistics of the data y lie in the famous Bell curve (or Gaussian) distribution, we
can model the data with an equation like:
p(y) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(y - \bar{y})^2 / 2\sigma^2}
Then, to “capture” all past data using our model, we only need to remember two
parameters, ȳ (the mean) and σ (the standard deviation), and the equation describ-
ing their relationship. This is much more efficient than remembering each value
of p(y) for each value of y explicitly.
To build such a model, the system needs to estimate the parameters (the mean
and standard deviation in this case). Of course, to do any such estimating, the
system needs data. As a result, a kind of bootstrapping process is necessary to
construct this kind of model: we need data to estimate the parameters; then use
our best estimate of the parameters to interpret any new data. Despite this seeming
circularity, extremely powerful and general algorithms have been designed for
estimating exactly these kinds of models (Dempster et al., 1977). Such methods
have been extensively employed in building connectionist-type models, and have
been suggested to map to biological neural networks.4
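As a minimal sketch of this bootstrapping (with simulated "past data" standing in for actual perceptual input), the two parameters of the Gaussian model can be estimated from samples, and the fitted p(y) can then be used to judge how likely new observations are.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=0.5, scale=0.2, size=1000)   # hypothetical past observations of y

# Estimate the model's two parameters from the data.
y_bar = data.mean()        # sample mean
sigma = data.std(ddof=1)   # sample standard deviation

def p(y):
    """The fitted parameterized model: a Gaussian with the estimated parameters."""
    return np.exp(-(y - y_bar) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(p(0.5))   # a value much like past data: high probability density
print(p(1.5))   # a surprising value: tiny probability density
```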
Note, however, that the methods for model inference do not specify the struc-
ture of the model itself (i.e. the relationships between the parameters). In the
4 A good place to start for state-of-the-art applications of these methods is Geoff Hinton’s web
page at http://www.cs.toronto.edu/~hinton/. For discussion of biological mappings of
these methods see Friston (2003).
[Figure 3.4: a hierarchical model with a 28 x 28 image (784 pixels) as input, followed by layers of 1000, 500, 300, and 50 nodes.]
artificial neural network application of these methods, this structure is often “bio-
logically inspired”. One notable feature of brain structure that has proven a very
useful starting point for constraining such models is its hierarchical nature. The best
known example of this structure in neuroscience is the visual hierarchy. For object
recognition, this hierarchy begins with the retina, and proceeds through thalamus
to visual areas V1 (primary visual cortex), V2 (secondary visual cortex), V4, and
IT (inferotemporal cortex) (Felleman and Essen, 1991).
In a hierarchical statistical model, each higher level in the hierarchy attempts
to build a statistical model of the level below it. Taken together, the levels de-
fine a model of the original input data (see figure 3.4). This kind of hierarchical
structure naturally allows the progressive generation of more complex features at
higher levels, and progressively captures higher-order relationships in the data.
Furthermore, these kinds of models lead to relations between hierarchical levels
that are reminiscent of the variety of neural connectivity observed in cortex: feed-
forward, feedback, and recurrent (interlayer) connections are all essential.
The power of these methods for generating effective statistical models is im-
Figure 3.5: Input images from the MNIST database of handwritten digits.
pressive (Beal, 1998). They have been applied to solve a number of standard pat-
tern recognition problems, improving on other state-of-the-art methods (Hinton
and Salakhutdinov, 2006). Furthermore, they have been shown to generate sensi-
tivities to the input data that look like the tuning curves seen in visual cortex (Lee
et al., 2007), when constructing models of natural images. In fact, many of the
most actively researched models of vision are naturally interpreted as constructing
exactly these kinds of hierarchical statistical models.
To get a clearer picture of what this approach to perceptual modeling offers,
and how it can be used to generate semantic pointers, let us turn to an example of
such a model that was built in my lab by Charlie Tang. The purpose of this system
is to construct representations which support recognition of handwritten digits
presented as visual input. The input is taken from the commonly used MNIST
database at http://yann.lecun.com/exdb/mnist/. Examples of the input are
shown in figure 3.5. The model is structured as shown in figure 3.4, and is shown
60,000 examples from this data set, and told how those examples should be
categorized. Based on this experience, the model tunes its parameters to be able
to deal with another 10,000 unseen, though similar, visual inputs in the data set.
To maintain biological relevance, the first layer of the model is trained on
natural images, in order to construct an input representation that looks like that
found in primary visual cortex. As shown in figure 3.6, the tuning curves capture
many of the properties of V1 tuning, including a variety of spatial frequencies
(i.e. narrowness of the banding), positions, orientations of the bands, and the
edge-detector-like shape.5 More direct matches to biological data are shown at
the bottom of figure 3.6. The methods used to generate these tuning curves from
5 Training such networks on natural images has often been shown to result in V1-like tuning
(Olshausen and Field, 1996).
Figure 3.6: A sample of tuning curves of neurons in the model. These learned
representations have a variety of orientations, spatial frequencies, and positions,
like those found in V1. Also shown is a comparison of data and model tuning curves from V1.
Data adapted from Ringach (2002) with permission.
spiking single cells are identical to those used to generate the biological curves
(Ringach, 2002).
These tuning curves are used to represent the input digits, and then the remain-
ing layers of the network are trained to capture the statistics of that input. By the
time we get to the highest level of the hierarchy, we have a much smaller (i.e.,
compressed) representation summarizing what has been presented to the retina.
This compressed representation is a semantic pointer.
Notably, the highest layer of this network has 50 nodes, which means that
the state space is 50-dimensional. It is this 50D space that contains the semantic
pointers whose contents tell us about the presented digits. Clearly, this representa-
tion does not contain all of the information available in early visual areas. Instead,
it is a summary that usefully supports this object recognition task. This network,
like most in machine vision, can score less than 2% error on the 10,000 test dig-
its which were not used to train the network (i.e., it misclassifies about 200 of them). In fact, these models outperform humans on this and other similar tasks
(Chaaban and Scheessele, 2007).
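The details of the hierarchical model are beyond a short sketch, but the compression just described (and the dereferencing discussed below) can be illustrated with a simple linear stand-in: principal components (via SVD) compressing 784-dimensional "images" (synthetic here, not the actual MNIST data) down to 50 dimensions, with per-category mean pointers acting as prototypes (cf. figure 3.7).

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for MNIST: 10 class templates in a 784D 'image' space.
templates = rng.standard_normal((10, 784))
labels = rng.integers(0, 10, size=500)
images = templates[labels] + 0.3 * rng.standard_normal((500, 784))

# "Encode": project each image onto the top 50 principal directions.
mean_image = images.mean(axis=0)
U, s, Vt = np.linalg.svd(images - mean_image, full_matrices=False)
pointers = (images - mean_image) @ Vt[:50].T        # 50D pointer-like codes

# "Dereference": map a 50D code back to an approximate 784D image.
reconstructions = pointers @ Vt[:50] + mean_image

# Prototypes: the mean pointer of each category, unpacked back to image space.
mean_pointers = np.array([pointers[labels == c].mean(axis=0) for c in range(10)])
prototype_images = mean_pointers @ Vt[:50] + mean_image

print(pointers.shape, reconstructions.shape, prototype_images.shape)
```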
So, these compressed representations (i.e., semantic pointers), like the lexical
ones discussed previously, can capture important information that can be used to
classify the input. In both cases, it is shallow comparisons, that is, those directly
between the semantic pointers, that result in the classification. So, in both cases,
the semantic pointers themselves carry shallow semantic information. However,
there are two important differences between these semantic pointers and the lex-
ical ones. First, these were generated based on raw images. This, of course, is
a more biologically relevant input stream than text: we do not have organs for
directly detecting text. Consequently, the 50D pointers are grounded in a natural
visual input. If we have a way of treating these pointers like symbols, then we
have found a natural solution to the symbol grounding problem.
Second, and more importantly, these 50D pointers can be used to drive deep
semantics. That is, we can, in a sense, run the model “backwards” to decode, or
dereference, the 50D representation. In other words, we can clamp the semantic
pointer representation at the top level of the network and then generate an input
image at the lowest level.6 Several examples of this process are shown in figure
3.7.
This figure demonstrates that a lot of the detail of an input is in fact captured
by the semantic pointer representation. Subtleties of the way particular digits are drawn, such as whether an "8" is slanted or not, can be reconstructed from the
pointer that such a figure generates. It is these subtleties that capture the deep
semantics of this representation. However, it is obviously not always the case that
we have a precisely drawn “8” in mind when we talk about the number “8”. That
is, we might want to have access to the deep visual semantics of a pointer, when it
is generated by an auditory input. In such a case, we can still use a visual semantic
pointer to represent a generic “8”. A natural choice is the mean of the pointers
associated with a category (see figure 3.7), which can be thought of as a prototype
6 This does not suggest that the brain somehow recreates retinal images (there are no neurons
that project from cortex to retina). Instead, figure 3.7 shows the retinal images that are consistent
with the unpacked cortical representations. The deep semantic information at these non-retinal
levels is accessible to the rest of cortex. In the brain, the unpacking would stop sooner, but could
still be carried out to as low a level as necessary for the task at hand. This is one reason why seeing
an image is not the same as imagining one, no matter how detailed the imagining.
Figure 3.7: Dereferencing semantic pointers. (Top left) The original input of sev-
eral randomly chosen images. These are compressed into a 50D representation
at the highest level of the model. (Middle left) The resulting images from deref-
erencing the 50D semantic pointers. The 50D representations generated by the
network based on the input are clamped at the highest-level, and the rest of the
network is used to generate a guess as to what input would produce that semantic
pointer. (Right) The compressed 50D representation is plotted in two dimensions
using dimensionality reduction techniques. Large dark circles indicate the mean
value of the semantic pointers for each category. These can be thought of as the
middle of the dimples on the conceptual golf ball (see figure 3.1). (Bottom left)
The unpacked mean semantic pointer for each category. These can be thought of
as a prototype of the category.
Figure 3.9: Dual problems. a) The cube and the octahedron provide a simple
example. Both figures share a similar geometric description, but the number of
vertices and faces of the two shapes are reversed. b) The perceptual and motor
systems are much more complicated, but can also be treated as duals. The per-
ceptual system encodes high-dimensional, non-linear relations using hierarchi-
cally compressed representations. The motor control system reverses this process
(though with similar structure and representations) to determine high-dimensional
nonlinear control signals from a low-dimensional signal.
As a simple example, consider the relationship that exists between a cube and
an octahedron (see figure 3.9a). Notice that there are the same number of faces in a cube as vertices in an octahedron, and vice versa. As well, both have twelve
edges, connecting the relevant vertices/faces in a structurally analogous manner.
These two solids are thus duals. If we pose a problem for one of these solids –
e.g., What is the volume of the smallest sphere that intersects all vertices (faces)?
– then a solution for one of the solids provides a solution for the other. This is
true as long as we swap the relevant structural elements (i.e., faces and vertices)
appropriately.
Why does this matter for understanding perception and motor control? Be-
cause this dual relationship has been suggested to exist between statistical models
of perceptual processes and optimal control models of motor processes (Todorov,
2007, 2009). Figure 3.9b suggests a mapping between perceptual and motor sys-
tems that takes advantage of this duality. From an architectural point of view,
this duality is very useful because it means that there is nothing different in kind
about perceptual and motor systems. From the perspective of the SPA in particular,
this means that semantic pointers can play the same role in both perception and
action. Much of the remaining discussion in this section describes this role in the
context of motor control.
To begin, like the perceptual system, the motor system is commonly taken to
be hierarchical (see figure 3.10). Typically, we think of information as flowing
down rather than up the motor hierarchy. For instance, suppose you would like
to move your hand towards a given target object. Once the desired goal state has
been generated, it is provided to the cortical motor system. The first level of this
system may then determine which direction the arm must move in order to reach
that goal state.9 Once the direction of movement has been determined, it can be
used as a control signal for a controller that is lower in the hierarchy. Of course,
specification of the direction of movement does not determine how torques need
to be applied at the joints of the arm in order to realize that movement. This,
then, would be the function of the next lower level in the control hierarchy. Once
the specific forces needed to move the arm are determined, the specific tensions
that need to be produced in the muscles in order to realize those forces must be
generated by the activity of motor neurons. In short, as the motor command be-
comes more and more specific to the particular circumstance (e.g., including the
particular part of the body that is being moved, the current orientation of the body,
the medium through which the body is moving, etc.), lower level controllers are
recruited to determine the appropriate control signal for the next lower level. Ulti-
mately, this results in activity in motor neurons that cause muscles to contract and
our bodies move.
Notably, just as it is inaccurate to think of information as flowing only “up”
the perceptual hierarchy, so is it a mistake to think of this picture of motor control
as being in one direction. As well, in both systems, there are many connections
that skip hierarchical levels, so the flow of information is extremely complex.
Nevertheless, the main problems to be tackled are very similar. So, just as we
can begin our characterization of perception as thinking of higher levels in the
perceptual hierarchy as constructing models of the levels below them, so we can
think of higher levels in the motor hierarchy as having models of the lower levels.
Higher levels can then use such models to determine an appropriate control signal
to affect the behavior of that lower level.
There is good evidence for this kind of control structure in the brain (Wolpert
and Kawato, 1998; Kawato, 1995; Oztop et al., 2006). That is, a control structure
9 This description is highly simplified, and employs representations and movement decompo-
sitions that may not occur in the motor system. Identifying more realistic representations would
require a much lengthier discussion, but would not add to the main point of this section. For more
detailed descriptions of such representations, see (Dewolf, 2010), and Dewolf neur eng. paper???).
in which higher levels of the hierarchy have explicit models of the behavior of
lower levels. When we attempt to specify these models, a simplification in our
characterization of perception becomes problematic: we assumed that the models
were static. In motor control, there is no such thing as non-temporal processing.
Time is unavoidable. Embracing this observation actually renders the connec-
tion between perceptual and motor hierarchies even deeper: higher levels in both
hierarchies must model the statistics and the dynamics of the levels below them.
In the previous perceptual model, the statistics at higher levels capture the
regularities at lower levels. In the neural model, these were evident as neural
tuning curves that were used to represent the perceptual space. In motor control,
these same kinds of regularities are often called “synergies” (Bernstein, 1967;
Lee, 1984). Synergies are useful organizations of sets of movements that are often
elicited together. As before, we would expect the tuning curves of neural models
to reflect these synergies for the representation of the space of motor actions.
These representations need to be learned based on the statistics of the space they
are representing, as in the perceptual model above.
What is not addressed by that simple perceptual model is dynamics.10 This
simplification is not appropriate when considering motor control. After all, the
particular dynamics of the system are likely to affect which synergies are most
useful for controlling action. So, the representational and dynamical aspects of
the system are tightly coupled. In fact, it is natural to describe this connection
in a way captured by the third principle of the NEF: the dynamical models are
defined over the representational state space. That is, the dynamical models of a
given level of the hierarchy are defined using the synergies of that level.11 The
resulting dynamics then drive lower levels, and feedback from lower levels can
inform these higher-level models. So, despite the added complexity of dynamics,
the general structure of the motor system is hierarchical, just like the perceptual
system.
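The forward/inverse distinction drawn in footnote 10 can be made concrete with a small sketch. The following Python fragment (an illustration only, with an arbitrary time step and arbitrary gains, and not a claim about the representations the motor system actually uses) implements both kinds of model for a unit point mass moving in one dimension: the forward model predicts the consequences of a control signal, and the inverse model generates a control signal that drives the state toward a target.

dt = 0.001

def forward_model(x, v, u):
    # Predict the next state (position, velocity) given the control signal u.
    return x + dt * v, v + dt * u

def inverse_model(x, v, x_target, kp=400.0, kd=40.0):
    # Return a control signal intended to drive the position toward x_target.
    return kp * (x_target - x) - kd * v

x, v = 0.0, 0.0
for _ in range(2000):
    u = inverse_model(x, v, x_target=1.0)   # generate a control signal
    x, v = forward_model(x, v, u)           # the forward model doubles as the simulated plant here
print(round(x, 3))                          # settles near the target position, 1.0

In the hierarchical picture sketched above, the state at a given level would be expressed in that level's synergies, and the control signal it produces would serve as the reference for the level below.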
Consequently, like the simpler perceptual model, there are two main features
10 For effective control, dynamical models are typically broken into two components: a forward
model and an inverse model. A forward model is one which predicts the next state of the system
given the current state and the control signal. An inverse model performs the opposite task, pro-
viding a control signal that can move between two given states of the system. For simplicity, I
discuss both using the term “dynamical model”, as this level of detail is beyond the current scope.
11 There is some issue here about drawing boundaries to determine the input and output of these
models. Essentially, a control (error) signal can be thought of as generated at a given hierarchical
level and then mapped to a lower (higher) level, or the mapping and generation may be considered
concurrent, in which case the dynamical model maps synergies at different levels. It is possible
that both happen in the nervous system, but I believe the former is easier to conceptualize.
Figure 3.10: Neural optimal control hierarchy (NOCH). This framework maps
neuroanatomy onto optimal control. The numbers indicate the approximate order
of processing for a movement. Each box contains a brief description of the func-
tion of that area. Semantic pointers can be thought of as driving the top of this
hierarchy, as in the perceptual system. Abbreviations: PM - premotor area; SMA -
supplementary motor area; M1 - primary motor area; S1 - primary somatosensory
area; S2 - secondary somatosensory area. This diagram is a simplified version of
NOCH presented in Dewolf (2010).
Figure 3.12: A semantic motor space. This figure shows how similar high-level
semantic pointers can generate similar low-level movements. a) The clustering
of 6D control signals used to drive the movements in b). These are arbitrarily
rotated because of the dimensionality reduction performed to plot them in a 2D
space. b) The movements of the arm driven by one of eight semantic pointers
(corresponding to different marker shapes). The “dereferencing” of the pointers
drives the high-dimensional system consistently in external space.
More recent work in motor control suggests that higher-dimensional
representations are in fact the norm in motor cortex (Churchland et al., 2010).
And, in our own recent work, we have shown that a spiking neuron implementation
of this kind of NOCH model effectively captures a wide variety of the nonlinear
temporal dynamics observed in the responses of motor neurons (Dewolf and
Eliasmith, neural engineering paper ???). As a result, the NOCH approach maps
well onto the biological basis of motor control.
Conceptually, the most important consequence of NOCH is captured by a
comparison of figure 3.12 to the perceptual example shown in figure 3.7. This
comparison highlights the many similarities of the characterizations of motor and
perceptual semantics I have presented so far. In short, the high-dimensional con-
trol signals of the motor system are like the high-dimensional input images, and
the low-dimensional semantic pointers for target positions are like the semantic
pointers generated by the perceptual model. Such models generate semantic
pointers, but they need not be the only kind. So, the means of defining the
semantics of the pointers, and their precise location in the hierarchy, are not
essential to the SPA. What is central to the architecture is that the representations
generated by these models act as low-dimensional summaries of high-dimensional
systems engaged by the brain. This central feature highlights another defeasible
assumption of the above examples: that the perceptual and motor models are gen-
erated independently.
As has been argued forcefully in a variety of disciplines,13 it is a mistake to
think of biological systems as processing perceptual information and then pro-
cessing motor information. Rather, both are processed concurrently, and inform
one another “all the way up” the hierarchies. Consequently, it is more appropriate
to think of the semantics of items at the top of both hierarchies as having concur-
rent perceptual and motor semantics. The integrative model in chapter 7 provides
an example of this. In that case, the semantic pointer generated by the perceptual
system is “dereferenced” by the motor system. Thus, the deep semantics of the
pointer depends on both of the models that can be used to dereference it.
The SPA thus provides a natural, but precise and quantitative, means of spec-
ifying the kinds of interaction between perception and action that have seemed
central to cognitive behavior. Such a characterization may remind some of the
notion of an “affordance”, introduced by psychologist James Gibson (1977). Af-
fordances are “action possibilities” that are determined by the relationships be-
tween an organism and its environment. Gibson suggested that affordances are
automatically picked up by animals in their natural environments, and provide a
better characterization of perception (as linked to action) than traditional views.
Several robotics researchers have embraced these ideas, and found them use-
ful starting points for building interactive perception/action systems (Scheier and
Pfeifer, 1995; Ballard, 1991). Similarly, the notion of an “affordance”, while
somewhat vague, seems to relate to the information captured by a semantic pointer
when embedded in the SPA. As just mentioned, these pointers can be used as di-
rect links between action and perception that depend on the motor and perceptual
experiences of the organism. Consequently, their semantics are directly tied to
both the environment and body in which they are generated. There are, undoubt-
edly, many differences between semantic pointers and affordances as well (e.g.,
semantic pointers do not seem to be “directly perceived” in the sense champi-
13 I can by no means give comprehensive coverage of the many researchers who have made these
sorts of arguments, which can be found in robotics, philosophy, neuroscience, and psychology.
Some useful starting points include Brooks (1991); Churchland et al. (1994); Port and van Gelder
(1995); O’Regan and Noë (2001).
oned by Gibson, for example). Nevertheless, the similarities help relate semantic
pointers to concepts already familiar in psychology.
As well, affordances highlight the important interplay between perception and
action. Motor control, after all, is only as good as the information it gathers
from the environment. As a result, it is somewhat awkward to attempt to de-
scribe a motor controller without discussing perceptual input (notice that NOCH,
in figure 3.10, includes perceptual feedback). It is much more appropriate to con-
ceive of the entire perception/action system as being a series of nested controllers,
rather than a feed-in and feed-out hierarchy. As depicted in figure 3.13, nested
controllers become evident by inter-leaving the hierarchies of the kind described
above.14 What should be immediately evident from this structure, is that the pro-
cess of perceiving and controlling becomes much more dynamic, interacting at
many levels of what were previously conceived of as separate hierarchies. This,
of course, makes the system much more complicated, which is why it was more
convenient to describe it as two hierarchies. And, indeed, the real system is more
complicated still, with connections that skip levels of the hierarchy, and multiple
perceptual paths interacting with a single level of the controller hierarchy. The
question of relevance is: does identifying this more sophisticated structure change
our story about semantic pointers?
Given the suggested genesis and use of semantic pointers, I believe the answer
is no. Recall that the way the highest-level representations in the motor and per-
ceptual hierarchies were generated was by characterizing statistical models that
identify the relationships within the domain of interest. Whether these models are
influenced solely by perceptual or solely by motor processes, or whether they are
influenced by both, may change the nature of those relationships, but will not
change the effective methods for characterizing them.
It will, then, still be the case that dereferencing a perceptual representation for
deep semantic processing results in identifying finer perceptual details not avail-
able in the higher-level (semantic pointer) representation. As suggested earlier,
these same pointers may include information about relevant motor activities. So,
semantic pointers will still be compressed and capture higher-order relationships,
it just may be that their contents are neither strictly perceptual nor strictly motor.
So perhaps the perceptual/motor divide is one that is occasionally convenient for
theorizing, even though it is not built into the semantics of our conceptual system.
14 The observation that perceptual and motor cortex are both hierarchically structured and mutu-
ally interacting is hardly a new one (Fuster, 2000); what is new, I believe, is the computational
specification, the biological implementation of the computations, and the integration into a cogni-
tive hierarchy.
Figure 3.13: Nested controllers. The brain is depicted as a set of nested dynamical
systems and controllers, coupled to the body and the world through action,
feedback, and the senses.
We should also be careful, however, not to overstate the closeness of the re-
lationship between perception and action. Dissociations between impairments of
visually guided motor control and impairments of object identification are well established
(Goodale and Milner, 1992). Some patients, with damage to the dorsal visual
pathways, can identify objects but not reach appropriately for them. Others, with
damage to the ventral visual pathways, can reach appropriately, but not classify
objects. This suggests two things: 1) that motor action and visual perception
can come apart; and 2) that the object classification model presented earlier only
models part of the visual system (the ventral stream), at best. Again, the relevant
point here is not about the completeness of the models, but that the subtle, com-
plex character of action and perception in biological systems is consistent with
the assumptions of the SPA. Capturing the appropriate degree of integration and
distinctness of action and perception is a task independent of generating semantic
pointers, so long as there are high-level representations with the assumed proper-
ties of semantic pointers.
So, semantic pointers are compressed representations of motor/perceptual in-
formation found within a nested control structure. The story I have told so far is
intended to characterize semantics, both deep and shallow, as related to semantic
pointers. Consequently, I would argue that this account is theoretically appealing
because it unifies deep and shallow semantics along with sensory and motor se-
mantics in a manner that can be realized in a biological cognitive system. What
remains to be described in order to make semantic pointers cognitively relevant
is how they can actually be used to encode complex syntactic structures. That is,
how are these perceptual/motor vectors used to generate the kinds of language-
like representations that underlie high-level cognition? Answering this question
is the purpose of the next chapter.
This tutorial builds on the tutorial on neural representation (section 2.5), so it may
be useful to review that material. The following discussion is broken into two sections, the
first on linear transformations and the second on nonlinear transformations. The
tutorial focuses on scalar transformations, but the methods generalize readily to
any level of the representational hierarchy, as I briefly discuss.
Linear transformations
As formulated, principle 2 makes it clear that transformation is an extension of
representation. In general, transformations rely on the same encoding, but exploit
a different decoding than is used when representing a variable. I will introduce
such transformations shortly. First, however, we can consider a transformation
that is a basic property of single neurons – addition. Addition transforms two
inputs into a single output which is their sum.
• In Nengo, create a new network. Set the Name of the network to ‘Scalar
Addition’.
• Create a new ensemble within the ‘Scalar Addition’ network. Name the new
ensemble ‘A’ and set Number of Nodes to 100, otherwise use the default
settings.
To ensure the ensemble uses the default settings, choose ’default’ from the drop-
down menu at the top and then change any settings as appropriate. In general,
Nengo reuses the last set of settings used to generate an ensemble.
• Create a second ensemble within the ‘Scalar Addition’ network. Name this
ensemble ‘B’ and use the same parameters as you did for ensemble A.
These two neural ensembles will represent the two variables being summed. As
with the tutorial on neural representation, we now need to connect external inputs
to these ensembles.
• Create a new function input, name it ‘input A’, set Output Dimensions to
1, and click Set Functions to ensure that the function will be a Constant
Function.
• Drag a new decoded termination onto ensemble ‘A’. Name the termination
‘input’ and ensure that the number of input dimensions (Input Dim) is set to 1.
The default value of tauPSC (0.02) is acceptable.
• Click Set Weights and confirm that the value in the 1 to 1 coupling matrix
that appears is 1.0.
• Click OK to confirm the options set in each window until the new termina-
tion is created.
• Connect ‘input A’ to ensemble ‘A’ by dragging from the input’s origin to the
new termination.
• Follow the same procedure to make a second constant function input and
project it to ensemble B.
This process should be familiar from the previous tutorial. We have represented
two scalar values in neural ensembles and we could view the result in the Inter-
active Plots window as before. However, we wish to add the values in the two
ensembles and to do this we will need a third ensemble to represent the sum.
• Create another ensemble. Call it ‘Sum’ and give it 100 nodes and 1 dimen-
sion. Set Radius to 2.0. Click OK.
Note that we have given the new ensemble a radius of 2.0. This means that it
will be able to accurately represent values from -2.0 to 2.0 which is the maximum
range of summing the input values. The ‘Sum’ ensemble now needs to receive
projections from the two input ensembles so it can perform the actual addition.
• Create a new termination on the ‘Sum’ ensemble. Name it ‘input 1’ and set
Input Dim to 1.
• Open the coupling matrix by clicking Set Weights and set the value of the
connection to 1.0.
• Click OK until the termination is created, then create a second termination
named ‘input 2’ on the ‘Sum’ ensemble in the same way.
• Form a projection between ensemble ‘A’ and the ‘Sum’ ensemble by con-
necting the origin of ‘A’ to one of the terminations on ‘Sum’.
• Likewise form a projection between ensemble ‘B’ and the ‘Sum’ ensemble
by connecting its origin to the remaining termination.
You have just completed the addition network; the addition is performed simply
by having two terminations for two different inputs. Recall from the discussion in
section 2.1 that when a neuron receives a signal from another neuron, postsynaptic
currents are triggered in the receiving neuron and these currents are summed in the
soma. When signals are received from multiple sources they are added in exactly
the same fashion. This makes addition the default transformation of neuron inputs.
• To test the addition network, open the Interactive Plots viewer, display the
control sliders for the inputs and the value of the ‘Sum’ ensemble, and run
the simulation.
• Play with the values of the input variables until you are satisfied that the
‘Sum’ ensemble is representing the sum correctly. If the value graphs are
difficult to see, right-click on the graph and select ‘auto-zoom’.
Another exploration to perform with this network is to set one of the inputs to zero,
and change the value of the other input. You will see that the ‘Sum’ ensemble
simply re-represents that non-zero input value. This is perhaps the simplest kind
of transformation and is sometimes called a ‘communication channel’ because it
simply communicates the value in the first ensemble to the second (i.e., it performs
the identity transformation). It is interesting to note that the firing patterns can be
very different in these two ensembles, even though they are representing the same
value of their respective variables. This can be best seen by viewing either the
spike rasters or the voltage grid from both ensembles simultaneously.
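For readers who prefer scripting to the graphical editor, an approximately equivalent addition network can be written with the Nengo Python API (this sketch assumes the nengo package is installed; the constant input values are arbitrary):

import nengo

with nengo.Network(label="Scalar Addition") as model:
    input_a = nengo.Node(0.5)                           # constant function inputs
    input_b = nengo.Node(-0.3)
    a = nengo.Ensemble(100, dimensions=1)               # ensemble 'A'
    b = nengo.Ensemble(100, dimensions=1)               # ensemble 'B'
    total = nengo.Ensemble(100, dimensions=1, radius=2)  # the 'Sum' ensemble
    nengo.Connection(input_a, a, synapse=0.02)
    nengo.Connection(input_b, b, synapse=0.02)
    nengo.Connection(a, total)    # two converging connections: their decoded
    nengo.Connection(b, total)    # values simply sum in the 'Sum' ensemble
    probe = nengo.Probe(total, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
print(sim.data[probe][-100:].mean())                    # should be near 0.2

As in the GUI version, no special machinery performs the addition: the two connections simply converge on the ‘Sum’ ensemble.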
Now that you have a working addition network, it is simple to compute any
linear function of scalars. Linear functions of scalars are of the form z = c1 x1 +
c2 x2 + ... for any constants cn and variables xn . So, the next step towards imple-
menting any linear function is to create a network that can multiply a represented
variable by a constant value (the constant is often called a ‘gain’ or ‘scaling fac-
tor’). Scaling by a constant value should not be confused with multiplying to-
gether two variables, which is a non-linear operation that will be discussed below
in section 3.8.
We will create this network by editing the addition network.
• In the Network Viewer, click the termination on the ‘Sum’ ensemble that
receives the projection from ensemble ‘A’.
• Click the Inspector icon (the magnifying glass in the top right corner of the
window).
• In the Inspector, click the gray triangle next to A transform (at the very
bottom).
This now shows the coupling weight you set earlier when creating this termina-
tion. It should be equal to 1.0.
• Double-click the 1.0 value. In the window that appears, double-click the 1.0
value again. Change this value to 2.0.
• Click Save Changes on the open dialog box. You should see the updated
value in the Inspector.
Setting the weight of a termination determines what constant multiple of the input
value is added to the decoded value of an ensemble. The preceding steps have set
up the weights such that the input from ensemble ‘A’ is doubled.
• Test the network in the Interactive Plots window.
You should be able to confirm that the value of ‘Sum’ is twice the value of ‘A’,
when you leave ‘B’ set to zero. However, if you leave ‘A’ at a high value and
move ‘B’ you will begin to saturate the ‘Sum’ ensemble because its radius is only
2.0. The saturation (i.e. under-estimation) effect is minimal near the radius value
(2.0), and becomes more evident the farther over the radius the ensemble is driven.
To prevent saturation, change the radius of the ‘Sum’ population by changing the
‘radii’ property in the Inspector when the population is selected.
You can easily change the linear transformation between these ensembles by
selecting any scalar value for the A transform of the two inputs to the ‘Sum’ pop-
ulation as described above. However, you must also set an appropriate radius for
the ‘Sum’ ensemble to avoid saturation.
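In scripted form, the scaled version of the network can be written as follows (again a sketch using the Nengo Python API rather than the GUI, with arbitrary input values); the transform argument plays the role of the coupling weight set in the Inspector, and the radius of the sum is widened to avoid the saturation just described:

import nengo

with nengo.Network(label="Scaled Addition") as model:
    input_a = nengo.Node(0.5)
    input_b = nengo.Node(0.5)
    a = nengo.Ensemble(100, dimensions=1)
    b = nengo.Ensemble(100, dimensions=1)
    total = nengo.Ensemble(100, dimensions=1, radius=3)  # widened to avoid saturation
    nengo.Connection(input_a, a, synapse=0.02)
    nengo.Connection(input_b, b, synapse=0.02)
    nengo.Connection(a, total, transform=2.0)            # constant gain of 2 on 'A'
    nengo.Connection(b, total)                           # gain of 1 on 'B'
    probe = nengo.Probe(total, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
print(sim.data[probe][-100:].mean())                     # should be near 2*0.5 + 0.5 = 1.5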
We can now generalize this network to any linear transformation, all of which
can be written in the form z = C1 x1 + C2 x2 + ..., where Cn are constant matrices
and xn are vector variables. Computing linear transformations using vector repre-
sentations instead of scalars does not introduce any new concepts, but the mapping
between populations becomes slightly more complicated. We will consider only a
single transformation in this network (i.e., z = Cx, where C is the coupling matrix);
a scripted sketch of the finished network is given at the end of these steps.
• In a blank Nengo workspace, create a network called ‘Arbitrary Linear
Transformation’.
• Create a new ensemble named ‘x’ with the default template, and give it 200
nodes and 2 dimensions. This will be the input.
• Create another ensemble named ‘z’ with the default template, and give it
200 nodes and 3 dimensions. This will be the output.
Note that we have two vector ensembles of different dimensionality. To connect
these together, we must specify the weights between each pair of variables in the
two ensembles.
• Add a new decoded termination to ensemble ‘z’ called ‘input’.
• Create an ‘input’ termination on ensemble ‘x’. Set Input Dim to 2 and click
Set Weights to set the coupling matrix to an identity matrix (1.0 along the
diagonal).
Creating a coupling matrix with a diagonal matrix of ones in this manner causes
the input dimensions to be mapped directly to corresponding output dimensions.
• Create a new function input. Give it 2 dimensions and set the functions for
each of the 2 dimensions as constant functions.
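The remaining steps set the weights of the ‘input’ termination on ‘z’ to the desired 3-by-2 matrix and project ‘x’ to ‘z’. As promised above, here is a scripted sketch of the finished network using the Nengo Python API; the particular matrix C and the input values are arbitrary choices for illustration:

import nengo

# An arbitrary 3-by-2 matrix C, mapping the 2-D value in 'x' onto
# the 3-D value in 'z' (z = Cx).
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, -1.0]]

with nengo.Network(label="Arbitrary Linear Transformation") as model:
    stim = nengo.Node([0.5, -0.5])                # 2-D constant function input
    x = nengo.Ensemble(200, dimensions=2)
    z = nengo.Ensemble(200, dimensions=3, radius=1.5)
    nengo.Connection(stim, x, synapse=0.02)       # identity coupling
    nengo.Connection(x, z, transform=C)           # the linear map z = Cx
    probe = nengo.Probe(z, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
print(sim.data[probe][-100:].mean(axis=0))        # roughly C times [0.5, -0.5]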
Nonlinear transformations
In the previous section, you learned how to build a linear brain. The real world is
filled with nonlinearities, however, and so dealing with it often requires nonlinear
computation. In this tutorial, we compute nonlinear functions by, surprisingly,
finding linear decodings. In fact, nonlinear decoding can be thought of as nothing
more than an ‘alternate’ decoding of a represented value, which we also linearly
decode. In this tutorial we only consider decoding using linear combinations of
neural responses but, as discussed later in section 4.2.2, combinations of dendritic
responses may allow for even more efficient computation of nonlinear transforma-
tions.
• Create a new network called ‘Nonlinear Function’.
• Inside that network, create an ensemble with the ‘default’ template and
name it ‘X’ (it should have 100 neurons representing 1 dimension with a
radius of 1.0).
• Create a constant function input for this ensemble and connect it to the ensem-
ble (remember to create a decoded termination with a connection weight of
1.0 on the ‘X’ ensemble and a ‘tauPSC’ of 0.02).
As in the preceding section of this tutorial, we are starting with a scalar ensemble
for the sake of simplicity. Now we will square the value represented in ‘X’ and
represent the squared value in another ensemble.
• Create another ‘default’ ensemble named ‘result’, and give it a decoded
termination named ‘input’ with Input Dim set to 1 and a connection weight of 1.0.
Note that we are not using the termination to calculate the transformation. This
is because the termination can only scale the decoded input by a constant factor,
leading to linear transformations such as those in the first part of this tutorial.
Non-linear transformations are made with decoded origins.
• Drag an Origin from the template bar onto the ‘X’ ensemble.
• In the dialog box that appears, name the origin ‘square’, set Output Dimen-
sions to 1, and click Set Functions.
• Select User-defined Function from the drop-down list of functions and click
Set.
The ‘User-defined Function’ dialog box that is now displayed allows you to enter
arbitrary functions for the ensemble to approximate. The ‘Expression’ line can
contain the variables represented by the ensemble, mathematical operators, and
any functions specified by the ‘Registered Functions’ drop-down list.
‘x0’ refers to the scalar value represented by the ‘X’ ensemble. In an ensemble
representing more than one dimension, the second variable is ‘x1’, the third is
‘x2’, and so forth. Enter the expression ‘x0 * x0’ and click OK; this expression
multiplies the value in the ‘X’ ensemble by itself, thus computing the squared
value.
• Drag from the ‘square’ origin to the ‘input’ termination to create a projec-
tion between the ‘X’ and ‘result’ ensembles.
The projection that has just been made completes the set up for computing a non-
linear transformation. The ‘X’ and ‘result’ ensembles have been connected to-
gether and Nengo automatically sets connection weights appropriately to match
the specified function (mathematical details can be found in section A.2 of the
appendix).
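In scripted form, the same computation amounts to giving the connection from ‘X’ to ‘result’ a function to approximate; Nengo solves for the decoders just as the GUI does. The following is a sketch using the Nengo Python API, with an arbitrary constant input:

import nengo

with nengo.Network(label="Nonlinear Function") as model:
    stim = nengo.Node(-0.7)                       # constant function input
    x = nengo.Ensemble(100, dimensions=1)         # ensemble 'X'
    result = nengo.Ensemble(100, dimensions=1)    # ensemble 'result'
    nengo.Connection(stim, x, synapse=0.02)
    # The function below plays the role of the 'square' origin: Nengo solves
    # for decoders that approximate x0 * x0 on the connection to 'result'.
    nengo.Connection(x, result, function=lambda v: v[0] * v[0])
    probe = nengo.Probe(result, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
print(sim.data[probe][-100:].mean())              # should be near 0.49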
• Create a second ‘default’ ensemble named ‘Y’, set up in the same way as ‘X’
(including a decoded termination named ‘input’).
• Add a new constant function input named ‘input Y’ and project it to the
termination on ensemble ‘Y’.
• Now create a ‘default’ ensemble named ‘2D vector’ and give it 2 dimen-
sions.
• Create a termination named ‘x’ on the ‘2D vector’ ensemble. Set the di-
mension to 1, the weight matrix to [1, 0], and ‘tauPSC’ to 0.02.
• Create a termination named ‘y’ on the ‘2D vector’ ensemble. Set the di-
mension to 1, the weight matrix to [0, 1], and ‘tauPSC’ to 0.02.
• Connect the ‘x’ origins of the ‘X’ and ‘Y’ ensembles to the ‘x’ and ‘y’
terminations on ‘2D vector’ respectively.
The value from the ‘X’ ensemble will be stored in the first component of the 2D
vector and the value from the ‘Y’ ensemble will be stored in its second compo-
nent. At this point, there is no output for the network. We are going to represent
the product of the two components of the ‘2D vector’ ensemble in the ‘result’ en-
semble. To do this we cannot simply use the default ‘x’ decoded origin on the ‘2D
vector’ ensemble. We need to create an alternate decoding of the contents of ‘2D
vector’.
• Drag a new decoded origin onto the ‘2D vector’ ensemble. Name this origin
‘product’, give it 1 output dimension, and click Set Functions.
Select User-defined Function and enter the expression ‘x0 * x1’. As mentioned
earlier, x0 refers to the first component of the vector and x1 to the second. Given
this expression, Nengo generates decoders that yield an approximation of the two
components multiplied together.
• Disconnect the ‘X’ and ‘result’ ensembles by dragging the ‘result’ termina-
tion away from the ensemble. To eliminate the hanging connection entirely,
right click it and select ‘remove’.
• Create a projection from the ‘product’ origin on ‘2D vector’ to the termina-
tion on ‘result’.
This completes the network. The ‘2D vector’ population receives scalar inputs
from two populations, stores these inputs separately as the two components of a
vector, and has its activity decoded as a nonlinear transformation in its projection to
the ‘result’ ensemble.
• Click the Interactive Plots icon, and show the ‘X’, ‘Y’, and ‘result’ values.
Show the ‘input’ and ‘input y’ sliders.
Move the sliders to convince yourself that the ‘result’ is the product of the in-
puts. For instance, recall that 0 times anything is 0, the product of two negative
numbers is positive, the product of opposite signs is negative, and so on. You
may have noticed that when computing 1 times 1, the answer is slightly too low.
This is because with the two inputs set to maximum, the 2D population becomes
saturated. This saturation occurs because the population represents a unit circle,
which does not include the point [1, 1]. The farthest diagonal point it includes is
[√2/2, √2/2]. To remove this saturation you need to increase the radius of the
2D population.
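A scripted sketch of the full multiplication network, with the radius of the 2D population already widened to √2 so that the point [1, 1] can be represented without saturation, looks like this (Nengo Python API; the constant inputs are arbitrary):

import numpy as np
import nengo

with nengo.Network(label="Multiplication") as model:
    input_x = nengo.Node(1.0)
    input_y = nengo.Node(1.0)
    x = nengo.Ensemble(100, dimensions=1)
    y = nengo.Ensemble(100, dimensions=1)
    vec2d = nengo.Ensemble(100, dimensions=2, radius=np.sqrt(2))  # [1, 1] now fits
    result = nengo.Ensemble(100, dimensions=1)
    nengo.Connection(input_x, x, synapse=0.02)
    nengo.Connection(input_y, y, synapse=0.02)
    nengo.Connection(x, vec2d[0])        # equivalent to the weight matrix [1, 0]
    nengo.Connection(y, vec2d[1])        # equivalent to the weight matrix [0, 1]
    # The 'product' origin: an alternate decoding of the 2-D ensemble.
    nengo.Connection(vec2d, result, function=lambda v: v[0] * v[1])
    probe = nengo.Probe(result, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
print(sim.data[probe][-100:].mean())     # close to 1.0, rather than noticeably low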
We can also examine the effects of changing the number of neurons on the
quality of the computation.
• In the Network Viewer, select the ‘2D vector’ ensemble, open the Inspector,
double-click ‘i neurons’, then change the number of neurons from 100 to
70.
Note that multiplication continues to work well with only 70 neurons in the vector
population, but does degrade as fewer neurons are used. I appeal to this fact in
later chapters when estimating the number of neurons required to compute more
complex nonlinear operations.
Chapter 4
Biological Cognition – Syntax
Consequently, it has been seen as a major challenge for neurally inspired archi-
tectures and theories of cognition to address the issue of how structure can be rep-
resented. For instance, Barsalou’s own perceptual symbol system theory, while
addressing semantics well, has been criticized because the theory (Edelman and
Breen, 1999, p. 614)
leaves the other critical component of any symbol system theory – the
compositional ability to bind the constituents together – underspeci-
fied.
This way, if we are given the final matrix and one of the vectors, it is clear how
we can recover the other vector.
The relevance of tensor products to binding was evident to Smolensky: if
we represent items with vectors, and structures with collections (sums) of bound
vectors, then we can use tensor products to do the binding, and still be able to “un-
bind” any of the constituent parts. Consider the previous example of chases(dog,
boy). To represent this structure, we need vectors representing each of the con-
cepts and vectors representing each of the roles, which for this representation
includes (I write vectors in bold): dog, chase, boy, verb, agent, theme. In the
context of the SPA, each of these would be a semantic pointer.
To bind one role to one concept, we can use the tensor product:
b = dog ⊗ agent.
Now, the resulting vector b can be used to recover either of the original vectors
given the other. So, if we want to find out what role the dog is playing, we can
unbind it by inverting it and multiplying, just as in the scalar case:
b ⊗ dog⁻¹ = agent.
A technical issue regarding what we mean by “inverting” a vector arises here.
Suffice it to say for present purposes that there is a good definition of the necessary
inversion called the pseudo-inverse, and I will return to this issue later in a related
context.
Given a means of binding and unbinding vectors, we also need a means of col-
lecting these bindings together to construct structured representations. Smolensky
adopted the standard connectionist approach of simply summing to conjoin vec-
tors. This results in a straightforward method for representing sentences. For
example, the structured vector representation of the proposition “The dog chases
the boy” is:
P = verb ⊗ chase + agent ⊗ dog + theme ⊗ boy.
In the case where all of the role vectors are linearly independent (no one vector
is a weighted sum of the others), decoding within this structure is the same as
before – just take the tensor product of P with the inverse of the role. These
tensor product representations are quite powerful for representing structure with
vectors as elements of the structure. They can be combined recursively to define
embedded structures, and they can be used to define standard LISP-like operations
that form the basis of many cognitive models, as Smolensky shows.
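The binding, unbinding, and summing operations just described are easy to check numerically. The following Python sketch uses random unit vectors as stand-ins for the semantic pointers named above, and works with the tensor product in its equivalent matrix (outer product) form; it illustrates only the vector algebra, not a neural implementation:

import numpy as np

rng = np.random.default_rng(seed=0)
D = 50

def unit_vector():
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

dog, chase, boy = unit_vector(), unit_vector(), unit_vector()
verb, agent, theme = unit_vector(), unit_vector(), unit_vector()

# Binding one concept to one role: the tensor (outer) product.
b = np.outer(dog, agent)

# Unbinding with the pseudo-inverse of dog recovers agent
# (dog has unit length, so the pseudo-inverse is just b.T @ dog).
recovered = b.T @ dog / np.dot(dog, dog)
print(np.allclose(recovered, agent))          # True

# The structured representation P and its decoding.
P = np.outer(verb, chase) + np.outer(agent, dog) + np.outer(theme, boy)
decoded = P.T @ agent                          # approximately dog, since random
print(round(np.dot(decoded, dog), 2),          # roles are nearly orthogonal:
      round(np.dot(decoded, boy), 2))          # first value near 1, second near 0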
However, tensor products were never broadly adopted. Perhaps this is because
many were swayed by Fodor and McLaughlin’s contention that this was “merely” a
demonstration that you could implement a standard symbolic system (Fodor and
McLaughlin, 1990). As such, it would be unappealing to symbolicists because
it just seemed to make models more complicated, and it would be unappealing
to connectionists because it seemed to lose the neural roots of the constituent
representations. For instance, Barsalou (1999, p. 643) comments that
I suspect that both of these played a role. But there is also an important technical
limitation to tensor products, one which Smolensky himself acknowledges (1990,
p. 212). He is referring to the fact that binding two vectors of lengths n and m,
respectively, gives a new structure with nm elements. If we have recursive
representations, the size of our representations will quickly grow out of hand. That
is, they will not scale well. This also means that, if we are constructing a network
model, we need to have some way of handling objects of different sizes within
one network. We may not even know how “big” a representation will be before
we have to process it. Again, it is not clear how to scale a network like this.
I believe it is the scaling issue more than any other that resulted in sparse adop-
tion of Smolensky’s suggestion in the modeling community. However, modelers
began to attack the scaling problem head on, and now there are several meth-
ods for binding vectors that avoid the issue. For binary representations, there are
Pentti Kanerva’s Binary Spatter Codes or BSCs (Kanerva, 1994). For continu-
ous representations, there are Tony Plate’s Holographic Reduced Representations
or HRRs (Plate, 1994). And, also for continuous representations, there are Ross
Gayler’s Multiply-Add-Permute or MAP representations (Gayler, 1998). In fact,
all of these are extremely similar, with BSCs being equivalent to a binary version
of HRRs, and with both HRRs and BSCs being different means of compressing
Smolensky’s tensor product (see figure 4.1).
Noticing these similarities, Gayler suggested that the term “Vector Symbolic
Architectures” be used to describe this class of closely related approaches to en-
coding structure using distributed representations (Gayler, 2003). Like Smolen-
sky’s tensor product representations, each VSA identifies a binding operation and
Figure 4.1: Various vector binding operations. (1) Tensor products are computed
as a multiplication of each element of the first vector with each element of the sec-
ond vector. (2) Piecewise multiplication, used in Gayler’s Multiply-Add-Permute
representation, multiplies each element of the first vector with one corresponding
element in the second vector. (3) Binary Spatter Codes (BSCs) combine binary
vectors using an exclusive or (XOR) function. (4) Circular convolution is the bind-
ing function used in Holographic Reduced Representations (HRRs). An overview
of circular convolution and a derivation of its application to binding and unbinding
can be found in section B.1.
a conjoining operation. Unlike tensor products, the newer VSAs do not change
the dimensionality of representations when encoding structure.
There is a price for not changing dimensionality during binding operations:
degradation of the structured representations. If we constantly encode more and
more structure into a vector space that does not change in size, we must eventually
lose information about the original structure. Representations that slowly lose in-
formation in this manner have been called “reduced” representations, to indicate
that there is less information in the resulting representation than in the original
structures they encode (Hinton, 1990). Thus the resulting structured representa-
tions does not explicitly include all of the bound components: if they did, there
would be no information reduction. This information reduction has important
consequences for any architecture, like the SPA, that employs them.
For one, this means that such an architecture cannot be a classical architec-
ture. In short, the reason is that the representations in such an architecture are not
perfectly compositional. This is a point I will return to in chapter 10. A more tech-
nical point is that, as a result of losing information during binding, the unbinding
operation returns an approximation of the originally bound elements. This means
that the results of unbinding must be “cleaned-up” to a vector that is familiar to
the system. Consequently, use of a VSA imposes a functional demand for a clean-
up memory, i.e., a memory that maps noisy versions of allowable representations
onto those allowable representations. In the previous example, all of the vectors in
the structure (i.e., boy, dog, etc.) would be in the clean-up memory, so results of
unbinding could be “recognized” as one of the allowable representations. I return
to this important issue in section 4.5.
In any case, what is clear even without detailed consideration of clean-up, is
that there are limits on the amount of embedded structure that a reduced represen-
tation can support before noise makes the structure difficult to decode, resulting
in errors. As we will see, the depth of structure that can be encoded depends
on the dimensionality of the vectors being used, as well as the total number of
symbols that can be distinguished. Ultimately, however, I think it is this capac-
ity for error that makes the SPA psychologically plausible. This is because the
architecture does not define an idealization of cognitive behavior like classical ar-
chitectures tend to, but rather specifies a functional, implementable system that is
guaranteed to eventually fail. I take it that capturing the failures as well as the suc-
cesses of cognitive systems is truly the goal of a cognitive theory. After all, many
experiments in cognitive psychology are set up to increase the cognitive load to
determine what kinds of failure arise.
Now to specifics. In this book I adopt Plate’s HRR representations because
they work in continuous spaces, as neurons do, and because he has established
a number of useful theoretical results regarding their capacity limitations (Plate,
2003). Still, most of the examples I provide do not depend on the VSA chosen. In
fact much work remains to be done on the implications of this choice. In partic-
ular, it would be interesting to determine in more detail the expected behavioral
differences between HRRs and MAPs. But, such considerations are beyond the
scope of the present discussion. It is worth noting, however, that the structure
of the SPA itself does not depend on the particular choice of a VSA. I take any
method for binding that depends on a piecewise (i.e., local) nonlinearity in a vec-
tor space to be able to support the main ideas behind the SPA. Nevertheless, it is
necessary to choose an appropriate VSA in order to provide specific examples.
In the remainder of this chapter, I turn to the consideration of implementing the
architectural components necessary to support HRR processing in a biologically
realistic setting. First, I consider binding and unbinding. I then turn to demonstrat-
ing how these operations can be used to manipulate (as well as encode) structured
representations, and I show how a spiking neural network can learn such manipu-
lations. I then return to the issue of implementing a clean-up memory in neurons,
and discuss some capacity results that suggest the SPA will scale appropriately.
Finally, I describe a detailed SPA model that performs structural inference, mim-
icking human performance in the Raven’s Progressive Matrices test of general
fluid intelligence.
The binding of two vectors A and B via circular convolution can be written as

C = A ⊛ B = F⁻¹(FA . FB)
where “.” is used to indicate element-wise multiplication of the two vectors. The
matrix F is a linear transformation that is the same for all vectors of a given dimen-
sion. The resulting vector C has the same number of dimensions as A and B. The
network that computes this binding is simply a standard, two-layer feedforward
network (see figure 4.2).
Because we have already seen examples of both linear transformation and
the multiplication of two numbers, it is straightforward to build this network in
Nengo. The tutorial at the end of this chapter gives detailed instructions on build-
ing a binding network to encode structured representations (section 4.8). Figure
4.3 shows the results of binding various ten-dimensional vectors using the circular
convolution method.
To unbind vectors, the same operation can be used. However, as explained
earlier we need to compute the inverse of one of the bound vectors to get the
other. Conveniently, for HRRs a good approximation to the inverse can be found
by a simple linear transformation (see appendix B.1). Calling this transformation
S, we can write the unbinding of any two vectors as A ≈ C ⊛ SB.
Recall that the result is only approximate and so must be cleaned-up, as I discuss
in more detail shortly (section 4.5).
From a network construction point-of-view, we can redeploy exactly the same
network as in the binding case, and simply pass the second vector through the
transformation S before presenting it to the network (this amounts to changing the
weights between the B and Bind layers in figure 4.2). Figure 4.4 demonstrates, us-
ing the output of the network in figure 4.3, that vectors can be effectively unbound
in this manner.
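The vector arithmetic behind these networks can also be checked directly. The following Python sketch computes circular convolution through the Fourier transform, exactly as in the expression above, and uses the simple index-reversing involution as the approximate inverse (the linear transformation S); only the arithmetic is shown, not the spiking implementation:

import numpy as np

rng = np.random.default_rng(seed=1)
D = 500

def unit_vector():
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    # circular convolution via the Fourier transform
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    # the involution a[0], a[D-1], a[D-2], ..., a[1] (the transformation S)
    return np.concatenate(([a[0]], a[:0:-1]))

A, B = unit_vector(), unit_vector()
C = bind(A, B)                          # same dimensionality as A and B

A_recovered = bind(C, approx_inverse(B))
similarity = np.dot(A_recovered, A) / (np.linalg.norm(A_recovered) * np.linalg.norm(A))
print(round(similarity, 2))             # high, but noticeably below 1

The printed similarity is high but noticeably below one, which is exactly the degradation that makes a clean-up memory necessary.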
It is perhaps unsurprising that we can implement binding and unbinding in
neural networks. However, this does not allay concerns, like those expressed by
Barsalou, that this is merely a technical trick of some kind that has little psycho-
logical plausibility. I believe the psychological plausibility of these representa-
tions needs to be addressed by looking at larger scale models, the first of which
we will see later in this chapter (section 6.6). There I show how this operation is
able to capture psychological performance on a fluid intelligence test. Additional
examples are presented in later chapters.
Figure 4.3: Binding vectors with neurons. The panels show ten-dimensional
vectors A and B and their binding C = A ⊛ B, plotted using the same graphing
method as figure 4.4.
Figure 4.4: Unbinding vectors with neurons. The reconstructed vectors A′ and
B′ were obtained by using the transformations A ≈ C ⊛ B⁻¹ and B ≈ C ⊛ A⁻¹,
respectively. This figure uses the same initial vectors and graphing methods as
figure 4.3, and the original A and B vectors have been reproduced alongside the
reconstructed vectors for ease of comparison. In the graphs of the reconstructed
vectors A′ and B′, black circles indicate the original average values of the vectors
and the dotted lines indicate the decoding error.
processing then occurs at the soma, similar to the standard linear model, where the
input is a weighted summation of the subunit activities that determines the neu-
ron’s activity. This two-stage process has been observed in a study of subthresh-
old synaptic integration in basal dendrites of neocortical cells, where stimulation
within a single terminal dendritic branch evoked a sigmoidal subunit nonlinearity,
but the activity summed linearly when different terminal branches were stimulated
(Schiller et al., 2000). In effect, this provides evidence that a single neuron may
have the processing power of a two-layer neural network (although the connectiv-
ity is more restricted).
If our binding networks employed such nonlinearities, we could eliminate the
middle layer currently used to compute the nonlinearity. Consequently, the bind-
ing networks would use significantly fewer neurons than in the examples I con-
sider here. So, the examples I present are worst case scenarios for the number
of neurons that are needed to generate structured representations. In section 5.5 I
consider a model that uses nonlinear neurons.
The second concern regarding neural plausibility is connectivity. That is, we
know that in general cortical circuits are fairly locally and sparsely connected
(Song et al., 2005; Hellwig, 2000; Lund et al., 1993). Do these binding networks
respect this constraint? To begin, we need to be more accurate about what we
mean by locally connected. In their anatomical work that involves microinjec-
tions of a tracer across several areas of monkey cortex, the Levitt group has found
that small injections of 100µm (i.e., 0.1mm) spread to areas of approximately
3mm by 3mm or slightly larger (Lund et al., 1993). There is an interesting struc-
ture to this projection pattern, in which there are patches of about 300-400µm in
diameter separated by spaces of a similar size. These patches correspond well to
the sizes of dendritic fields of the pyramidal cells in these areas. Given that there
are 170,000 neurons per mm² of cortex,1 this corresponds to about 85,000 neurons
within the needed distance for connectivity to a given cell. Consequently, a sin-
gle neuron could easily be connected to any other neurons in a network of about
85,000 neurons. So “local” connectivity seems to be local within 400µm
or about 85,000 neurons, with sparser connections between such patches. This
pattern of connectivity has been found throughout visual, motor, somatosensory
and prefrontal cortex, and across monkeys, cats, and rodents.
The plausibility of connectivity is closely tied to the third issue of scaling. To
1 This estimate is based on the data in Pakkenberg and Gundersen (1997) which describes
differences in density across cortical areas, as well as the effects of age and gender. This value
was computed from Table 2 for females, using a weighted average across cortical areas.
determine if a binding network can be plausibly fit within this cortical area, it is
important to know what the dimensionality of the bound vectors is going to be. It
is also important to know how the size of the binding network changes with di-
mensionality. I discuss in detail the dimensionality of the bound vectors in section
4.5, because the necessary dimensionality is determined by how well we can clean
up vectors after binding. There I show that we need vectors of about 500 dimen-
sions in order to capture large structures with an adult-sized vocabulary. It is also
clear that the binding network itself will scale linearly with the number of dimen-
sions because the only nonlinearity needed is the element-wise product. Thus, one
additional dimension means computing one additional product. Recall from sec-
tion 3.8 that 70 neurons per multiplication results in good quality estimates of the
product, and two multiplications are needed per dimension. Taken together, these
considerations suggest that about 70,000 neurons are needed to bind two vectors.
Coupled with the previous local connectivity constraint, this suggests that binding
networks of the appropriate size to capture human-like structured representations
can be constructed in this manner.
Recalling that there are about 170,000 neurons per mm² of cortex suggests
that we need about 0.7 mm² of cortex to perform the necessary binding. The
projection data suggests that local projections cover at least 9 mm² of cortex, so
the binding layer can fit comfortably within this area. If we want to include the
two input populations, the binding layer and the output population with similar
assumptions, we would need at most 2 mm² of cortex. Recall that the unbinding
network requires the same resources. Consequently, these networks are consistent
with the kind of connectivity observed in cortex. As well, the architecture scales
linearly so this conclusion is not highly sensitive to the assumptions made regard-
ing the dimensionality of the representations. It is worth noting that if we allow
dendritic nonlinearities, the binding layer is not needed, so the entire network
would require less cortex still.
Together, these considerations suggest that the vector binding underlying the
SPA can be performed on the scale necessary to support human cognition in a
neurally plausible architecture. I return to considerations of the neural plausibility
of the SPA in section 9.4.
That is, we now have a representation of hug(boy, dog). This occurred because
the first term in T converts whatever was the agent into the theme by removing the
agent vector (since agent′ ⊛ agent ≈ 1) and binding the result with theme. The
last term works in a similar manner. The second term replaces chase with hug,
again in a similar manner. The noise term is a collection of the noise generated by
each of the terms.
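One way to write such a transformation, consistent with the description above, is as a sum of three bound terms. The sketch below constructs it with random stand-in vectors and applies it to an encoding of chases(dog, boy); it illustrates only the vector manipulation, and the particular encoding follows the role-filler sums used earlier in this chapter:

import numpy as np

rng = np.random.default_rng(seed=2)
D = 500

def unit_vector():
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(a):
    return np.concatenate(([a[0]], a[:0:-1]))

names = ["dog", "boy", "chase", "hug", "verb", "agent", "theme"]
v = {name: unit_vector() for name in names}

# chases(dog, boy)
P = bind(v["verb"], v["chase"]) + bind(v["agent"], v["dog"]) + bind(v["theme"], v["boy"])

# Three terms: agent -> theme, chase -> hug, theme -> agent.
T = (bind(v["theme"], inv(v["agent"]))
     + bind(v["hug"], inv(v["chase"]))
     + bind(v["agent"], inv(v["theme"])))

P2 = bind(T, P)                                   # roughly hug(boy, dog), plus noise

new_agent = bind(P2, inv(v["agent"]))             # who is the agent now?
sims = {name: float(np.dot(new_agent, vec)) for name, vec in v.items()}
print(max(sims, key=sims.get))                    # expected: boy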
For a demonstration of the usefulness of such manipulations, we can con-
sider one of the standard tasks for testing cognitive models: question answering.
Figures 4.5 and 4.6 show a network of 10,000 neurons answering four different
Figure 4.5: Four question-answering tasks presented to the same network, in
which input vectors A and B drive an output population C: A) “The dog chases
the boy.” What was the action? B) “The dog chasing the boy caused the boy to
fall.” What caused the boy to fall? C) “The dog chasing the boy caused the boy
to fall.” Who fell? D) “The big star is beside the little star.” What is beside the
big star?
questions about different sentences. For each sentence and question, a different
input is provided to the same network for 0.25 seconds. In figure 4.6, the results
of clean up are shown over time, so the similarity between the output of the net-
work and possible responses is plotted. The system is considered to have given a
correct response if its output is more similar to the correct response than any other
possible response.
To understand these results, let us consider each question in more detail. The
first case is nearly identical to the first transformation we considered above. In-
stead of determining the role of chase, the network determines what the action of
the sentence is.
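The logic of this first question can be sketched in a few lines of Python: encode the sentence, unbind the verb role, and compare the noisy result to every item in a small vocabulary, which is the clean-up comparison plotted in figure 4.6 (random vectors stand in for the semantic pointers, and no spiking neurons are simulated):

import numpy as np

rng = np.random.default_rng(seed=3)
D = 500

def unit_vector():
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(a):
    return np.concatenate(([a[0]], a[:0:-1]))

vocab = {name: unit_vector() for name in
         ["dog", "boy", "chase", "fall", "star", "verb", "agent", "theme"]}

# "The dog chases the boy."
sentence = (bind(vocab["verb"], vocab["chase"])
            + bind(vocab["agent"], vocab["dog"])
            + bind(vocab["theme"], vocab["boy"]))

# "What was the action?" -- unbind the verb role, then clean up by comparing
# the noisy result to every item in the vocabulary.
noisy = bind(sentence, inv(vocab["verb"]))
similarities = {name: float(np.dot(noisy, item)) for name, item in vocab.items()}
print(max(similarities, key=similarities.get))    # expected: chase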
Turning to the second question, we see that the sentence has more complex,
embedded structure. To capture such structures, we need a way to map natural
language sentences onto semantic pointers. For the second case in figure 4.5, the
roles “agent” and “theme” are multiply instantiated. Consequently, roles need
to be tagged with the verb they are roles for, so the items they are bound to do
not get confused. For instance, if the roles were not tagged, it would be difficult
to distinguish whether dog was the agent of chase or of fall. This kind of role
Figure 4.6: The network answering four questions over time. Each question is
presented for 0.25 seconds by changing the input to the neural groups represent-
ing vectors A and B in figure 4.5. The resulting spiking behavior of neural group
C is interpreted as a semantic pointer and compared to the possible responses.
This plot shows the similarity of the vector C to various possible responses (i.e.,
the top line in the plot represents the output of a clean-up memory). Dark gray
through medium gray lines indicate the similarity to the four correct responses.
The lightest gray lines are the similarities of the output with 40 randomly cho-
sen other semantic pointers in the lexicon, representing other possible responses.
Since the output is closer to the correct response than any other answer in each
case, this network successfully answers all of the questions. The four correct
responses and their peak similarities are: chased (0.46); verb⊛chase +
agent⊛chase⊛dog + theme⊛chase⊛boy (0.40); boy (0.28); and star⊛little (0.32).
well be false. Nevertheless, induction is perhaps the most common kind of rea-
soning used both in science, and in our everyday interactions with the world.
In the language of the SPA, performing induction is equivalent to deriving a
transformation that accounts for the mapping between several example structure
transformations. For example, if I tell you that “the dog chases the boy” maps to
“the boy chases the dog”, and that “John loves Mary” maps to “Mary loves John”
and then provide you with the following structure: “the bird eats the worm”; you
would probably tell me that the expected mapping would give “the worm eats the
bird” (semantic problems notwithstanding). In solving this problem, you used
induction over the syntactic structure of the first two examples to determine a
general structural manipulation that you then apply to a new input. We would like
our architecture to do the same.
Intuitively, to solve the above example we are doing something like gener-
ating a potential transformation given information only about the pre- and post-
transformation structures. We may be able to do this with one example, but the
second example can help confirm our initial guess, by ensuring we would derive
the same transformation provided that information. In short, we are making sure
that if we were given just the pre-transformation structure, we could generate the
post-transformation structure. One way to characterize this process is to say we
are minimizing the error between our application of the transformation and the
post-transformation structure.
Neumann (2001) presented an early investigation into the possibility of learn-
ing structural transformations using a VSA. She demonstrated that with a set of
example transformations available, a simple error minimization would allow ex-
traction of a transformation vector. In short, she showed that if we had examples
of the transformation, we could infer what the transformation vector is by incre-
mentally eliminating the difference between the examples and the transformations
we calculated with our estimate of T (see appendix B.2).
A simple extension to her original rule that allows the transformation to be
calculated on-line is the following (Eliasmith, 2004):
Ti+1 = Ti − wi (Ti − A′i ⊛ Bi)
where i indexes the example, T is the transformation vector we are trying to learn,
w is a weight that determines how important we take the current example to be
compared to past examples, and A and B are the pre- and post-transformation
structured vectors. This rule essentially estimates the transformation vector that
would take us from A to B, and uses this to update our current guess of the trans-
formation. Specifically, the rule computes the convolution between the inverse of
A and the post-transformation vector B, and nudges the current estimate of T
toward that result.
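A numerical sketch of this update rule, with random vectors standing in for the structured semantic pointers and an arbitrary choice of weight w, shows the estimate converging on the transformation that generated the examples (vector arithmetic only):

import numpy as np

rng = np.random.default_rng(seed=5)
D = 500

def unit_vector():
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def approx_inverse(a):
    return np.concatenate(([a[0]], a[:0:-1]))

T_true = unit_vector()                     # the transformation to be inferred
T_est = np.zeros(D)
w = 0.1                                    # weight given to each new example

for _ in range(50):
    A = unit_vector()                      # a pre-transformation structure
    B = bind(T_true, A)                    # the corresponding post-transformation structure
    T_est = T_est - w * (T_est - bind(approx_inverse(A), B))

cosine = np.dot(T_est, T_true) / (np.linalg.norm(T_est) * np.linalg.norm(T_true))
print(round(cosine, 2))                    # close to 1: the estimate aligns with T_true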
A clean-up memory must take a noisy semantic pointer A and be able to correctly
map any vector extracted from it to a valid lexical item. For present purposes, we
can suppose that A is the sum of some number
of “role-filler” pairs. For example, there are three such pairs in the previously
discussed encoding of “The dog chases the boy.”
Algorithmically, decoding any such structure would consist in:
1. Unbinding the element of interest (i.e., convolving A with the approximate
inverse of a role vector);
2. Measuring the similarity of the result with all allowable items; and
3. Picking the item with the highest similarity and returning that as the cleaned
up result.
Perhaps the most difficult computation in this algorithm is measuring the similar-
ity with all allowable items. This is because the many-to-one mapping between
noisy input and clean output can be quite complicated.
So, unfortunately, many simple auto-associators, including linear associators
and multilayer perceptrons, do not perform well (Stewart et al., 2010b). These
are simple because they are feedforward: the noisy vector is on the input, and
after one pass through the network, the cleaned up version is on the output. Better
associators are often more complicated. The Hopfield network, for instance, is a
dynamical system that constantly feeds its output back to the input. Over time, the
system settles to an output vector that can be considered the cleaned up version of
an input vector. Unfortunately, such recurrent networks often require several iter-
ations before the results are available, slowing down overall system performance.
Ideally, we would like a clean-up memory that is both fast, and accurate.
To build such a memory, we can exploit two of the intrinsic properties of
neurons described in section 2.1, and demonstrated in the tutorial on neural repre-
sentation (section 2.5). The first is that the current in a neuron is the dot-product
of an input vector with the neuron’s preferred direction in the vector space. The
dot product is a standard way of measuring the similarity of two vectors. So,
a similarity measure is a natural neural computation. The second property is that
neurons have a non-linear response, so they do not respond to currents below some
threshold. This means they can be used to compute nonlinear functions of their
input. Combining a similarity measure and a nonlinear computation turns out to
be a good way to build a clean-up memory.
Specifically, for each item in the clean-up memory, we can set a small number
of neurons to have a preferred direction vector that is the same as that item. Then,
if that item is presented to the memory, those neurons will be highly active. And,
for inputs near that item, these neurons will also be somewhat active. How near
items must be to activate the neurons will depend on the firing thresholds of the
neurons. Setting these to be slightly positive in the direction of the preferred
direction vector will make the neuron insensitive to inputs that are only slightly
similar. In effect, the inherent properties of the neurons are being used to clean up
the input.
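A rate-based sketch of this scheme is given below: every vocabulary item serves as a preferred direction vector, the input ‘current’ is a dot product, and a positive threshold suppresses weakly similar items. The vocabulary size, noise level, and threshold are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(seed=4)
D, M = 500, 1000                                  # dimensions, number of lexical items
vocab = rng.standard_normal((M, D))
vocab /= np.linalg.norm(vocab, axis=1, keepdims=True)

def cleanup(noisy, vocab, threshold=0.3):
    sims = vocab @ noisy                          # similarity to every allowable item
    active = np.where(sims > threshold, sims, 0.0)  # sub-threshold items contribute nothing
    return int(np.argmax(active))                 # index of the cleaned-up item

target = 123
noisy_input = vocab[target] + 0.5 * rng.standard_normal(D) / np.sqrt(D)
print(cleanup(noisy_input, vocab) == target)      # True for moderate noise levels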
Figure 4.7 shows how this neural clean-up memory scales. The parameters
affecting how good the clean-up is are: k, the number of role-filler items in the
input vector; M, the number of valid lexical items to compare to; and D, the di-
mensionality of the vector space (i.e., the dimensionality of the semantic pointer).
In the simulations shown here, performance for values up to k = 8 is shown be-
cause George Miller’s (1956) classic limit of seven plus or minus two items in
working memory falls in about the same range. The graphs show M up to a value
of 100,000 as this is well over the size of an adult lexicon (Crystal, 2003, sug-
gests 60,000 terms). The plot tracks the dimensionality of the vectors that allow
for 99% clean-up accuracy over these ranges of k and M.
Notably, this clean-up memory is also very fast. The network is purely feed-
forward, and so clean-up occurs on the time-scale of the neurotransmitter used
in the network. For most excitatory connections in cortex, this is on the order of
about 5 ms (i.e., for AMPA receptors).
So, what conclusions can we draw about the SPA with such a model? While
it would be difficult to argue that we are in a position to definitively set a “maxi-
mum” number of dimensions for semantic pointers in the SPA, these simulations
provide good evidence that the architecture is scalable to within the right neigh-
borhood. Because this model is constructed with 10 neurons per lexical item, we
need approximately 600,000 neurons to implement the clean-up memory model
described here for an adult-sized lexicon. This works out to about 3.5 mm² of
cortex, which fits well within the connectivity patterns discussed in section 4.2.2.
In addition, the connectivity matrix needed to implement this memory is of the
same dimensionality as in the binding network, and so it respects the anatomical
constraints discussed there as well.
When coupled with the previous discussion regarding the neural plausibil-
ity of the binding network (section 4.2.2), there is a strong case to be made that
with about 500 dimensions, structure representation and processing of human-
like complexity can be accomplished by a SPA. That is, we have good reason to
suppose we have appropriate tools for constructing a model able to perform so-
phisticated structure processing in a biologically plausible manner. We will see an
example of such a model in the next section.
Figure 4.7: The number of dimensions (D) required for 99% clean-up accuracy as the
number of lexical items (M) grows, plotted for k = 2, 4, 6, and 8 role-filler pairs.
Before doing so, however, it is worth considering a few issues related to clean-
up memory in the SPA. In general, the SPA is not very specific with respect to
the anatomical location or number of clean-up memories we should expect to
find in the brain (see section 10.2). The calculations above suggest they may be
ubiquitous throughout cortex, but perhaps many structural manipulations can be
performed before clean-up is needed, which would be more efficient. As well,
construction of clean-up memories may be a slow process, or a rapid one, or,
I suspect most likely, both. That is, it makes sense to have long-term clean-up
memories that account for semantic memory, as well as clean-up memories con-
structed on-the-fly that are able to track the current inferential context. It is quite
possible, for instance, that areas of the brain known for rapid learning, such as the
hippocampus, may play a central role in helping us to navigate through complex
structure processing. The considerations in chapter 6 on memory and learning
speak to general problems related to learning, so I will not consider these further
here.
However, I would like to address a concern that has been expressed about
any attempt to choose a fixed number of dimensions for semantic pointers. The
concern is that we have no good reason to expect that all representations will
be of the same dimension. Perhaps some simple representations may not be of
sufficiently high dimension, or perhaps perceptual and motor systems work
with vectors that are of very different dimensions. Or, perhaps, if we have a
“maximum” number of dimensions, the system might simply stop working if we go
over that number.
There are two kinds of response to these concerns, theoretical and practical.
On the theoretical side, we can note that any vector with fewer than the maximum
number of dimensions can be perfectly represented in the higher-dimensional
space. In mathematical terms, note that vector spaces with smaller numbers of
dimensions are subspaces of vector spaces with more dimensions. Hence, there
is nothing problematic about embedding a lower-dimensional space in a higher-
dimensional one.
In the opposite case, if there are more dimensions in a representation than in
the vector space, we can simply choose a random mapping from the representation
to the vector space (even truncation often works well). There are more sophisti-
cated ways of “fitting” higher-dimensional spaces into lower dimensional ones
(such as performing singular-value decomposition), but it has been shown that
for high-dimensional spaces, simple random mappings into a lower-dimensional
space preserve the structure of the higher-dimensional space very well (Bingham
and Mannila, 2001). In short, all or most of the structure of “other dimensional”
spaces can be preserved in a high-dimensional space like the one we have chosen for the
SPA. Vector spaces are theoretically quite robust to changes in dimension.
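The effect Bingham and Mannila describe is easy to see numerically. The following sketch (with arbitrary sizes) projects a cloud of points from 2,000 dimensions down to 500 with a random matrix and checks that the pairwise distances are nearly unchanged.

    import numpy as np

    rng = np.random.default_rng(1)
    D_high, D_low, N = 2000, 500, 50

    X = rng.standard_normal((N, D_high))             # points in the original space

    # a simple random projection; scaling by 1/sqrt(D_low) keeps lengths roughly unchanged
    R = rng.standard_normal((D_high, D_low)) / np.sqrt(D_low)
    Y = X @ R                                        # the same points in 500 dimensions

    def pairwise(A):
        return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

    mask = ~np.eye(N, dtype=bool)
    ratios = pairwise(Y)[mask] / pairwise(X)[mask]
    print(ratios.mean(), ratios.std())               # ~1.00 with only a few percent spread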
On the more practical side, we can run explicit simulations in which we dam-
age the simulation (thereby not representing the vector space well, or losing di-
mensions), or add significant amounts of noise, which distorts the vector space
and can effectively eliminate smaller dimensions (as measured by the singular
values). Performing these manipulations on the binding network described earlier
(specifically, the third example in figure 4.5) demonstrates significant robustness
to changes in the vector space. For instance, after randomly removing neurons
from the binding population in the network, performance remains accurate even
with an average of 1,221 of the 4,000 neurons removed (see figure 6.20).
As well, the discussion surrounding figure 6.20 shows that the network is ac-
curate when up to 42% Gaussian noise is used to randomly vary the connection
weights. Random variation of the weights can be taken to reflect a combination
of imprecision in synaptic weight maintenance and random jitter in
incoming spikes. Consequently, such simulations speak both to concerns about
choosing a specific number of dimensions, and to concerns about the robustness
of the system to expected neural variability.
Overall, then, the components of the SPA allow for graceful degradation of
performance. This should not be too surprising. After all, previous considerations
of neural representation make it clear that there is a smooth link between the num-
ber of neurons and the quality of the representation (see section 2.5). As well, we
have just seen that there is a smooth relationship between the quality of a clean-up
memory, the number of dimensions, and the complexity of the representation for
a fixed number of neurons. Not surprisingly, there are similarly smooth relation-
ships between dimensionality and complexity in ideal VSAs (Plate, 2003). Since
the SPA combines VSAs with neural representation, it is also not surprising that
it inherits the kinds of graceful degradation characteristic of these approaches.
Figure 4.8: Example Raven’s Progressive Matrix (RPM). A subject must examine
the first two rows (or columns) of the matrix to determine a pattern. The task
is to pick one of the eight possible answers shown along the bottom to complete
that pattern for the third row (or column). Matrices used in the RPM cannot be
published, so this is an illustrative example only.
the world. Most tests that have been designed to measure general intelligence
focus on fluid intelligence. One of the most widely used and respected tools for
this purpose is the Raven’s Progressive Matrices (RPM; Raven, 1962).
In the RPM subjects are presented with a 3 x 3 matrix where each cell in the
matrix contains various geometrical figures, with the exception of the final cell,
which is blank (figure 4.8). The task is to determine which of eight possible an-
swers most appropriately belongs in the blank cell. In order to solve this task,
subjects must examine the contents of all of the other cells in the matrix to de-
termine what kind of pattern is there, and then use that pattern to pick the best
answer. This, of course, is an excellent example of induction, which I mentioned
earlier in section 4.4.
While the RPM is an extremely widely used clinical test, and many experi-
ments have been run with it, our understanding of the processes underlying
human performance on this task is minimal. To the best of my knowledge, there
have been no cognitive models of the RPM that include the inductive process of
rule generation. The best-known model of the RPM includes all of the possible rules
that the system may need to solve the task (Carpenter et al., 1990). In solving the
task, the model chooses from the available rules, and applies them to the given
matrix. However, this treats the RPM like a problem that employs crystallized
intelligence, which contradicts its acceptance as a test of fluid intelligence (Prab-
hakaran et al., 1997; Gray et al., 2003; Perfetti et al., 2009).
Recently, however, a graduate student in my lab named Daniel Rasmussen
Figure 4.9: A simple induction RPM example. On the left is a simple RPM ma-
trix in which the number of items increases by one across each row. The model is
presented with consecutive pairs of cells, and infers a transformation. That trans-
formation is applied to the last cell and the similarity to each answer is computed.
This similarity is shown on the right, indicating that the model correctly identifies
the first answer (i.e., three triangles) as the correct one.
provided the pairwise examples in the order they tend to be read by subjects (i.e.,
across rows, and down columns; Carpenter et al. (1990)). As the examples are
presented, the inferred transformation is built up out of the average of each of the
pairwise transformations. That inferred transformation is then applied to the
second-to-last cell in the matrix. This results in a vector that is compared to each
of the possible answers, and the most similar answer is chosen as the correct one
by the model.
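The induction step itself can be sketched in a few lines. The cell encoding below (a "successor" vector raised to convolutive powers and bound to a shape vector) is an assumption made purely so the example is self-contained; it is not the encoding used in Rasmussen's model. The procedure, however, is the one just described: average the pairwise transformations, apply the result to the last cell, and pick the most similar answer.

    import numpy as np

    rng = np.random.default_rng(2)
    D = 500

    def cconv(a, b):
        # circular convolution (the binding operation)
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

    def inv(a):
        # approximate inverse: a reordered copy of the vector
        return np.roll(a[::-1], 1)

    def unitary(rng, D):
        # a random vector whose convolutive powers all stay unit length
        phases = rng.uniform(0, 2 * np.pi, D // 2 + 1)
        phases[0] = phases[-1] = 0.0
        return np.fft.irfft(np.exp(1j * phases), n=D)

    # assumed toy encoding: a cell with n shapes is successor^n bound with the shape
    successor = unitary(rng, D)
    shape = rng.standard_normal(D) / np.sqrt(D)
    number = {1: successor}
    for n in range(2, 9):
        number[n] = cconv(number[n - 1], successor)

    def cell(n):
        return cconv(number[n], shape)

    # example pairs of cells whose item count increases by one
    pairs = [(1, 2), (2, 3), (4, 5), (5, 6)]
    T = np.mean([cconv(cell(b), inv(cell(a))) for a, b in pairs], axis=0)

    # apply the inferred transformation to a cell with seven items and compare
    # the result with the candidate answers
    prediction = cconv(T, cell(7))
    best = max(range(1, 9), key=lambda n: np.dot(prediction, cell(n)))
    print(best)   # 8: the averaged transformation encodes "increase by one"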
Comparing the model across many runs to average human performance on
all the matrices that include an induction rule (13 out of 36 examples), humans
score about 76% on average (Forbes, 1964), and the model scores 71% on average
(chance is 13%). So the model is performing induction at a similar level to human
subjects.
More generally, the model can account for several other interesting experi-
mental observations on the RPM. For instance, subjects improve with practice if
given the RPM multiple times (Bors, 2003), and also show learning within the
span of a single test (Verguts and De Boeck, 2002). As well, a given subject’s per-
formance is not deterministic; given the same test multiple times, subjects will get
previously correct answers wrong and vice versa (Bors, 2003). In addition, there
are both qualitative and quantitative differences in individual ability; there is vari-
ability in “processing power” (variously attributed to working memory, attention,
learning ability, or executive functions), but there are also consistent differences in
high-level problem-solving strategies between low-scoring and high-scoring indi-
viduals (Vigneau et al., 2006). This is not an exhaustive list, but it represents some
of the features that best characterize human performance, and all of these features
are captured by the full RPM model (Rasmussen, 2010).
That being said, the theoretically most important feature of the model for my
purposes is its ability to successfully perform induction and extract rules. How-
ever, simply choosing the best of the eight possible answers might not convince us
that it has in fact found a structural regularity. Much more convincing is determin-
ing how the learned transformation applies in circumstances never encountered by
the model during training. Figure 4.10 shows two examples of this kind of appli-
cation. In the first instance, the model is asked to apply the learned transform
to four triangles. As is evident from figure 4.10a, the model’s preferred answer
corresponds to five triangles. This suggests the inferred transformation encodes
the correct “increase by one” aspect of the rule, at least in the context of trian-
gles. Figure 4.10b shows that, in fact, the “increase by one” is generic across
objects. In this case, the rule is applied to four squares, and the preferred result
is five squares. In short, the model has learned the appropriate counting rule, and
generalized across objects.
This is a simple, but clear, example of what is sometimes called “syntactic
generalization”. Syntactic generalization is content-insensitive generalization, i.e.,
generalization driven by the syntactic structure of the examples. It is important to note
that precisely this kind of generalization has often been claimed to distinguish
cognitive from non-cognitive systems (Fodor and Pylyshyn, 1988a; Jackendoff,
2002; Hummel and Holyoak, 2003). In this case, it is clear that the generalization
is content-insensitive because “increase by one” is applied to squares, triangles,
or what have you. I provide more examples of syntactic generalization in section
6.6.
It might not be too surprising that in order to perform cognitively interesting
induction, one has to perform syntactic generalization. What is more interesting
is that once we have an appropriate representational system (semantic pointers and
binding), the method for performing such syntactic induction is quite simple (i.e.
averaging over examples), and can be built into a biologically realistic network.
In addition, the same basic architecture is able to capture the other two kinds of
reasoning usually identified in cognitive science: abduction and deduction. In pre-
vious work, I showed that the approach underlying the SPA can be used to capture
Figure 4.10: Broader application of the rule induced from figure 4.9. a) Applied
to four triangles, the model chooses five triangles as the next item in the sequence.
b) Applied to four squares, the model chooses five squares as the next item in the
sequence.
very natural, and very powerful. It is powerful because the SPA representations
allow whatever changes across examples to become “noise”, and whatever stays
consistent to be reinforced. Crucially, this is true for both content and syntactic
structure.
One final note, which strikingly distinguishes this SPA implementation from
standard connectionist approaches, is that there are no connection weight changes
in this model. This is true despite the fact that it learns based on past experience,
and is able to successfully generalize. I return to this observation in my discussion
on learning in chapter 6.
a network that was anatomically plausible. Certainly, more than 8 roles will be
needed in order to include all of these kinds of information in a single semantic
pointer. How then are we going to be able to encode sophisticated concepts in a
biologically realistic manner using the resources of the SPA?
I would like to propose a solution to this problem that turns out to allow
the architecture to scale well even for complex structures. Namely, rather than
constraining ourselves to a single clean-up memory for decoding, we can chain
memories together. In short, I am suggesting that we can perform successive de-
codings in different clean-up memories. This would result in a chain of decoding,
where we perform an initial decoding in one memory, take the results of that decoding,
and further decode those results in a second memory. This process could then
be repeated several times. At each stage of such a chain, the vector to be decoded
comes from the clean-up of the previous stage, ensuring that the full capacity of the
current stage is available for further decoding of the semantic pointer. As a result,
the effective capacity will scale as the per-stage capacity raised to the power of the
number of stages in the chain, while the number of neurons required grows only
linearly with the number of stages.
Let us consider a concrete example.
Consider a simple example of how the “dog” semantic pointer might be con-
structed. It might include role-filler pairs for a wide variety of information, as
described earlier. Suppose for this simple example that there are only three roles.
The representation of this pointer might then look something like (with generic role
and filler names standing in for whatever else the concept contains):

dog = isA * mammal + role2 * fillerA + role3 * fillerB
Suppose that, with a 100-dimensional semantic space, we can decode any element
in a three-role representation with 99.9% accuracy. If we query our “dog” concept,
wanting to know what it is (i.e., determining the filler in the “isA” role) we will
discover that a dog is a mammal. That is, the result of this operation will be a
clean version of the “mammal” semantic pointer.
We might then decode this new semantic pointer with a second memory in our
chain. Suppose that this semantic pointer is the sum of three other role-filler pairs,
perhaps

mammal = isA * animal + role2 * fillerC + role3 * fillerD
We could then decode any of the values in those roles also with 99.9% accuracy.
That is, we could determine that a mammal is an animal, for instance. And so
the chaining might continue with the results of that decoding. What is interest-
ing to note here is that in the second memory we could have decoded any three
role-filler pairs from any of the three role-filler pairs in the memory earlier in the
chain. So, effectively, our original concept can include nine role-filler pairs that
we would be able to decode with near 100% accuracy (the expected accuracy is
99.9% × 99.9% ≈ 99.8%).
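A minimal sketch of such a two-stage chain is given below; everything other than the isA/mammal/animal relations mentioned above uses placeholder role and filler names.

    import numpy as np

    rng = np.random.default_rng(3)
    D = 100

    def cconv(a, b):
        return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

    def inv(a):
        return np.roll(a[::-1], 1)

    def rand_vec():
        v = rng.standard_normal(D)
        return v / np.linalg.norm(v)

    names = ['isA', 'role2', 'role3', 'dog', 'mammal', 'animal',
             'fillerA', 'fillerB', 'fillerC', 'fillerD']
    vocab = {n: rand_vec() for n in names}

    def cleanup(noisy, candidates):
        # compare the noisy vector to all allowable items and return the best label
        return max(candidates, key=lambda n: np.dot(noisy, vocab[n]))

    structured = {
        # first memory in the chain: the "dog" pointer (three role-filler pairs)
        'dog': cconv(vocab['isA'], vocab['mammal'])
               + cconv(vocab['role2'], vocab['fillerA'])
               + cconv(vocab['role3'], vocab['fillerB']),
        # second memory in the chain: the "mammal" pointer (three more pairs)
        'mammal': cconv(vocab['isA'], vocab['animal'])
                  + cconv(vocab['role2'], vocab['fillerC'])
                  + cconv(vocab['role3'], vocab['fillerD']),
    }

    # stage 1: what is a dog? unbind isA, then clean up
    step1 = cleanup(cconv(structured['dog'], inv(vocab['isA'])), vocab)
    # stage 2: take the cleaned-up result into the next memory and query it in turn
    step2 = cleanup(cconv(structured[step1], inv(vocab['isA'])), vocab)
    print(step1, step2)   # prints: mammal animal

The chain is explicit here: the label returned by the first clean-up selects which structured pointer the second memory decodes, which is what allows the capacities of the two stages to multiply.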
In short, we can increase the effective number of role-filler pairs by chaining
the spaces. As figure 4.11 shows, the scaling of a chained space significantly
outperforms an un-chained space. It is clear from this figure that even with the
same total number of dimensions, an un-chained space cannot effectively decode
very many pairs (the number of pairs is approximately linear in the number of
dimensions). In contrast, the number of decodable pairs in the chained space is
equal to the product of the number of decodable pairs in each space separately. In
this example it means we can effectively decode 150 instead of 20 role-filler pairs
with the same number of dimensions.
We can project the expected accuracy for much larger sets of pairs by turning
to our earlier consideration of clean-up memory. For instance, we know that we
can clean-up in a 500 dimensional space about 8 pairs from a vocabulary of 60,000
terms with 99% accuracy using the number of neurons consistent with observed
connectivity (see figure 4.7). If we chain 4 such memories together, we would be
able to decode 8⁴ = 4,096 pairs with 0.99⁴ ≈ 0.96, or 96%, accuracy. This kind of
scaling makes earlier concerns about encoding sophisticated concepts much less
pressing.
Why do we get such good scaling? In short, it is because we have introduced
a nonlinearity (the clean-up) into our decoding process that is not available in a
single memory. That nonlinearity acts to align the output of one memory with the
known values in the next semantic space in the chain. It may seem that this en-
forces a kind of hierarchy on our conceptual structure, but it does not. The reason
I use the word “chain” instead of “hierarchy” is because there are many kinds of
chains that could be implemented in this way, only some of which are hierarchies.
For instance, the second memory in a two-level chain could be the same as the
first (a recursive chain). Or, each possible role could have its own dedicated mem-
ory for decoding items of that role-type (a strict hierarchy). It is unclear which
of these, or other, kinds of structure will be most appropriate for characterizing
human-like semantic spaces. It is intriguing, however, that a hierarchy is a very
natural kind of chain to implement, and that there is some evidence for this kind
of structure in human concepts (Collins and Quillian, 1969). As well, parsimony
makes it tempting to assume that because of the hierarchical nature of percep-
tual and motor systems (also used for capturing deep semantics), cognition will
have a similar hierarchical structure. However, mapping any particular structure
Figure 4.11: Scaling of accuracy with the number of encoded pairs for a) an un-chained
space and b) a chained space. Each panel plots percent correct against the number of
bound pairs.
carefully to data on human concepts is beyond the scope of this book (see, for
example, Rogers and McClelland (2004) for interesting work along these lines).
Regardless of how a specific mapping turns out, in general I believe chained
semantic spaces provide a crucial demonstration of how syntax and semantics can
be used together to represent very large, intricate conceptual spaces in ways that
can plausibly be implemented in the brain. These semantic spaces are tied
directly to perceptual and motor experiences as described in the previous chapter,
and can also be used to encode temporal transformations, language-like structures,
and lexical relationships. Furthermore, we can transform these representations in
structure-sensitive ways, and even learn to extract such transformations from past
examples. The stage is now set for considering how to build a system which itself
uses, transforms, and generates semantic pointer representations.
• Create a ‘default’ ensemble in the network, name it ‘A’, give it 300 neurons,
and make it 20 dimensions.
• Open the network with Interactive Plots. Right-click the ‘A’ population and
click ‘value’. Right-click the population again and click ‘semantic pointer’.
You should now have two graphs.
Displaying the ‘value’ graph in Interactive Plots shows the value of individual
components of this vector.5 The ‘semantic pointer’ graph compares the vector
represented by the ensemble to all of the elements of the vocabulary, and displays
their similarity. Initially, the vocabulary contains no vectors and thus nothing
is plotted in the semantic pointer graph. As well, since there is no input to the
population, the value graph shows noise centered around zero for all components.
• Right-click the semantic pointer graph and select ‘set value’. Enter ‘a’ into
the dialog that appears and press OK.
Using the ‘set value’ option of the semantic pointer graph does two things: 1) if
there is a named vector in the set value dialog that is not part of the vocabulary,
it adds a randomly chosen vector to the vocabulary of the network and associates
the given name (e.g., ‘a’) with it; and 2) it makes the represented value of the
ensemble match the named vector. The result is that the ensemble essentially
acts as a constant function that outputs the named vocabulary vector. As a result,
when the simulation is run, the semantic pointer graph plots a 1, because the
representation of the ensemble is exactly similar (i.e., equal) to the ‘a’ vector.
• Right-click the ‘semantic pointer’ graph and select ‘set value’. Enter ‘b’
into the ‘Set semantic pointer’ dialog and press OK.
• Switch between setting ‘a’ and setting ‘b’ on the semantic pointer graph
while the simulation is running.
Setting the value of the semantic pointer to ‘b’ adds a second, random 20-dimensional
vector to the vocabulary. The ‘value’ plot reflects this by showing that the neural
ensemble is driven to a new vector. Switching between the ‘a’ and ‘b’ vectors,
it should be clear that although the vectors were randomly chosen initially, each
vector is fixed once it enters the vocabulary. The semantic pointer graph changes
to reflect which vocabulary item is most similar to the current representation in
the ensemble.
5 By default, the plot shows five components – to change this you can right-click the ‘value’
graph and choose ‘select all’ but be warned that plotting all twenty components may cause the
simulation to be slow. Individual components can be toggled from the display by checking or
unchecking the items labeled ‘v[0]’ to ‘v[19]’ in the right-click menu.
These are the vocabulary items that can be combined to form structured rep-
resentations. To experiment with the binding (i.e., convolution) and conjoin (i.e.,
sum) operations, we require additional ensembles.
• In the ‘Structured Representation’ network, add a ‘default’ ensemble named
‘B’ with 300 neurons and 20 dimensions.
• Add a third ensemble named ‘Sum’ with the same parameters.
• Add a decoded termination to the ‘Sum’ ensemble. Name the termination
‘A’ and set the number of input dimensions to 20, and ‘tauPSC’ to 0.02.
• The coupling matrix of the termination should be set to an identity matrix
by default (a grid of zeros except for a series of ones along the diagonal).
Click Set Weights to make sure, and keep this default. Click OK twice.
• Add a second decoded termination to the ‘Sum’ ensemble. Name the termi-
nation ‘B’ and set all other parameters as for the ‘A’ termination.
• Add projections from ensembles ‘A’ and ‘B’ to the ‘Sum’ ensemble.
• Add a fourth ensemble to the network named ‘C’ with the same parameters
as all of the previous ensembles.
• Drag the icon for the ‘Binding’ template from the bar on the left side of the
screen into the network viewer for the ‘Structured Representation’ network.
• In the dialog box that is opened, enter the name ‘Bind’, use ‘C’ as the name
of the output ensemble, and set 60 neurons per dimension. The two inver-
sion options should not be checked.
• Project the ‘A’ and ‘B’ ensembles to the ‘Bind’ network that was created.
The ‘Sum’ ensemble should be familiar from the previous tutorial about trans-
formations (section 3.8). However, the ‘Bind’ network uses a Nengo template to
construct a subnetwork, which we have not done before. Nengo can create subnet-
works by placing one network inside another and exposing selected terminations
and origins of the inner network to the outer network. This is a useful technique
for organizing a model, and it is used here to group the ensembles required to
compute a binding operation within a single subnetwork element.
In general, the template library allows common network components to be
created quickly and with adjustable parameters. The templates call script files that
advanced Nengo users can write to aid rapid prototyping of networks (scripting is
the topic of the section 7.5 tutorial).
The ‘Bind’ network is a component that computes the circular convolution
of its two inputs using nonlinear transformations as discussed in the preceding
tutorial. To look at the subnetwork elements you can double-click it. You cannot
see the connections into and out of the subnetwork when you open it. We can now
continue to build a binding network.
• Open the Interactive Plots viewer. Right-click the background and select
the ‘A’, ‘B’, ’C’, ‘Bind’ and ‘Sum’ nodes if they aren’t already displayed.
• Show the semantic pointer graphs for the ‘A’, ‘B’, ‘C’, and ‘Sum’ nodes by
right-clicking the nodes and selecting ‘semantic pointer’.
• Set the value of the ‘A’ ensemble to ‘a’ and the value of the ‘B’ ensemble to
‘b’, by right-clicking the relevant semantic pointer graphs.
• Right-click the semantic pointer graph of ensemble ‘C’ and select ‘show
pairs’.
• Right-click the semantic pointer graph of ensemble ‘C’ and select ‘a*b’.
• Repeat the previous two steps for the ‘Sum’ ensemble, so that both ‘show
pairs’ and ‘a*b’ are checked.
• Run the simulation for about 1 second, and then pause it.
difference even clearer. The ‘show pairs’ option controls whether bound pairs of
vocabulary vectors are included in the graph.
In contrast, the ‘a’ and ‘b’ vectors have a relatively high (and roughly equal)
similarity to the vector stored in the ‘Sum’ ensemble. The sum operation preserves
features of both original vectors, so the sum is similar to both. Clearly, the conjoin
and bind operations have quite different properties.
We have now seen how we can implement the two functions needed to create
structured representations in a spiking network. However, to process this infor-
mation, we must also be able to extract the information stored in these conjoined
and bound representations. This is possible through use of an inverse operation.
• Display the ‘C’ semantic pointer graph again by right-clicking ‘C’ and se-
lecting ‘semantic pointer’ (hiding and displaying the graph allows it to ad-
just to the newly added ‘c’ and ‘d’ vocabulary items).
• Right-click the graph of ‘C’ and select ‘select all’ (the letters ‘a’ through ‘d’
should be checked).
In this simulation, we first set the input to a known, structured semantic pointer.
The value ‘a*c+b*d’ is the semantic pointer that results from binding ‘a’ with ‘c’
and ‘b’ with ‘d’ then summing the resulting vectors. This vector is calculated
analytically as an input, but it could be computed with neurons if need be. The
second input, ‘~d’, represents the pseudo-inverse of ‘d’ (see section 4.2.1). The
pseudo-inverse vector is a shifted version of the original vector that approximately
reverses the binding operation. Binding the pseudo-inverse of ‘d’ to ‘a*c+b*d’
yields a vector similar to ‘b’ because the ‘b*d’ binding is inverted, while the ‘a*c’
component is bound with ‘~d’ to form a new vector that is dissimilar to anything in
the vocabulary and so is ignored as noise.
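The vector arithmetic that the network approximates can also be checked directly, without neurons; the sketch below computes the ideal operations, so the spiking version built in this tutorial will give the same pattern of similarities, only noisier.

    import numpy as np

    rng = np.random.default_rng(7)
    D = 100   # the tutorial uses 20 dimensions, which behaves the same way but more noisily

    def cconv(x, y):
        return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))

    def pinv(x):
        # the '~' pseudo-inverse: a reordered copy of the vector
        return np.roll(x[::-1], 1)

    def rand_vec():
        v = rng.standard_normal(D)
        return v / np.linalg.norm(v)

    a, b, c, d = (rand_vec() for _ in range(4))
    statement = cconv(a, c) + cconv(b, d)            # the input 'a*c+b*d'

    estimate = cconv(statement, pinv(d))             # bind with '~d'
    for name, v in zip('abcd', (a, b, c, d)):
        print(name, round(float(np.dot(estimate, v)), 2))   # 'b' has the highest similarity

    identity = np.zeros(D); identity[0] = 1.0        # the identity vector 'I'
    print(np.allclose(cconv(a, identity), a))        # True: binding with 'I' returns the vector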
• Experiment with different inverse values as inputs to the ‘B’ ensemble (‘~a’,
‘~b’, or ‘~c’). Note the results at the ‘C’ ensemble for different inputs.
• You can also try changing the input ‘statement’ given to the ‘A’ ensemble
by binding different pairs of vectors together and summing all the pairs.
• Binding more pairs of vectors together will degrade the performance of the
unbinding operation. If ‘B’ is set to ‘~a’, then setting ‘A’ to ‘a*b+c*d+e*f+g*h’
will not produce as clean an estimate of ‘b’ as when ‘A’ is set only to ‘a*b’.
Instead of naming the vectors ‘a’, ‘b’ and so on, you can name them ‘dog’, ‘sub-
ject’, and so on. Doing so makes it clear how this processing can be mapped
naturally to the various kinds of structure processing considered earlier in this
chapter. Note that the name ‘I’ is reserved for the identity vector. The result of
binding any vector with the identity vector is the vector itself. The lowercase ‘i’
does not refer to the identity vector.
There are two competing constraints that determine the accuracy of informa-
tion storage and retrieval in this network. The first is the number of dimensions. With
a small number of dimensions, there is a danger that randomly chosen vectors
will share some similarity due to the small size of the vector space (see figure
3.2 in chapter 3). So it would be desirable to have very high dimensional spaces.
However, the second constraint is the number of neurons. A larger number of
dimensions requires a larger number of neurons to accurately represent and ma-
nipulate vectors. There are, of course, only a limited number of neurons in the
brain. So, there is an unsurprising trade-off between structure processing power
and the number of neurons, as I explored in section 4.5. The network presented
in this tutorial uses a rather small number of dimensions and neurons for its
semantic pointer representations for the sake of computational performance.
Chapter 5

Biological Cognition – Control
mation can be routed to different areas of the brain to support a given task; how
that same information can be interpreted differently depending on the context;
how the system can determine what an appropriate strategy is; and so on.
In general, the process of control can be usefully broken down into two parts:
1. determining what an appropriate control signal (or state) is; and 2. applying
that control signal to affect the state of the system. The first of these tasks is
a kind of decision making. That is, determining what the next course of action
should be based on currently available information. The second task is more of an
implementational issue: how can we build a system that can flexibly gate informa-
tion between different parts of the system in useful ways? For instance, if we are
speaking on the phone and asked by a friend to report only what we are currently
seeing, our brain needs to both decide what action to pursue (e.g., answering the
friend’s question, which necessitates translating visual to verbal information), and
then actually pursue that action (e.g., allowing visual information to drive our lan-
guage system to generate a report). If the same friend asked for a report on what
we were currently smelling, then our brain would configure itself differently, to
allow olfactory information to drive the language system. If we could not sig-
nificantly change the flow of information between our senses and our language
system based on slightly different inputs, we would often generate irrelevant re-
sponses, or perhaps nonsense. In a sense, this re-routing of information is a kind
of cognitive action. The importance of rapidly reconfigurable control is perhaps
even more obvious for guiding motor action.
In this chapter, I describe the aspects of the Semantic Pointer Architecture
(SPA) most important for control. I begin by presenting the work of a member of
my lab, Terry Stewart, on the role of the basal ganglia in action selection, with a
focus on cognitive actions. This discussion largely focuses on determining the ap-
propriate control signal. I then describe work pursued by Bruce Bobier, also in my
lab, on how actions that demand the routing of information can be implemented
in a neurally realistic circuit: this addresses the biologically plausible applica-
tion of a control signal. Bruce’s work focuses on routing information through
the visual cortex to explain aspects of visual attention. However, I describe how
lessons learned from this characterization of attention can be exploited throughout
the SPA. Overall, this discussion forms the foundation of control in an example
circuit that combines all aspects of the SPA, which I present in chapter 7.
Figure 5.1: The anatomy of the cortex-basal ganglia-thalamus loop, showing the direct,
indirect, and hyperdirect pathways through the striatum (D1 and D2 cells), STN, GPe,
GPi/SNr, and SNc to the thalamus (VA, VL, MD, CM/Pf) and brainstem, along with the
transmitters involved: dopamine (modulatory), glutamate (excitatory), and GABA
(inhibitory).
Figure 5.2: Action selection via the striatum D1 cells and the subthalamic nucleus
(STN). Connections from the STN are all excitatory and set at a weight of 0.5.
The input with the highest utility (0.8) causes the corresponding output in the
globus pallidus internal/substantia nigra (GPi/SNr) to drop to zero, stopping the
inhibition of that action. (Left figure after Gurney et al. (2001), fig. 4b).
Without the additional excitation provided by the STN, all of the actions might be concurrently
suppressed (meaning more than one action had been selected). These very broad
STN connections thus take the input from the hyperdirect pathway, and combine
it to provide a level of background excitation that allows only the most inhibited
action to be selected.
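The feedforward arithmetic in figure 5.2 can be reproduced in a few lines; the sketch below is only the idealized rate calculation implied by the weights in that figure, not the spiking implementation, and not the full Gurney et al. circuit (which adds the GPe control loop described next).

    import numpy as np

    def gpi_output(utilities, w_stn=0.5, w_str=-1.0):
        # Each GPi/SNr channel receives focused inhibition (weight -1) from the
        # striatal D1 cells for its own action and diffuse excitation (weight 0.5)
        # from the STN, which pools all of the utilities. Outputs are rectified.
        u = np.asarray(utilities, dtype=float)
        return np.maximum(w_str * u + w_stn * u.sum(), 0.0)

    print(gpi_output([0.3, 0.8, 0.5]))   # [0.5, 0.0, 0.3]: the 0.8 action is released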
It should be evident that the example shown in figure 5.2 has carefully selected
utilities. In fact, if there are many actions with large utilities, or all actions have
low utilities, then this circuit will not function appropriately. For this reason, a
control system is needed to modulate the behaviour of these neural groups. Gur-
ney et al. (2001) argue that the globus pallidus external (GPe) is ideally suited for
this task, as its only outputs are back to the other areas of the basal ganglia, and it
receives similar inputs from the striatum and the STN, as does the globus pallidus
internal (GPi). In their model, the GPe forms a circuit identical to that in figure
5.2, but its outputs project back to the STN and the GPi. This regulates the action
selection system, allowing it to function across a full range of utility values. This
network is shown in figure 5.3.
The model discussed so far is capable of performing action selection and re-
producing a variety of single-cell recording results from electrostimulation and
Figure 5.3: The model of action selection in the basal ganglia presented by Gur-
ney, et al. (2001). The striatum D1 cells and the subthalamic nucleus (STN) are
as in figure 5.2, while the striatum D2 cells and globus pallidus external form a
modulatory control structure. (Left figure after Gurney et al. (2001), fig. 5).
lesion studies. However, it does so with rate neurons; that is, the neurons do not
spike and instead continually output a numerical value based on their recent input.
This makes it difficult to make precise timing predictions. Furthermore, the model
has no redundancy, since exactly one neuron is used per area of the basal ganglia
to represent each action. The original version of the model shown in figure 5.3
uses a total of 9 neurons to represent 3 possible actions, and if any one of those
neurons is removed the model will fail.
However, given the resources of the NEF and the SPA, these shortcomings
can be rectified. In particular, the SPA suggests that instead of a single neuron rep-
resenting potential actions, a mapping from a high-dimensional semantic pointer
into a redundant group of neurons should be used. As well, the NEF provides a
means of representing these high-dimensional semantic pointers and their utilities
in spiking neurons, and reproducing the transformations suggested by the original
model. I return to the role of SPAs shortly. For the time being, I consider a simpler
model that represents actions and utilities as scalar values. The ability of this new
model to select actions is demonstrated in figure 5.4. There it can be seen that the
highest utility action (B then C then A) is always selected (i.e., inhibited, and so
related neurons stop firing).
Crucially, this new implementation of the model allows us to introduce addi-
tional neural constraints into the model which could not previously be included.
In particular, the types of neurotransmitters employed in the excitatory and in-
Figure 5.4: Spikes produced (bottom) for three possible actions (A, B, and C) as
their utility changes (top). The highest utility action is selected, as demonstrated
by a suppression of the spikes of the corresponding action. In order, the highest
utility action is B then C then A.
hibitory connections of the model have known effects on the timing of a receiving
neuron’s response. All of the inhibitory connections involve GABA receptors
(with time constants between about 6-10ms; Gupta et al. (2000)), while the exci-
tatory ones involve fast AMPA-type glutamate receptors (with time constants of
about 2-5ms; Spruston et al. (1995)). The time constants of these neurotransmit-
ters have a crucial impact on the temporal behavior of the model. I discuss these
temporal properties and their related predictions in section 5.7.
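As a rough illustration of why these values matter, the following sketch compares the currents produced by a single spike under a simple exponential synapse model; the filter shape is an assumption made for illustration, and only the time constants come from the receptor types just mentioned.

    import numpy as np

    t = np.arange(0, 0.05, 0.0001)             # a 50 ms window at 0.1 ms resolution

    def psc(tau):
        # normalized post-synaptic current after a single spike at t = 0,
        # assuming a simple exponential synapse model h(t) = exp(-t/tau)/tau
        h = np.exp(-t / tau) / tau
        return h / h.max()

    ampa = psc(0.003)                          # fast AMPA-type excitation (~2-5 ms)
    gaba = psc(0.008)                          # slower GABAergic inhibition (~6-10 ms)
    idx = 100                                  # 10 ms after the spike (0.1 ms steps)
    print(ampa[idx], gaba[idx])                # ~0.04 vs ~0.29: the inhibition lingers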
Previous chapters demonstrated how the SPA can support representations suf-
ficiently rich to capture the syntax and semantics of complex representations.
Here, I consider how these same representations can be used by the basal ganglia
to drive chains of cognitive behavior. I begin, in this section, with an overview
of the central functions of cortex, basal ganglia, and thalamus, and their inter-
relations. In the rest of this chapter, I demonstrate how this architecture can im-
plement sequences of fixed action, and subsequently consider how flexible se-
quences of action can also be realized in this architecture. I return to these issues
in chapter 7, where I present a system choosing among cognitive strategies, ap-
plying them, and using the results to drive motor control responses to perceptual
input.
To construct such models, we can rely on the well-known and central cor-
tex/basal ganglia/thalamus loop through the brain (see figure 5.5). Roughly speak-
ing, the SPA assumes that cortex provides, stores, and manipulates representa-
tions, the basal ganglia map current brain states to future states by selecting ap-
propriate actions, and the thalamus provides for real-time monitoring of the entire
system.
Let me begin with a brief consideration of cortex. While cortex is obviously
able to perform intricate tasks, in the SPA, most models are built out of combi-
nations of four basic functions: integration (for working memory; section 6.2),
multiplication (specifically convolution for syntactic manipulation; section 4.8),
the dot product (for clean-up memory and other linear transformations; sections
4.5 and 3.8), and superposition, which is a kind of default operation. I have dis-
cussed each of these functions in detail in past tutorials, so will not describe them
any further here. The SPA does not carry strong commitments about the specific
organization of these operations, and this is clearly an avenue for future work.
Determining the most appropriate combination of these (and perhaps other) func-
tions to mirror all cortical function is far beyond the scope of this book – as I have
been careful to note, we have not yet built a brain! As well, learning clearly plays
a large role in establishing and tuning these basic functions. I leave further con-
sideration of learning until later (section 6.5). My suggestion, then, is that cortex
is an information processing resource that is dedicated to manipulating, remem-
bering, binding, etc. representations of states of the world and body. I take this
claim to be general enough to be both correct, and largely uninteresting. Far more
interesting are the specific proposals for how such resources are to be organized,
as captured in the many example models throughout the book.
As I was at pains to point out in the introduction to this chapter, pure process-
ing power is not enough for cognition. In order for these cortical operations to be
Figure 5.5: The cortex/basal ganglia/thalamus loop. In the schematic on the left,
arrows indicate mappings between these three areas. The brain states from the
cortex are mapped through the Mb matrix to the basal ganglia. Each row in such a
matrix specifies a context for which the basal ganglia can choose an action. Thus
the product of the current state and Mb determines how similar the current state
is to each of the possible contexts. The output of the basal ganglia disinhibits the
appropriate areas of thalamus, as described above. Thalamus, in turn, is mapped
through the matrix Mc back to the cortex. Each column of this matrix specifies an
appropriate cortical state that is the consequence of the selected action. The rele-
vant anatomical structures are pictured on the right based on a simplified version
of figure 5.1.
the utilities that drive the basal ganglia model in section 5.2. One natural and
simple interpretation of the rows of this matrix is that they specify the antecedent
of a rule. Consider the rule “if there is an A in working memory then set working
memory to B”. The Mb matrix can examine the contents of working memory, and
output a list of similarities between its rows and working memory with a simple
linear transformation (i.e., s = Mb w, where s is the vector of similarities between each
of the rows of Mb and the vector w, the input from working memory). That
vector of similarities then acts as input to basal ganglia, which selects the highest
similarity (utility) from that input.
The output from basal ganglia results in a release from inhibition of the con-
nected thalamic neurons, which are then mapped back to cortex through Mc . This
matrix can be thought of as specifying the consequent of a rule, resulting, for
example, in setting working memory to a new state B. More generally, the Mc
matrix can be used to specify any consequent control state given a current corti-
cal state. This loop from cortex through basal ganglia and thalamus and back to
cortex forms the basic control structure of the SPA.
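Stripped of the neurons, this loop amounts to a pair of matrix operations with a selection step in between. The sketch below uses random semantic pointers and replaces the basal ganglia with a simple argmax, which is of course an idealization of the model described in section 5.2.

    import numpy as np

    rng = np.random.default_rng(5)
    D = 250                                     # dimensionality of the pointers (arbitrary here)

    def rand_vec():
        v = rng.standard_normal(D)
        return v / np.linalg.norm(v)

    A, B, C = rand_vec(), rand_vec(), rand_vec()

    Mb = np.vstack([A, B, C])                   # IF: one antecedent state per row
    Mc = np.column_stack([B, C, A])             # THEN: one consequent state per column
                                                # (rules: A -> B, B -> C, C -> A)
    w = A                                       # current contents of working memory
    for _ in range(4):
        s = Mb @ w                              # similarities (utilities) for each rule
        chosen = int(np.argmax(s))              # idealized basal ganglia selection
        w = Mc[:, chosen]                       # thalamus routes the consequent to cortex
        print(chosen, end=' ')                  # prints: 0 1 2 0

The alphabet model described shortly is essentially this loop with one rule per letter, implemented in spiking neurons, with the basal ganglia model of section 5.2 doing the selection.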
Two points are in order regarding this rough sketch of the cortex-basal ganglia-
thalamus loop. First, as discussed in section 7.4, the Mb mapping is more general
than the implementation of simple if-then rules. It provides for subtle statistical
inference as well. Second, this additional inferential power is available partly
because all of the representations being manipulated in this loop (i.e., of cortical
states, control states, etc.) are semantic pointers. Let us now turn to consideration
of a simple control structure of this form, to illustrate its function.
where bold indicates that the item is a 250-dimensional semantic pointer. For
present purposes, these pointers are randomly generated. The inclusion of letter
in each of the pointers provides minimal semantic structure that can be exploited
to trigger actions appropriate for any letter (see section 5.6).
To implement the IF portion of the rule, an Mb matrix with rows consisting
of all the letter representations is constructed and embedded in the connection
weights between cortical working memory and the basal ganglia input (using the
methods detailed in section 3.8). This allows the basal ganglia to determine what
rule is most applicable given the current state of working memory.
The THEN portion of the rule is then implemented by the Mc matrix in a
similar manner, where only one row is activated by disinhibition (as determined
by the basal ganglia), sending a given letter representation to working memory.
As working memory is being constantly monitored by the basal ganglia, this new
state will drive subsequent action selection, and the system will progress through
the alphabet.
To run the model, it is initialized by setting the working memory neurons to
represent the letter + A semantic pointer. After this, all subsequent activity is due
to the interconnections between neurons. Figure 5.6 shows the model correctly
following the alphabet sequence. From the spiking pattern we see that the correct
action for each condition is successfully chosen by turning off the appropriate
inhibitory neurons in the GPi. The top plot is generated by comparing the semantic
pointer of each of the 26 possible letters to the current semantic pointer in working
memory (decoded from spiking activity) by using a dot product, and plotting the
top eight results.
This model demonstrates that a well-learned set of actions with a specific rep-
resentation as an outcome can be implemented in the SPA. However, this model is
not particularly flexible. For instance, we might assume that our working memory
is being driven by perceptual input, say from vision. If so, we would have a con-
nection between our visual system and the working memory system that is driving
action selection. However, if this is the case, then changing the visual input dur-
ing the fixed action sequence will cause the sequence to shift suddenly to the new
input. In fact, just leaving the visual input “on” would prevent the model from
proceeding through the sequence, since the working memory would be constantly
driven to the visually presented input despite the actions of the basal ganglia, as
shown in figure 5.7.
In short, the current model is not sufficiently flexible to allow the determined
action to be one which actually changes the control state of the system. That is,
we are currently not in a position to gate the flow of information between brain
areas using the output of the basal ganglia. In this case, we could fix the problem
if we could load visual input into working memory only when appropriate. That
Figure 5.6: Rehearsal of the alphabet. (Top) Contents of working memory gener-
ated by taking the dot product of all possible semantic pointers with the decoded
contents of working memory (top eight values are shown). (Bottom) The spiking
output from GPi indicating the action to perform. The changes in activity demon-
strate that the population encoding the currently relevant IF statement stops firing,
disinhibiting thalamus and allowing the THEN statement to be loaded into work-
ing memory.
Figure 5.7: Inflexibility of the control structure. The (erroneous) result of con-
necting the visual system to working memory. (Top) The perceptual system
(letter + A) continually drives working memory and prevents it from properly
moving to letter + B and subsequent states. (Bottom) The spiking output from GPi
reflects this same “frozen” state.
is, we would like basal ganglia to determine how information is routed in cortex,
as well as being able to introduce new representations in cortex. I suspect that
routing information flexibly through the brain is a fundamental neural process. I
also suspect that this routing is sometimes called “attention”.
Figure 5.8: Anatomical structures relevant to the attentional routing circuit (ARC).
The lateral geniculate nucleus (LGN) is a structure within the thalamus which re-
ceives information directly from the optic nerve. The LGN contains magnocellu-
lar and parvocellular cells which form the LGN-M and LGN-P layers respectively.
The other structures shown are the posterior inferior temporal cortex (PIT), visual
cortical areas (V1, V1-4B, V1-4C α, V2, V4), frontal eye fields (FEF), the middle
temporal area (MT), superior colliculus (SC), and ventral pulvinar (VP). Feedfor-
ward visual information projects through the ventral stream areas (black lines).
The pathway from LGN-M leading to FEF (dark gray lines) is one possible route
through which the focus of attention (FOA) may be rapidly determined. Areas
within the rounded gray box are used to describe the ARC. VP projects a coarse
signal (dashed black line) indicating the location of the FOA to control neurons
in PIT. Based on the VP signal, local control signals are computed in PIT. The
results are then relayed to the next lower level of the ventral stream (light gray
lines), where relevant local control signals are computed and again relayed.
Nevertheless, it improves upon the shifter circuit by being significantly more biolog-
ically plausible, scalable, and hence able to account for detailed neurophysiolog-
ical findings not addressed by that model. Consequently, the ARC provides an
especially good account of routing, and one that can be generalized to other parts
of the SPA (see section 5.6). Let us consider the ARC in more detail.
As shown in figure 5.8, the ARC consists of a hierarchy of visual areas, for
example V1, V2, V4, and PIT, which receive a control signal from a part of the
thalamus called the pulvinar (VP). A variety of anatomical and physiological ev-
idence suggests that pulvinar is responsible for providing a coarse top-down con-
trol signal to the highest level of the visual hierarchy (Petersen et al., 1985, 1987;
Stepniewska, 2004).
A more detailed picture of the connectivity between the visual areas is pro-
vided by figure 5.9. In that figure, an example focus of attention that picks out
approximately the middle third of the network is shown. To realize this effective
routing (i.e. of the middle third of V1 up to PIT), VP provides a control signal
to PIT indicating the position and size of the current focus of attention in V1.
Control neurons in PIT use this signal to determine what connections to “open”
between V4 and PIT, and then send their signal to control neurons in V4. The sig-
nal that is sent to the next lower level of the hierarchy (V4) is similarly interpreted
by that level to determine what gating is appropriate, and then passed on (i.e., to
V2). Again, the gating allows the flow of information only from those parts of
the next lower level that fall within the focus of attention. And so this process of
computing and applying the appropriate routing signal continues to V1.
To “open” a connection essentially means to multiply it by a non-zero factor.
Consider a simple example. Suppose we have a random signal going from pop-
ulation A to population B in a communication channel (see section 3.8). At the
inputs to population B, we might insert another signal that can be 0 or 1, coming
from our control population, C. If we multiply the inputs to B by C, then we cause
the representation in B to be either 0 (if C=0) or the current value of A (if C=1).
Population C thus acts as a gate for the information flowing between A and B. The
control neurons in ARC are performing a similar function.
In essence, the control neurons determine which lower-level (e.g. V4) neu-
rons are allowed to project their information to the higher level (e.g., PIT). In the
ARC, this gating is realized by the non-linear dendritic neuron model mentioned
in section 4.2.2. This kind of neuron essentially multiplies parts of its input, and
sums the result to drive spiking. Consequently, these neurons are ideal for acting
as gates, and fewer such neurons are needed than would be in a two-layer network
performing the same function.
Figure 5.9: The architecture of the attentional routing circuit (ARC) for visual
attention. Each level has a columnar and retinotopic organization, where columns
are composed of visually responsive pyramidal cells (white circles) and control
neurons (black dots). Filled gray circles indicate columns representing an exam-
ple focus of attention. Neurons in each column receive feedforward visual sig-
nals (gray lines) and a local attentional control signal from control neurons (black
lines). These signals interact nonlinearly in the terminal dendrites of pyramidal
cells (small white squares, only some are shown). Coarse attentional signals from
pulvinar (VP) are relayed through each level of the hierarchy downward to control
neurons in lower levels (large gray arrows). Control connectivity is highlighted
for the rightmost columns only, although other columns in each level have similar
connectivity.
Figure 5.10: Proposed laminar arrangement of neurons in ARC for a single col-
umn. Global attention signals (the length Alen and position Apos of the selected region)
and a local signal indicating the position of the target in the current level (Lpos)
are fed back from the next higher cortical level to layer-I, where they connect to
apical dendrites of layer-V cells. The layer-V neurons relay this signal to the next
lower area with collaterals projecting to control neurons in layer-VI that compute
the sampling frequency (sf, solid lines) and shift (s, dotted lines). These sig-
nals, along with feedforward visual signals from retina (solid black lines from the
bottom) are received by layer-IV pyramidal cells, where the routing occurs (i.e.,
layer-VI signals gate the feedforward signals). Cells in layer-II/III (black) pool
the activity of multiple layer-IV neurons and project the gated signal to the next
higher visual area.
Figure 5.11: An example of routing in the ARC. Levels of the visual hierarchy
are laid out as in figure 5.9. Above each of these levels, the thick line indicates
the decoded signal represented by the neurons at that level. The darkened circles
in each level indicate the focus of attention in the network. In this example, the
focus of attention has the effect of both shifting and scaling the original data from
V1 to fit within PIT.
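The effect in the figure can be mimicked with a toy resampling operation: select the attended window of the lower level and stretch or compress it to fit the coarser higher level. This is only a caricature of the routing that the ARC's control neurons implement, with arbitrary sizes.

    import numpy as np

    v1 = np.sin(np.linspace(0, 4 * np.pi, 256))      # toy data represented in V1
    n_pit = 32                                        # PIT has far fewer sample points

    def route(signal, center, width, n_out):
        # select the attended window of the lower level and resample it to the
        # (smaller) size of the higher level: a shift plus a rescaling
        lo, hi = int(center - width / 2), int(center + width / 2)
        window = signal[lo:hi]
        idx = np.linspace(0, len(window) - 1, n_out)
        return np.interp(idx, np.arange(len(window)), window)

    pit = route(v1, center=128, width=85, n_out=n_pit)   # attend to the middle third
    print(pit.shape)                                     # (32,)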
Following a delay, three stimuli were presented: one at the target location, and two
distractor stimuli (S2 and S3), one inside and one outside of the recorded cell’s re-
ceptive field. The animal had to indicate when it saw a brief change in the stimulus
at the cued location, some random interval after the three stimuli were presented.
During the interval, the animal was taken to have sustained spatial attention to S1,
and the receptive field of the cell was mapped. This experimental design allowed
Womelsdorf et al. to map the receptive field during sustained states of selective
attention to the stimuli inside the receptive field or to the stimulus outside the
receptive field.
To compare the model to these experimental results, we ran analogous numerical experiments on the model. We used the same methods of fitting the spiking data to determine neuron receptive fields as were used in the experiment. The main difference
between the model and experimental runs is that in the model all spikes could
be collected from all neurons. As a result, we could run the experiment on 100
different realizations of the model. Each version has the same general ARC ar-
chitecture, but the neurons themselves are randomly chosen from a distribution
of parameters that matches the known properties of cells in the relevant parts of
cortex. Consequently, rather than having just over a hundred neurons from two
animals as in the experiment, we have thousands of neurons from hundreds of
model-animals. This means that we have a much better characterization of the
overall distribution of neuron responses in the model than is available from the
Figure 5.13: Attentional effects (mean and 95% confidence intervals) for 100
model-animals (dashed lines) and data from Womelsdorf et al. (2008) (solid
lines). Because we can record from many more spiking neurons in the model,
the distribution is much better characterized, and so the confidence intervals are
tighter. These results demonstrate that the ARC quantitatively accounts for the
receptive field (RF) shrink, shift, and gain observed experimentally.
experimental data for the animals, and we also have a distribution over animals,
which is not available from the experimental data as there are only two monkeys.
To compare the model and data, we considered the three main effects de-
scribed in the experimental work. These effects were seen by comparing attention
being directed at a stimulus inside the receptive field to attention being directed at
a stimulus outside the receptive field. The effects were: 1) a change in the peak
firing rate; 2) a shift of the receptive field center; and 3) a change in the receptive
field size. In each case, several statistics were calculated to compare model and
experimental data (see Bobier (2011) for details). Overall, the model and data
match extremely well on each of these effects, as shown in figure 5.13.
A similar comparison to neurophysiological experiments from Treue and Martinez-
Trujillo (1999) and Lee and Maunsell (2010) is performed in Bobier (2011). The
first of these experiments demonstrates that the width of tuning to visual fea-
tures is not affected by attention, although the gain of the neuron responses is
affected. The second experiment demonstrates that neuron gain responses are
best explained as response gain and not contrast gain. Both sets of experimental
results are well captured by the ARC, when the same analyses are used on the
experimental and model spiking data.
One final note about this model is that there are only two free parameters.
One governs the width of the routing function, and the other the width of feedfor-
ward weights. Both of these parameters are set to match the known receptive field
sizes in visual cortex independently of any particular experiment. That is, they
were in no way tuned to ensure the ability of the model to predict any specific ex-
periment. The remaining parameters in the model are randomly chosen between
different model-animals from distributions known to match general physiological
characteristics of cells in visual cortex. This means that the model does not sim-
ply capture the mean of the population (of cells or animals), but actually models
the distribution of responses, which is more difficult. Nevertheless, the model
is able to provide a good characterization of changes in several subtle aspects of
individual cell activity across a variety of experiments from different labs.
Overall, these results suggest that the detailed mechanism embodied by ARC
for routing information through cortex is plausible in its details. The fact that ARC
is specified to the level of single spiking cells allows the same analysis applied to
the animal data to be applied to the model. Hence matching of results is not a
consequence of different analysis methods, but rather of a similarity in the under-
lying spike patterns. More generally, ARC provides insight into how information
can be flexibly routed through the brain. That is, it provides a biologically plau-
sible method for applying control, which was missing from our model of action
selection.
This rule can be selectively employed, so the contents of visual cortex can be
selectively used to drive future actions. Note that this “copy visual cortex” com-
mand is the specification of a control state that consists in gating the information
between visual inputs and working memory (unlike the simple representation-to-
representation mapping in the previous model). This demonstrates a qualitatively
new kind of flexibility that is available once we allow actions to set control states.
In particular, it shows that not only the content of cortical areas, but also the com-
munication between such areas, can be controlled by our cognitive actions.
There is a second interesting consequence of being able to specify control
states. Notice that the specified rule applies to every letter, not just the one that
happens to be in the visual input at the moment. This allows for rules to be defined
at a more general category level than in the previous example. This demonstrates
an improvement in the flexibility of the system, in so far as it can employ instance-specific or category-specific rules. More precisely, the rule can be categorically
defined, but apply specifically (i.e., the previous model could have a rule that maps
all letters to one other representational state, but not a single rule that systemati-
cally maps each letter to a different representational state).
In fact, routing can provide yet more flexibility. That is, it can do more than
simply gate information flow between different cortical areas. In the simple al-
phabet model above, the routing action was essentially “on” or “off” and hence
gate-like. However, we can use the same neural structure to actually process the
signals flowing between areas. Recall that the method of binding semantic pointers is to use circular convolution, which can be computed as a linear transformation followed by element-wise multiplication. As described in the section on attention, gating can also be accomplished by multiplication (and linear transformations are simple and will not interfere with the multiplication). If we allow our gating signal to take on graded values rather than just 0 or 1, we can use the same gating circuits to bind and unbind semantic pointers, not only routing signals but concurrently processing them. Essentially, we can give the gating signals useful content.
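To make the connection between gating and binding concrete, the following minimal numpy sketch (an idealization, not the spiking implementation) computes circular convolution as a linear transformation (the discrete Fourier transform), an element-wise multiplication, and a linear transformation back. A gate of all ones or all zeros in the transformed domain reduces to simple routing, while a gate carrying the transformed coefficients of another semantic pointer performs binding:

    import numpy as np

    def circ_conv(x, y):
        # Circular convolution: linear transform, element-wise multiply, transform back.
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

    d = 64
    rng = np.random.RandomState(0)
    pointer = rng.randn(d) / np.sqrt(d)
    other = rng.randn(d) / np.sqrt(d)

    passed = np.real(np.fft.ifft(np.fft.fft(pointer) * np.ones(d)))    # routed through unchanged
    blocked = np.real(np.fft.ifft(np.fft.fft(pointer) * np.zeros(d)))  # gated off
    bound = circ_conv(pointer, other)                                  # gated and bound at once

In the spiking version, the linear transformations can be folded into connection weights, so the neurons themselves only need to compute the element-wise products, just as in the attentional gating described above.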
Figure 5.14: Routing information. Contents of working memory are shown on top.
The similarity of vocabulary vectors is found with respect to the decoded value of
the working memory ensemble. The lower half of the graph shows spiking output
from GPi indicating the action to perform. The look action takes information from
visual cortex (in this case, letter + F) and routes it to working memory. Visual
input stays constant throughout this run (as in figure 5.7).
Introducing this simple extension means that the same network structure as
above can be used to perform syntactic processing. So, for example, we can imple-
ment a dynamic, controlled version of the question answering network described
in section 4.3. In this network, we define semantic pointers that allow us to present
simple language-like statements and then subsequently ask questions about those
statements. So, for example, we might present the statement
statement + blue ~ circle + red ~ square
to indicate that a blue circle and red square are in the visual field. We might then
ask a question in the form
question + red
which would be asking “What is red?”. To process this input, we can define the
following rules
IF the visual cortex contains statement+?
THEN copy visual cortex to working memory
which simply gates the visual information to working memory as before. We can
also define a rule that performs syntactic processing while gating
IF visual cortex contains question+?
THEN apply visual cortex to the contents of working memory
Here, “apply” essentially indicates that the contents of visual cortex are to be
convolved with the contents of working memory, and the result is stored in the
network’s output. More precisely, the contents of visual cortex are moved to a
visual working memory store (to allow changes in the stimulus during question
answering, as above), and the approximate inverse (a linear operation) of visual
working memory is convolved with working memory to determine what is bound
to the question. This result is then stored in an output working memory to allow
it to drive a response. The results of this model answering two different questions
from the same remembered statement are given in figure 5.15. These two generic
rules can answer any question provided in this format.
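The arithmetic behind these two rules can be sketched at the vector level as follows (random vectors stand in for the semantic pointers, and for simplicity only the cue red is unbound rather than the full contents of visual working memory; this is an idealization of what the spiking network computes):

    import numpy as np

    def conv(x, y):   # circular convolution (binding)
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

    def inv(x):       # approximate inverse, used for unbinding
        return np.concatenate(([x[0]], x[:0:-1]))

    d, rng = 256, np.random.RandomState(1)
    vocab = {w: rng.randn(d) / np.sqrt(d)
             for w in ('statement', 'blue', 'circle', 'red', 'square')}

    # "Copy visual cortex to working memory": store the statement as presented.
    memory = (vocab['statement'] + conv(vocab['blue'], vocab['circle'])
              + conv(vocab['red'], vocab['square']))

    # "Apply visual cortex to the contents of working memory": unbind the cue.
    answer = conv(memory, inv(vocab['red']))

    # The noisy result is most similar to 'square', the item bound to 'red'.
    sims = {w: float(np.dot(answer, v)) for w, v in vocab.items()}
    print(max(sims, key=sims.get))    # expected: square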
Notably, this exact model can reproduce all of the control examples presented
to this point. This means that the introduction of these more flexible control struc-
tures does not adversely impact any aspects of the simpler models’ performance
as described above. This is crucial to claims of flexibility. The flexibility of the
SPA needs to be independent of its particular use: flexibility needs to reside in
Figure 5.15: Answering two different questions starting from the same statement. Gray areas indicate the period during which the stimuli were presented. The similarity between the contents of the network’s output and the top 7 possible answers is shown. The correct answer is chosen in both cases after about 50ms.
the overall design of the system, not the task specific changes introduced by a
modeller.
In addition, we can be confident that this control circuit will not adversely af-
fect the scaling of the SPA. Only about 100 neurons need to be added for each
additional rule in the basal ganglia. Of those, about 50 need to be added to stria-
tum, which contains about 55 million neurons (Beckmann and Lauer, 1997), 95%
of which are input (medium spiny) neurons. This suggests that about one million
rules can be encoded into a scaled-up version of this model. In combination with
the reasonable scaling of the representational aspects of the SPA (see section 4.5),
this suggests that the SPA as a whole will scale well.
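As a rough check of the one-million-rule estimate: 95% of 55 million striatal neurons is about 52 million medium spiny neurons, and 52 million divided by roughly 50 neurons per rule gives just over one million rules.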
So, the detailed neural mechanisms for routing in attention explored in the pre-
vious section can provide a general account of routing in a wide variety of circuits.
This is important because it means that this routing mechanism lends biological
detail to the SPA and the inclusion of this mechanism in the SPA helps make
this form of routing a plausible general mechanism. That is, we can show that
this detailed, spiking approach to routing can be usefully integrated into a func-
tional large-scale model, unlike past neural mechanisms that have been suggested
to account for routing (Gisiger and Boukadoum, 2011).
Results are presented in figure 5.16b, where it can be seen that the latency
can increase to about 38ms for actions that are only slightly different in utility af-
ter the change. For the largest utility differences, the latency drops to about 14ms,
the same lower bound as the Ryan and Clark experiment. As far as I am aware,
this latency profile as a function of utility differences remains an untested, but
strong prediction of the model.
We can also look at the effects on timing of the basal ganglia in the context
of the entire cortex-basal ganglia-thalamus loop. In fact, this timing is implicitly
demonstrated by figure 5.6. There it can be seen that in the fixed action selection
case, it takes about 40ms for the system to switch from one action to another.
This is more fully characterized in figure 5.17 where the mean timing is shown
over a range of time constants of the neurotransmitter GABA. GABA is the main
neurotransmitter of the basal ganglia, and has been reported to have a decay time
constant of between 6 and 11ms (Gupta et al., 2000). We have identified this range
on the graphs with a gray bar. On this same graph, we have drawn a horizontal
line at 50ms because this is the standard value assumed in most cognitive models
for the length of time it takes to perform a single cognitive action (Anderson et al.,
1995a; Anderson, 2007).
Interestingly, the cycle time of this loop depends on the complexity of the ac-
tion being performed. Specifically, the black line in figure 5.17 shows the simplest
fixed action cycle time, whereas the gray line shows the more complex flexible ac-
tion cycle time, like those discussed in section 5.6. The main computational differ-
ence between these two types of action is that the latter has a control step, which
can re-route information through cortex. The cycle time increases from about 30-45ms in the simple case to about 60-75ms in the more complex case. These two instances clearly bracket the standard 50ms value. Notably, that standard value was originally arrived at through fitting of behavioural data. If the tasks used to infer it included a mix of simple and complex decisions, a mean between the two values described here would be expected.
Together, these timing results help to highlight some of the unique properties
of the SPA. For example, these elements of the SPA can help provide an expla-
nation of the genesis of certain “cognitive constants” that would not otherwise be
available, and they can address available neural data about the impact of utility on
selection speed. These explanations cannot be provided by a basal ganglia model
implemented in rate neurons, because they do not include the relevant timing in-
formation (e.g., neurotransmitter time constants). They are also not available from
more traditional cognitive modelling approaches, such as ACT-R, that have deter-
mined such constants by fits to behavioural data. As well, these explanations are
not available from other neurally-inspired architectures that include basal ganglia as an action selector, such as Leabra, because the winner-take-all mechanisms in such models do not have temporal constraints.

Figure 5.16: a) Output spikes, with a latency of 14.8ms indicated. b) Latency (ms; mean and standard deviation) as a function of the utility difference between actions.
In general, the dynamical properties of the SPA do not merely help us match
and explain more data, they also suggest behavioural and neurobiological ex-
periments to run to test the architecture. Behaviorally, the SPA suggests that it
should be possible to design experiments that distinguish the kinds of cognitive
action that take longer (or shorter) to execute with reference to their complexity.
Physiologically, the SPA suggests that it should be possible to design experiments
that manipulate the length of time neurons in basal ganglia fire after actions are
switched. Results from such experiments should provide a more detailed under-
standing of the neural underpinnings of cognitive control.
In general, the central theme of this chapter, “control”, is tightly tied to neu-
ral dynamics. Although I have discussed control more in terms of flexibility of
routing information because of my interest in presenting an architecture that can
manipulate complex representations, the dynamical properties underlying such
control are ever-present and unavoidable. In many ways, measuring these dynam-
ical properties is simpler than measuring informational properties, because when
Figure 5.18: The Tower of Hanoi task for four disks. Each move made by an ex-
pert solver is shown. Simon’s (1975) original “Sophisticated Perceptual Strategy”
results in these moves, and is the basis for the neural model presented here.
and goal recall in order to solve the task in a human-like manner. Of course,
the point of the SPA is to provide a means of constructing just such models that
conform to the known anatomy, connectivity, and neural properties of the basal
ganglia, thalamus, and cortex. Terry Stewart has recently exploited these properties of the SPA to construct such a model (Stewart and Eliasmith, 2011).
There are many algorithms that can be used to produce the series of moves
shown in figure 5.18. As a result, one focus of cognitive research on this task
is determining which algorithm(s) people are using. This is done by examining
factors such as the time taken between steps and the types of errors produced. In
this SPA model, the basic algorithm used is the “Sophisticated Perceptual Strat-
egy” suggested by Simon (1975). Employing this algorithm, we start with the largest disk that is not in its correct location. We then examine the next smaller disk. If it blocks the move we want to make, then our new goal is to
move that disk to the one peg where it will not be in the way. We then iterate this
algorithm, going back to previous goals once we have accomplished the current
one.
To implement this algorithm, we need to keep track of three things: the disk
we are currently attending to, the disk we are trying to move, and the location we
are trying to move it to. To keep track of these varying states, we can introduce
three cortical working memory circuits, which are called ATTEND, WHAT, and
WHERE for ease of reference (see figure 5.19).
In addition, the algorithm requires the storage and recall of old goals regarding
which disk to place where. Storing a single goal such as “disk 4 on peg C” would
be easy: we could simply add the vectors together (disk4 + pegC) and store the
result. However, multiple goals cannot be stored in this manner, as (disk4 + pegC)
+ (disk3 + pegB) cannot be distinguished from (disk4 + pegB) + (disk3 + pegC).
This, of course, is another instance of the binding problem discussed in section
4.2. So, to store a set of goals, we compute the sum of the bound vectors disk4 ~
pegC + disk3 ~ pegB. Then, to recall where we wanted to place a particular disk
(e.g. disk3), we can unbind that disk from that goal state.
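The difference between superposition alone and binding can be seen directly at the vector level (an idealized numpy sketch, not the spiking implementation):

    import numpy as np

    def conv(x, y):   # circular convolution (binding)
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

    def inv(x):       # approximate inverse, used for unbinding
        return np.concatenate(([x[0]], x[:0:-1]))

    d, rng = 256, np.random.RandomState(2)
    disk3, disk4, pegB, pegC = rng.randn(4, d) / np.sqrt(d)

    # Unbound storage is ambiguous: the two different goal sets are literally equal.
    assert np.allclose((disk4 + pegC) + (disk3 + pegB),
                       (disk4 + pegB) + (disk3 + pegC))

    # Bound storage is not: unbinding disk3 recovers (a noisy version of) its peg.
    goals = conv(disk4, pegC) + conv(disk3, pegB)
    recalled = conv(goals, inv(disk3))
    print(np.dot(recalled, pegB) > np.dot(recalled, pegC))   # True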
Finally, we need to consider what input and output is needed for the model.
Here, the focus is on the cognitive operations, so we do not need to consider motor
or perceptual systems in detail. As a result, we can assume that perceptual systems
are able to supply the location of the currently attended object (ATTEND_LOC),
the location of the object we are trying to move (WHAT_LOC), and the final end
goal location of the object we are trying to move (GOAL_LOC). Similarly, the two
output motor areas in the model will have the results of deliberation that indicate
what disk should be moved (MOVE_WHAT) and where it should be moved to
(MOVE_WHERE).
We have now specified the essential architecture of the model, and can turn
to characterizing the set of internal actions the model needs to perform, and the
conditions under which it should perform each action. These rules define the Mb
(IF) and Mc (THEN) matrices from figure 5.5. For each action, we determine
what state cortex should be in for that action to occur, and connect the cortical
neurons to the basal ganglia using the resulting Mb. We then determine what cortical state should result from being in the current cortical state, and connect thalamus to cortex using the resulting Mc. A full listing of the 16 rules used in the
final model is in appendix C.1.
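Schematically, the roles of the two matrices can be sketched as follows. This is an idealization (in the model the basal ganglia circuit, not an argmax, performs the selection, and the matrices here are random stand-ins for the actual rules), but it shows how Mb scores the IF conditions against the current cortical state and Mc supplies the THEN state for the winner:

    import numpy as np

    d, n_rules = 64, 16
    rng = np.random.RandomState(3)

    # Row i of Mb is the cortical state that should trigger rule i (its IF part);
    # row i of Mc is the cortical state that rule i should produce (its THEN part).
    Mb = rng.randn(n_rules, d) / np.sqrt(d)
    Mc = rng.randn(n_rules, d) / np.sqrt(d)

    cortex = Mb[7] + 0.1 * rng.randn(d)    # a noisy version of rule 7's IF state

    utilities = Mb @ cortex                # how well each IF condition matches
    selected = int(np.argmax(utilities))   # stand-in for basal ganglia selection
    next_cortex = Mc[selected]             # the THEN state, routed back via thalamus
    print(selected)                        # expected: 7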
Let us consider a few steps in this algorithm for illustration. First, the model
will ATTEND to the largest disk (placing the disk4 vector into the ATTEND mem-
ory). Next, the model forms a goal to place disk4 in its final location, by routing
ATTEND to WHAT and GOAL_LOC to WHERE. The model now has disk4 in
ATTEND and WHAT and pegC in WHERE. Next, it checks if the object it is
trying to move is in its target location. If it is (WHERE=WHAT_LOC), then it
has already finished with this disk and needs to go on to the next smallest disk
(loading disk3 into WHAT and routing GOAL_LOC to WHERE).
If the disk in WHAT is not where it is trying to move it (i.e., WHERE is not
equal to WHAT_LOC), then it needs to try to move it. First, it looks at the next
smaller disk by sending disk3 to ATTEND. If it is attending to a disk that is not the
one it is trying to move (ATTEND is not WHAT) and if it is not in the way (AT-
Figure 5.19: The architecture used for the Tower of Hanoi. As described in more
detail in the text and appendix C.1, elements in frontal cortex act as memories,
with the goal memory including the ability to bind items to keep track of multiple
goals. Visual and motor cortices are treated as inputs and outputs to the model,
respectively. The basal ganglia and thalamus are as described in figure 5.3. Note
that diamonds indicate gates that are used for routing information throughout cor-
tex, and are largely the targets of selected actions. Mb is defined through the
projections to the basal ganglia, and Mc is defined through the multiple projec-
tions from thalamus to cortex (only two of which are labeled with the matrix for
clarity). The model has a total of about 150,000 neurons.
TEND_LOC is not WHAT_LOC or WHERE), then attend to the next smaller disk.
If it is in the way (ATTEND_LOC=WHAT_LOC or ATTEND_LOC=WHERE),
then it needs to be moved out of the way.
To do this, the model sets a goal of moving the disk to the one peg where it will
not be in the way. The peg that is out of the way can be determined by sending
the value pegA + pegB + pegC to WHAT and at the same time sending the values
from WHAT_LOC (the peg the disk it is trying to move is on) and ATTEND_LOC
(the peg the disk it is looking at is on) to WHAT as well, but multiplied by -1. The
result will be pegA + pegB + pegC - WHAT_LOC - ATTEND_LOC, which is
the third peg. This algorithm, with the addition of a special case for attending to the smallest disk (disk1 can always be moved, since nothing is ever in its way, and if the model gets to disk1 without finding anything in the way, then it can move the disk it is trying to move), is sufficient for solving the Tower of Hanoi.
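The “third peg” computation just described is plain vector arithmetic, and can be sketched directly (idealized and non-spiking; the peg vectors are random stand-ins):

    import numpy as np

    d, rng = 256, np.random.RandomState(4)
    pegA, pegB, pegC = rng.randn(3, d) / np.sqrt(d)

    what_loc, attend_loc = pegA, pegB      # the two pegs that are already in use
    target = pegA + pegB + pegC - what_loc - attend_loc   # leaves the remaining peg

    sims = {name: float(np.dot(target, peg))
            for name, peg in (('pegA', pegA), ('pegB', pegB), ('pegC', pegC))}
    print(max(sims, key=sims.get))         # expected: pegC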
However, this algorithm does not make good use of the memory system, so it
results in rebuilding plans each time a disk is moved. To address this, we can first
add a rule to do nothing if RECALL is not the same as WHERE. This occurs if
the model has set a new goal, but there has not been enough time for the memory
to hold it. Next, we can add rules for the state where the model has just finished
moving a disk. Instead of starting over from the beginning, the model sends the
next largest disk to ATTEND and WHAT and routes the value from RECALL to
WHERE. This recalls the goal location for the next largest disk and continues the
algorithm.
To fully implement this algorithm, including better use of the memory system,
requires specifying 16 actions in the two M matrices (see table C.2). The behavior
of the model is captured by figure 5.20. The model is able to successfully solve
the Tower of Hanoi, given any valid starting position and any valid target position.
It does occasionally make errors, and recovers from them (our analysis of these
errors is ongoing). Figure 5.20a shows several examples of successful action se-
lection, where the spiking output from the basal ganglia to the thalamus is shown.
As can be seen, different groups of neurons stop firing at the same time, releasing
the inhibition in the thalamus and allowing that particular action to be performed.
Unfortunately, it is difficult to directly compare this kind of single cell response
with human data, because such data is generally scarce, and does not exist for this
particular task to the best of my knowledge.
However, we can compare the model to human performance in other ways.
For instance, Anderson, Kushmerick, and Lebiere (1993) provide a variety of
measures of human performance on this task. Figure 5.20b compares the time
taken between each move in the case where no mistakes are made. There are only
Figure 5.20: Behavior of the Tower of Hanoi model. a) The decoded spiking
activity from basal ganglia to thalamus. This activity shows which actions are
selected as the task progresses. Different shades of grey indicate the activity of
basal ganglia neurons associated with different actions. Some actions are labelled
as examples. b) Behavioral performance of the model compared to human sub-
jects.
two free parameters for the model: the amount of time needed to move a disk
(1.8 seconds) and the memory weighting factor that determines how quickly new
information is encoded in memory (0.08). All other timing parameters are taken
from the neurophysiology of the various brain regions (e.g. time constants of the
relevant neurotransmitter types, membrane time constants, etc.). As a result, the
timing behavior of the system is not the result of a many-parameter fit to the human
data, but rather falls naturally out of the dynamics intrinsic to the SPA.
As can be seen, in both the human and model data, steps 1, 9, and 13 show longer pauses as a new set of goals is established. As well, shorter pauses on
steps 3, 7, and 11 are also consistent across the model and data. The only point
of disagreement is step 5, where the model is taking longer than the human. We
suspect this is because humans are employing a heuristic shortcut that we have not
included in the set of rules used in the model (analogous to the use of the memory
system not originally included). In any case, the overall agreement between the
model and human data is quite good, suggesting that this is the first spiking, neu-
rally based model to capture many major features of biological cognition on this
task.
However, one main point of constructing a biological model is to make better
contact with neural data, not just to explain behavioral data. As I have mentioned,
the low-level spiking data is not available, but high-level fMRI data is. As shown
in figure 5.21, the SPA model matches well to the fMRI activity recorded in a
variety of cortical areas. While a symbolicist model of this data (ACT-R; Ander-
son et al., 2005) has been shown to have a similar fit, there are crucial differences
between how the SPA generates these results and how ACT-R does.
First, like most cognitive models, ACT-R will provide the same prediction for
each run of the model. As a result, the model predicts only mean performance. In
contrast, the SPA model predicts a distribution of results (as shown by the standard
deviation plotted in figure 5.21). The SPA model is thus much more amenable to
modeling individual differences found in populations of subjects.
Second, the SPA generates the predictions based on a biophysical property of
the model: neurotransmitter usage (see figure 5.21b). That is, fMRI data is pre-
dicted by neurotransmitter usage, which is filtered by a fixed model of how such
fluctuations demand energy and give rise to the blood-usage-based BOLD signal
(i.e. the BOLD filter). It is the BOLD signal that is actually measured by MRI
machines. In contrast, the ACT-R model relies on the proportion of time a module
is used to generate its predictions. There is thus no specification of a physiolog-
ical process that underwrites the BOLD signal in ACT-R predictions. This also
has the consequence that for the ACT-R fit to the fMRI data the BOLD filter is fit
Figure 5.21: fMRI results comparing human data to the SPA model during a Tower of Hanoi-like task. Human fMRI data is from Anderson et al. (2005).
Scans were taken when subjects pressed a button indicating a move. The first
button press was at 10 seconds. Thick black lines are the average human fMRI
data. Thin black lines are the average model data. a) Parietal cortex activity. One
standard deviation is shown in grey. b) The parietal activity from a single run.
Gray lines indicate the synaptic activity (neurotransmitter usage) that is used to
generate the fMRI prediction, which is shown in black. c) Motor cortex activity.
d) Prefrontal cortex activity.
• Open a blank Nengo workspace and create a new network named ‘Question
Answering’.
• Create five new ‘default’ ensembles with the names ‘A’ ,‘B’, ‘C’, ‘D’, and
‘E’. Use 300 neurons and 20 dimensions per ensemble.
generated, but that will not significantly affect the operation of this
network.
• Drag the ‘Binding’ template into the network. Set Name to ‘Bind’ and Name
of output ensemble to ‘D’. Use 30 neurons per dimension and don’t invert
either input.
• Project from the ‘A’ and ‘B’ ensembles to the ‘A’ and ‘B’ terminations on
the ‘Bind’ network.
• Drag another ‘Binding’ template into the network. Set Name to ‘Unbind’
and Name of output ensemble to ‘E’. Use 30 neurons per dimension and
check the box to invert input A but not B.
• Project from the ‘C’ ensemble to the ‘A’ termination on the ‘Unbind’ net-
work and from the ‘D’ ensemble to the ‘B’ termination of the ‘Unbind’
network.
• Open Interactive Plots and ensure the semantic pointer graphs for ‘A’, ‘B’,
‘C’, ‘D’, and ‘E’ are shown.
• Use Set value on the semantic pointer graph of ‘A’ and set it to ‘RED’.
Similarly, set the value of ‘B’ to ‘CIRCLE’, and the value of ‘C’ to ‘RED’.
As seen in the previous tutorial, binding the ‘A’ and ‘B’ inputs together results
in a semantic pointer in ‘D’ that does not share a strong similarity to any of the
vocabulary vectors. However, the added ‘Unbind’ network takes the generated
semantic pointer in ‘D’ and convolves it with the inverse of ‘C’, which is the same
as one of the inputs, resulting in an estimate of the other bound item.
• Alternate setting the value of ‘A’ to ‘RED’ and ‘BLUE’, the value of ‘B’
to ‘CIRCLE’ and ‘SQUARE’, and the value of ‘C’ to any one of these four
labels. Run and pause the simulation between changes (this is not necessary
if the simulation runs quickly).
The inputs to the ‘A’ and ‘B’ ensembles can be thought of as forming the statement
‘D’. The statements in this case could be interpreted as temporary associations,
such as “squares are blue” and “circles are red”. The input to the ‘C’ ensemble
then poses a question about the statement. Thus an input of ‘SQUARE’ asks “what
is associated with squares” and the expected answer is “blue”.
This network is not a very impressive demonstration of cognitive ability since
answers to questions are always provided at the same time as the questions them-
selves. However, the network is a good start because it forms a bound representation
from its input and then successfully extracts the originally bound elements.
The usefulness of this operation becomes apparent when memory is intro-
duced to the network, so we will now add a memory buffer.
• Remove the projection from ‘D’ to ‘Unbind’ by clicking the ‘B’ termination
on ‘Unbind’ and dragging it off the ‘Unbind’ network.
• Drag the ‘Integrator’ template from the template bar into the ‘Question An-
swering’ network.
• Project from the ‘X’ origin of ‘Memory’ to the ‘B’ termination on ‘Unbind’.
The integrator network is quite simple, but behaviorally very important. This is
because it can act to prolong the availability of information in the system, and
hence act as a memory. With no input, the present state of the ensemble is pro-
jected back to itself, allowing it to retain that state. If input is present, it moves
the system to a new state until the input ceases, at which point the new state is
retained. So, without input the integrator acts as a memory, and with input it acts
to update its current state based on recent history. In either case, it helps to track
changes in the environment. Crucially, the decay of the memory is much slower
than (though related to) the decay of the feedback.
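The behavior just described can be illustrated with a heavily idealized, noise-free integrator operating directly on the represented value (the spiking version implements the same dynamics with a recurrent connection, which is where the synaptic time constants discussed next come in):

    import numpy as np

    dt = 0.001
    x = np.zeros(2)                    # the remembered value (2-D for illustration)

    def step(x, u, dt=dt):
        # Idealized integrator: dx/dt = u, so with u = 0 the state is simply held.
        return x + dt * u

    for _ in range(500):               # 0.5 s of constant input: the state moves
        x = step(x, np.array([1.0, -0.5]))
    for _ in range(1000):              # 1.0 s with no input: the state is retained
        x = step(x, np.zeros(2))
    print(x)                           # [0.5, -0.25], unchanged by the idle second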
In previous tutorials, we have included default time constants for all synapses
(in ‘tauPSC’), but the models have been largely feedforward. As a result, the dy-
namics have been relatively straightforward. However, the integrator has a feed-
back connection, and so dynamics become more subtle (see the discussion of the
third NEF principle in section 2.3.3). In this model we have set the feedback
time constant to be artificially high (400ms), which gives a relatively stable mem-
ory (several seconds). However, a more physiologically realistic time constant
of 100ms (a time constant associated with NMDA receptors) would require us to
increase the number of neurons for the same memory performance, and would
significantly slow the simulation. So, I consider this setting to only be a useful
simplification for explanatory purposes.
Before proceeding, I will note that it is possible that the following simulation
gets some of the expected answers wrong. This is because I have attempted to use
as few neurons as possible to allow the network to run more quickly. Because the
vocabulary and neuron tunings are generated randomly, some networks may not
work perfectly. Increasing the number of neurons and the number of dimensions
can alleviate any problems, but will affect the performance of the simulation.
• Open Interactive Plots and show the same graphs as before if they are not
visible by default. (To see the ‘Memory’ network, you have to close and
reopen the Interactive Plots, but be sure to save the layout before doing so.)
• Right-click the semantic pointer graph for ‘E’ and choose ‘select all’.
• Set ‘A’ to ‘RED’, ‘B’ to ‘CIRCLE’, and ‘C’ to ‘RED’.
• Run the simulation for 0.5 seconds.
This results in the same behavior as before, where the result in ‘E’ is ‘CIRCLE’.
• With the simulation paused, set ‘A’ to ‘BLUE’ and ‘B’ to ‘SQUARE’.
• Run the simulation for another half second and pause.
During the first half second of the simulation, the network receives inputs from the ‘A’ and ‘B’ ensembles and binds them (i.e., resulting in RED ~ CIRCLE). This is stored in the ‘Memory’ ensemble. During the next half second, a second binding (i.e., BLUE ~ SQUARE) is presented. It too is added to the ‘Memory’, so the value stored in the integrator memory after one second is approximately RED ~ CIRCLE + BLUE ~ SQUARE. As a result, the output ‘E’ remains as ‘CIRCLE’
because the ‘C’ input has stayed constantly equal to ‘RED’ and so unbinding will
still result in ‘CIRCLE’, which is bound to ‘RED’ in the memory.
• Right-click the semantic pointer graphs for ‘A’ and ‘B’ and select ‘release
value’. Run the simulation for 0.5 seconds.
You will notice that the result in ‘E’ is still ‘CIRCLE’ because the memory has
not changed, and it already encoded the inputs.
• Change ‘C’ to a different vocabulary vector (‘BLUE’, ‘CIRCLE’, or ‘SQUARE’).
• Run the simulation for 0.5 seconds and pause.
• Repeat the previous steps for each vocabulary vector.
For each of these cases, the memory is encoding the past state of the world, and
only the question is changing in ‘C’. The result in ‘E’ changes because it continues
to show the result of unbinding the semantic pointer in memory with the question
input. The network is now performing question answering, since it can remember
pairs of items and successfully recall which items are associated with each other.
So it has internally represented a simple structure, and can answer queries about
that structure.
However, we have not yet accomplished our original goal, which is to have
statements and questions supplied through a single ‘visual input’ channel and pro-
duce replies in a ‘motor output’. The final stage of building the question answer-
ing network introduces the control structures needed to shift between a state of
accepting new information to store in memory and a state of replying to questions
about those memories.
• Remove the ‘Question Answering’ network from the workspace.
• Create a new network named ‘Question Answering with Control’.
• Drag the ‘Network Array’ template from the template bar into the network.
Set Name to ‘Visual’, Neurons per dimension to 30, Number of dimensions
to 100, Radius to 1.0, Intercept (low/high) to -1 and 1 respectively, Max rate
(low/high) to 100 and 300, Encoding sign to ‘Unconstrained’, and Quick
mode to enabled.
• Repeat the previous step to create two additional network arrays named
‘Channel’ and ‘Motor’, with the same parameters as the ‘Visual’ network.
• Drag the ‘Integrator’ template into the main network. Set the name of the in-
tegrator to ‘Memory’ and give it 3000 neurons, 100 dimensions, a feedback
time constant of 0.4, an input time constant of 0.01, and a scaling factor of
1.0.
• Create a projection from the ‘Visual’ network array to the ‘visual’ termina-
tion.
• Drag the ‘Binding’ template into the ‘Question Answering with Control’
network. Set Name to ‘Unbind’, Name of output ensemble to ‘Motor’, and
Number of neurons per dimension to 30. Enable the Invert input A option.
• Project from the origin of the ‘Memory’ network to the ‘B’ termination of
the ‘Unbind’ network.
1 With a network array, a low-dimensional ensemble can be replicated such that neuron tuning
curves and decoding weights only need to be computed for a single dimension of the network.
Disabling ‘quick mode’ will create random tuning curves for all the neurons in the network. This
is extremely time-consuming, so it is highly recommended that quick mode be enabled.
• Project from the origin of the ‘Visual’ network to the ‘A’ termination of the
‘Unbind’ network.
At this stage, we have roughly recreated the basic ‘Question Answering’ network,
albeit with much larger populations representing a higher-dimensional space. The
remaining elements of the network will introduce action selection and information
routing.
• Drag the ‘Basal Ganglia’ template into the network. Set the name to ‘Basal
Ganglia’, the number of actions to 2, and the time constant to 0.01.
This creates a subnetwork that is structured like the basal ganglia model described earlier (section 5.2). To see the elements of this model, double-click the
subnetwork.
• Drag the ‘BG Rule’ template onto the new ‘Basal Ganglia’ network. Set
Rule Index to 0, Semantic Pointer to ‘STATEMENT’, Dimensionality to
100, tauPSC to 0.01, and enable Use Single Input.
Choosing Use Single Input sends any input to the basal ganglia to all three of its input elements (i.e., striatal D1 and D2 receptors, and the STN).
• Drag another ‘BG Rule’ template onto the basal ganglia network. Set Rule
Index to 1, Semantic Pointer to ‘QUESTION’, Dimensionality to 100, tauPSC
to 0.01, and enable Use Single Input.
• Project from the ‘Visual’ network to both the ‘rule_00’ and ‘rule_01’ termi-
nations on the ‘Basal Ganglia’ network.
• Drag a ‘Thalamus’ template into the main network. Set Name to ‘Thala-
mus’, Neurons per dimension to 30 and Dimensions to 2.
• Add a termination to the ‘Thalamus’ network. Name it ‘bg’, set Input Dim
to 2, and tauPSC to 0.01. Click Set Weights and set the diagonal values of
the matrix (the two that are 1.0 by default) to -3.0.
• Create a projection from the output origin of ‘Basal Ganglia’ to the ‘bg’
termination of ‘Thalamus’.
The large negative weights between the basal ganglia and thalamus reflect the inhibitory nature of the basal ganglia’s input to thalamus. When an action is selected, the activity of its associated basal ganglia output drops to zero, leaving it as the only non-inhibited channel into the thalamus and releasing the associated action consequent.
• Drag a ‘Gate’ template into the main network. Name the gate ‘Gate1’,
set Name of gated ensemble to ‘Channel’, Number of neurons to 100, and
tauPSC to 0.01.
• Drag a second ‘Gate’ template into the main network. Name the gate
‘Gate2’, set Name of gated ensemble to ‘Motor’, Number of neurons to 100,
and tauPSC to 0.01.
looks like a standard case of learning, but depends only on an active memory sys-
tem. That is, no connection weights are changed in the network even though the
system learns a general rule that allows past experience to change what behaviors
are chosen in the present.
Similarly, the traditional contrast between short-term (or “working”) memory1
(which is often taken to be activity in a recurrent network), and long-term memory
(which is often taken to be postsynaptic connection weight changes), is difficult to
sustain in the face of neural evidence. For instance, there are a wide range of time
scales of adaptive processes influencing neural firing in cortex (Ulanovsky et al.,
2004). Some short-term (seconds to minutes) changes are related to postsynaptic
weight changes (Whitlock et al., 2006), and others are not (Varela et al., 1997;
Romo et al., 1999b). Consequently, some “short-term” memories might be stored
in the system in virtue of connection weight changes, and others might be stored
in virtue of pre-synaptic processes, or stable network dynamics. The time scales
we use to identify kinds of memory behaviorally seem to pick out more than one neural mechanism.
Because of such complexities in the mapping between kinds of memory and
neural mechanisms, I am not going to provide a general description of mem-
ory and learning. Instead, I provide specific examples of how activity-based and
connection weight-based mechanisms for adaptation can be employed by the Se-
mantic Pointer Architecture. This provides for an admittedly modest selection of
adaptive mechanisms at play in the brain, but it gives an indication of how these
two common forms of adaptation integrate naturally with other elements of the
architecture.
More specifically, this chapter addresses two challenges. The first is related
to activity-based explanations of working memory, and the second is related to
learning connection weight changes. The first challenge is to extend current neu-
ral models of working memory to make them more cognitively relevant. There
are many neurally plausible models of working memory that rely on recurrently
connected networks (Amit, 1989; Zipser et al., 1993; Eliasmith and Anderson,
2001; Koulakov et al., 2002; Miller et al., 2003). These models are able to explain
the sustained firing rate found in many parts of cortex during the delay period
1 Sometimes these terms are distinguished and sometimes they are not. Additional distinctions
do not challenge the point being made here, which is a consequence of the fact that kinds of
memory are typically behaviourally defined, and the timescales of different neural mechanisms
(some activity-based, some connection-based) overlap. Consequently, there are medium-time-
scale behaviors that are activity-based, and others that are connection-based (and no doubt others
that are a combination of the two).
of memory experiments. Some are able to explain more subtle dynamics, such
as ramping up and down of single cell activity seen during these periods (Singh
and Eliasmith, 2006). However, none of these models address complex working
memory tasks that are essential for cognition, such as serial working memory (i.e.
remembering an ordered list of complex items). This is largely because methods
for controlling the loading of memories, clearing the network of past memories,
and constructing sophisticated representations, have not been devised. In the next
two sections, I describe a model that addresses many of these limitations.
The second challenge is to introduce a biologically realistic mechanism for
connection weight changes that can learn manipulations of complex representa-
tions. In the last two sections of this chapter, I describe a spike-timing-dependent
plasticity (STDP) rule that is able to learn linear and non-linear transformations
of the representations used in the SPA. I demonstrate the rule by showing that
it can be used to learn the binding operation employed in the SPA (i.e., circular
convolution), that it can be used to learn action selection in the model of the basal
ganglia (section 5.2), and that it can be used to learn different reasoning strategies
in different contexts to solve a language processing task. The tutorial at the end
of this chapter demonstrates how to use this rule in Nengo.
Figure 6.1: Recall accuracy from human behavioural studies showing primacy
and recency effects (data from Jahnke (1968)). Participants were asked to recall
ordered lists of varying length.
As a result, cognitive psychologists have studied these more demanding tasks since
the 1960s (Jahnke, 1968; Rundus, 1971; Baddeley, 1998). The ability to store and
recall items in order is called serial working memory. Unlike the simple single
target case, serial working memory is seldom studied in animal single cell studies
(Warden and Miller, 2007), and there are no spiking single cell models that I
am aware of (although there are some neurally inspired models, e.g., Beiser and
Houk (1998); Botvinick and Plaut (2006)). It is thus both crucial to have serial
working memory in a cognitive architecture, and unclear how such a function can
be implemented in a biologically realistic network.
In studying serial working memory, two fundamental regularities have been
observed: primacy, and recency. Both primacy and recency can be understood
by looking at an example of the results of a serial recall task, like those shown
in figure 6.1. In this task, subjects are shown lists of items of various lengths,
and asked to recall the items in their original order. Primacy is identified by the
observation that items appearing earlier in the list have a greater chance of being
recalled accurately, regardless of the length of the list. Recency is identified by the
observation that the items most recently presented to subjects have an increased
chance of being recalled as well. Together, primacy and recency account for the
typical U-shaped response probability curve seen in serial working memory tasks.
Interestingly, this same U-shape is seen in free recall tasks (where order infor-
mation is irrelevant; see figure 6.6). So it seems likely that the same mechanisms
are involved in both free recall and serial recall. In fact, as I discuss in more de-
tail in the next section, it seems likely that all working memory behavior can be
accounted for by a system that has serial working memory. In contrast, models
of working memory that can account for the behavior of monkeys on single item
tasks cannot account for serial working memory behavior. Consequently, serial
working memory is of more fundamental importance when considering human
cognition.
Recently, Xuan (pronounced “Sean”) Choo in my lab has implemented a spik-
ing neural network model of serial working memory using the resources of the
SPA. Based on a consideration of several models from mathematical and cog-
nitive psychology (Liepa, 1977; Murdock, 1983, 1993; Henson, 1998; Page and
Norris, 1998), Xuan has proposed the ordinal serial encoding (OSE) model of
working memory (Choo, 2010). In the next section, I present several examples of
how this model can account for human psychological data on serial and free recall
tasks. First however, let us consider the component parts, and the basic functions
of the model.
The encoding and decoding of items to be remembered by the OSE model is
depicted in figure 6.2. As can be seen there, the model consists of two main com-
ponents, an input working memory and an episodic memory. These are taken to
map onto cortical and hippocampal memory systems respectively. Both memories
are currently modeled as neural integrators (see section 2.3.3). A more sophisti-
cated hippocampal model (e.g., Becker, 2005) would improve the plausibility
of the system, although it would presumably not significantly change the result-
ing performance. Notably, modeling these memory systems as neural integrators
amounts to including past models of single item working memory as a component
of this model (such as the model proposed in Singh and Eliasmith (2006)).
The input to the model consists of a semantic pointer representing the item,
and another semantic pointer representing its position in a list. These two rep-
resentations are bound, using a convolution network (section 4.2), and fed into
the two memory systems. Simultaneously, the item vector alone is fed into the
two memories to help enforce the item’s semantics for free recall. Each mem-
ory system adds the new representation to the list that is currently in memory,
and the overall representation of the sequence is the sum of the output of the two
memories.
Decoding of such a memory trace consists of subtracting already recalled
items from the memory trace, convolving the memory trace with the inverse of
Figure 6.2: Network level diagrams of the OSE encoding network (left) and the
OSE item recall network (right). The encoding network receives an item and posi-
tion semantic pointer as input, which are bound and sent to both memory systems.
This is added to the current memory state. Each memory stores the information
by integrating it over time, although the decay rates (γ and ρ) are different. The
decoding network begins with the originally encoded memory and decodes it by
unbinding the appropriate position. The result is cleaned up and placed in a mem-
ory that tracks already recalled items. The contents of that memory are subtracted
from the originally encoded sequence before additional items are recalled.
the position vector, and passing the result through a clean-up memory (section
4.5). In short, the decoding consists of unbinding the desired position from the
total memory trace, and cleaning up the results. Concurrently, items which have
already been generated are subtracted so as to not be generated again. In the case
of free recall (where position information is not considered relevant), the unbind-
ing step is simply skipped (this can be done by setting the position vector to be
an identity vector). The equations describing both the encoding and decoding
processes can be found in appendix B.3.
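Although the precise equations are left to appendix B.3, the flow of information in figure 6.2 can be paraphrased schematically as follows. This is only a sketch of the prose description above (in particular, how ρ and γ enter the real model differs in detail, and the values used here are illustrative), but it shows the basic encode-by-binding, decode-by-unbinding-and-cleanup cycle:

    import numpy as np

    def conv(x, y):   # circular convolution (binding)
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

    def inv(x):       # approximate inverse, for unbinding
        return np.concatenate(([x[0]], x[:0:-1]))

    d, rng = 512, np.random.RandomState(5)
    items = {w: rng.randn(d) / np.sqrt(d) for w in ('A', 'B', 'C')}
    positions = [rng.randn(d) / np.sqrt(d) for _ in range(3)]

    # Encoding: each item bound to its position (plus the item alone) is added to
    # both memories; gamma and rho schematically scale the two contributions.
    gamma, rho = 0.9, 1.2
    input_trace, episodic_trace = np.zeros(d), np.zeros(d)
    for pos, item in zip(positions, items.values()):
        update = conv(item, pos) + item
        input_trace = gamma * input_trace + update
        episodic_trace = rho * episodic_trace + update
    trace = input_trace + episodic_trace

    # Decoding the second position: unbind it and clean up against the vocabulary.
    noisy = conv(trace, inv(positions[1]))
    print(max(items, key=lambda w: float(np.dot(noisy, items[w]))))   # expected: B

In this schematic version, a ρ greater than one ends up weighting earlier items more heavily (standing in for rehearsal and primacy), while a γ less than one weights recent items more heavily (fading and recency), which is the qualitative division of labor described above.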
As can be seen from figure 6.2, this model has two free parameters, ρ and γ.
These govern the dynamic properties of the two different memory systems cap-
tured by the model. Specifically, ρ captures the effect of rehearsing items early in
the list during the encoding process, and is associated with hippocampal function.
The value of this parameter was set by fitting an experiment conducted by Rundus
(1971) in which the average number of rehearsals was found as a function of the
item number in a serial list task. The other system, influenced by γ, characterizes a generic cortical working memory process, and experiences a fading of its memorized representation over time. The value of this parameter was set by reproducing the simple working memory experiment described in Reitman (1974) and choosing the value that allowed the accuracy of the model to match that of humans. Reitman’s experiment showed that, after a 15 second delay during which rehearsal was not permitted, human subjects could recall on average 65% of the items that they had recalled immediately after list presentation.
It is important to note that the two free parameters of this model were set by
considering experiments which are not being used to suggest that the model is
a good model of serial recall. That is, they were set independently of the test
experiments considered in the next section. Consequently, the performance of the
model on these new tasks in no way depends on tuning parameters to those tasks.
Finally, it is worth highlighting that the preceding description of this model is
reasonably compact. This is because we have encountered many of the underly-
ing elements – the necessary representations, the basic transformations, and the
dynamical circuits – before in the Semantic Pointer Architecture. So, in many
ways this working memory system is not a new addition to the SPA, but rather a
recombination of functions already identified as central to the architecture (e.g.,
integration, binding, and clean-up). This is encouraging because it means our
basic architecture does not need to get more complicated in order to explain addi-
tional, even reasonably sophisticated, functions.
Figure 6.3: A comparison of human and model data on serial list recall tasks. The
model data is presented with 95% confidence intervals while for human data only
averages were reported. The human data is from Jahnke (1968).
Figure 6.4: Transposition gradients for serial recall tasks. Given the task of recall-
ing six items in order, these plots show the probability of selecting each item for
each of the six possible positions in the list. The transposition gradients thus show
the probabilities of errors involving recalling an item outside its proper position.
a) The six items in the list are all easy to distinguish. b) Even numbered items in
the list are similar and more easily confused. Human data is from Henson et al.
(1996).
how well items can be recalled. Henson et al. (1996) also designed an experi-
ment in which they presented subjects with lists containing confusable and non-
confusable letters. Because the stimuli were heard, confusable letters were those
which rhymed (e.g. “B”, “D”, and “G”), while non-confusable letters did not
(e.g., “H”, “K”, and “M”). The experimenters presented four different kinds of
lists to the subjects: lists containing all confusable items, those containing confus-
able items at odd positions, those containing confusable items at even positions,
and those containing no confusable items. The probability of successful recall for
these lists for both the human subjects and the model are shown in figure 6.5; a
comparison of the transposition gradients from the model and human subjects on
a recall task with confusable items at even positions is given in figure 6.4b.
Again, this example shows that the model does a good job of capturing behav-
ioral data, both the likelihood of successful recall and the pattern of errors that are
observed in humans. The same model has also been tested on several other serial
list tasks, including delayed recall tasks, backwards recall tasks, and combinations
of these (Choo, 2010), although I do not consider these here.
More important for present purposes, the same model can also explain the
results of free recall experiments. As shown in figure 6.6, the accuracy of recall
Figure 6.5: Serial recall of confusable lists. These graphs plot human and model
performance recalling lists containing four different patterns of easily confus-
able versus non-confusable items. Model data includes 95% confidence intervals,
while for human data only averages were reported. The human data is from Hen-
son et al. (1996), experiment 1.
Figure 6.6: Human and model data for free recall tasks, for lists of 10, 20, and 30 items. Unlike the serial recall task of figure 6.3, subjects can repeat the contents of the lists in any order. The free recall results continue to demonstrate primacy and recency effects with U-shaped curves. Human data is from Postman and Phillips (1965).
for the model is very similar to that of humans for a wide variety of list lengths.
In these tasks, there is no constraint on the order in which items are recalled from
memory. Nevertheless, both the model and human data show the typical U-shaped
response probability curves.
Taken together, these examples of the OSE model performance on a variety
of tasks (none of which were used to tune the model) can make us reasonably
confident that some of the principles behind human working memory are captured
by the model. Because it uses the same kinds of representations as the rest of the
SPA, we can be confident that it will integrate easily into larger scale models. I
take advantage of this in the next chapter.
However, before leaving consideration of this model I want to highlight what I
think is perhaps its most theoretically interesting feature: namely, that this model
only works if it is implemented in neurons. As demonstrated by figure 6.7a, if
we directly simulate the equations that describe this model (see appendix B.3),
it is unable to accurately reproduce the recency and primacy effects observed in
the human data. Initially, it seemed that this failure might have been caused by
the semantic pointer vectors in the system becoming arbitrarily long as additional
items were added to the memory trace. Consequently, we also implemented the
model using standard vector normalization, which guarantees that the vectors al-
ways have a constant length. But again, as seen in figure 6.7b, the model is unable
Figure 6.7: Non-neural implementations of the recall model. a) The recall model
is implemented by directly evaluating the equations in appendix B.3. Vectors
are not normalized and can grow arbitrarily long. b) The recall model is again
implemented without spiking neurons, but vector lengths are held constant using
ideal vector normalization. Varying the parameters does not help. The effects of
changing ρ are shown.
to capture the human data (i.e. neither case has the appropriate U-shaped curve,
although normalization is closer).
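To make the contrast concrete before turning to the explanation, the toy calculation below compares how the length of a superimposed memory trace behaves under the three schemes just mentioned: no normalization, ideal normalization, and a saturating elementwise squashing. The tanh squashing is only a crude stand-in for the neural saturation discussed next, and the dimensionality and item statistics are arbitrary illustrative choices.

# A toy comparison (not the OSE model) of how the norm of a superimposed
# memory trace behaves under three schemes: no normalization, ideal
# normalization after each addition, and a saturating elementwise squashing
# that loosely stands in for neuron saturation.
import numpy as np

D = 128
rng = np.random.RandomState(1)
items = [rng.randn(D) for _ in range(10)]          # random items to be added

raw = np.zeros(D)
normed = np.zeros(D)
saturated = np.zeros(D)
for i, item in enumerate(items, start=1):
    raw = raw + item                               # no normalization
    normed = normed + item
    normed = normed / np.linalg.norm(normed)       # ideal normalization
    saturated = np.tanh(saturated + item)          # saturating "soft" limit
    print(i, round(float(np.linalg.norm(raw)), 1),
             round(float(np.linalg.norm(normed)), 1),
             round(float(np.linalg.norm(saturated)), 1))

Only the third scheme keeps the trace bounded without forcing it to a fixed length, which is the kind of "soft" behavior described next.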
Consequently, we realized that one of the main reasons that this model is able
to capture the human data as it does, is that the individual neurons themselves
saturate when participating in the representation of large vectors. This satura-
tion serves as a kind of “soft” normalization, which is neither ideal mathematical
normalization, nor a complete lack of normalization. Instead, it is a much more
subtle kind of constraint placed on the representation of vectors in virtue of neu-
ron response properties. And, crucially, this constraint is directly evident in the
behavioral data (i.e., it enables reconstructing the correct U-shaped curve). This
is theoretically interesting, because it provides an unambiguous example of the
importance of constructing a neural implementation for explaining high-level psy-
chological behavior. All too often, researchers consider psychological and neural
level explanations to be independent: a view famously and vigorously champi-
oned by Jerry Fodor (Fodor, 1974). But in this case, the dependence is clear.
Without constructing the neural model, we would have considered the mathemati-
cal characterization a failure, and moved on to other, likely more complex, models.
However, it is now obvious that we would have been doing so unnecessarily. And
Figure 6.8: Spike-timing dependent plasticity (STDP). This figure shows the crit-
ical window of spike-timing for affecting synaptic weight change. The percentage
change in the postsynaptic current (PSC) amplitude at 20-30 min after the repet-
itive spiking (60 pulses at 1 Hz) is plotted against the spike timing. The small
inset images above indicate that when the spikes generated by a neuron regularly
came before PSCs induced in it, the amplitude of subsequent PSCs at that synapse
decreased, and when the spikes came after the induced PSCs, the amplitude in-
creased. The thick line indicates a commonly used “adaptation function” that
summarizes this effect. (Adapted from Bi and Poo (1998) with permission).
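For reference, adaptation functions of this kind are commonly fit with a pair of exponentials. One standard textbook form, offered only as a reference point rather than the exact fit used by Bi and Poo, is

dw(dt) = A+ exp(-dt/tau+)   for dt > 0 (presynaptic spike before postsynaptic spike)
dw(dt) = -A- exp(dt/tau-)   for dt < 0 (postsynaptic spike before presynaptic spike)

where dt = t_post - t_pre is the spike timing difference, and A+, A-, tau+, and tau- set the height and width of the potentiation and depression windows.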
Figure 6.9: Reproduction of STDP timing and frequency results with Trevor’s
rule.
Dopamine neurons, for example, change their firing rate both when rewards that are not predicted occur, and when predicted
rewards do not occur (Hollerman and Schultz, 1998). Consequently, it has been
suggested that dopamine can act as a modulatory input to cortex and parts of the
basal ganglia (e.g., striatum), helping to determine when connections should be
changed in order to account for unexpected information in the environment. It is
now well-established that this kind of learning has a central role to play in how
biological systems learn to deal with contingencies in their world (Maia, 2009).
Unsurprisingly, such learning is important in SPA models that seek to explain such
behavior (see section 6.5).
So, while the source of the error signal may be quite varied, it is clear that
biological learning processes can take advantage of such information. To demon-
strate that Trevor’s rule is able to do so, figure 6.10 shows a simple example of
applying this rule to a scalar representation, with feedback error information. The
left-hand side of the figure shows the structure of a simple circuit that generates
an error signal, and the right hand side shows an example run during which the
improvement in the receiving population’s ability to represent the input signal is
evident. To keep things simple, we have included the generation of the error signal
in the network. However, the error signal could come from any source, internal or
external to the network.
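As a point of comparison, the sketch below is a rate-level caricature of this kind of error-modulated learning: a delta-rule-style update nudges initially useless decoding weights in proportion to the product of the error and the presynaptic activity. It is not Trevor's rule itself (which is spike-timing dependent and operates on full connection weights), and the tuning curves, learning rate, and other details are illustrative assumptions.

# A rate-level caricature (not Trevor's spike-timing rule) of error-modulated
# learning: decoding weights for an initially useless estimate are nudged by
# the product of the error and presynaptic activity, as in a delta rule.
import numpy as np

rng = np.random.RandomState(2)
n_neurons, kappa = 100, 1e-4

encoders = rng.choice([-1.0, 1.0], size=n_neurons)       # preferred directions
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

def rates(x):                                             # rectified-linear "tuning curves"
    return np.maximum(0.0, gains * (encoders * x) + biases)

decoders = np.zeros(n_neurons)                            # start with no useful connection
for step in range(10000):
    x = rng.uniform(-1, 1)                                # randomly varying input signal
    a = rates(x)
    y = np.dot(decoders, a)                               # current decoded output value
    error = x - y                                         # supplied by the 'err' population
    decoders += kappa * error * a                         # error-modulated update

print(round(float(np.dot(decoders, rates(0.5))), 3))      # should end up close to 0.5

The important point is simply that an externally supplied error signal is enough to shape an initially uninformative connection, whatever the source of that error.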
This example demonstrates that the rule can learn a simple communication
channel, but of course much more sophisticated nonlinear functions are important
for biological cognition. A particularly important and sophisticated transforma-
tion in the SPA is circular convolution: recall that it is used for binding, unbind-
ing, and content sensitive control. Given the central role of convolution in the
architecture, it is natural to be concerned that it might be a highly specialized,
hand-picked, and possibly unlearnable transformation. If this were the case, it
would make the entire architecture seem much less plausible. After all, if binding
cannot be learned, we would have to tell a difficult-to-verify evolutionary story
about its origins. Fortunately, binding can be learned, and it can be learned with
the same learning rule used to learn a communication channel.
Figure 6.11 shows the results of this rule being applied to a network with the
same structure as that shown in figure 6.10a, but which is given examples of
binding 3-dimensional vectors, which are used to generate the error signal. These simulations
demonstrate that the binding operation we have chosen is learnable using a bi-
ologically realistic, spike-time sensitive learning rule. Specifically, these results
show that the learned network is at least as good as an optimal NEF network with
the same number of cells.
This simple demonstration does not settle the issue of how binding is learned,
Figure 6.10: A learning network structure and sample run. a) The network con-
sists of an initially random set of connections between an input and an output
population that uses Trevor’s rule to learn a desired function (dashed line). The
‘err’ population calculates the difference, or error, between the input and output
populations, and projects this signal to the output population. b) A sample run in
which the network learns a simple 2-dimensional communication channel between the
input and the output. Over time, the decoded value of the output population (solid
lines) begins to follow the decoded value of the input population (dashed lines),
meaning that they are representing the same values.
Figure 6.11: Learning circular convolution of vectors. This graph shows the use of
Trevor's rule to learn a multi-dimensional, nonlinear, vector function. Specifically,
the network learns the binding operator (circular convolution) used in the SPA
on 3D input vectors. As can be seen, the learned network does as well as the one
whose weights are optimally computed using the NEF methods. (Simulations use a rate version of the rule, with 500 neurons in the learning network and 1,300 in the optimally computed control network.)
The basal ganglia is understood to play a central role in this kind of “instru-
mental conditioning” (see Maia (2009) for a review). Instrumental conditioning
has been discussed in psychology since the 19th century, and refers to cases in which an
animal changes its behavior in order to bring about desired results. This is now
often thought of as an instance of reinforcement learning, an approach to learning
that characterizes how an agent can learn to behave so as to maximize rewards
and minimize punishment in an environment. Famously, some of the algorithms
of reinforcement learning have been mapped to specific neural signals recorded
from animals involved in instrumental conditioning (Schultz et al., 1997).
As briefly mentioned in the previous section, the neurotransmitter dopamine
is often singled out as carrying error signals relevant for such learning. More
specifically, dopaminergic neurons in the substantia nigra pars compacta (SNc)
and ventral tegmental area (VTA) have been shown to mediate learning in the
cortical-striatal projection (Calabresi et al., 2007). These neurons fire in a manner
that seems to encode a reward prediction error (Bayer and Glimcher, 2005). That
is, their firing increases if the animal gets an unexpected reward, and is suppressed
if the animal gets no reward when it is expecting one. This information is precisely
what is needed for the animal to determine how to change its future actions in
order to maximize its reward.
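In reinforcement learning terms, this pattern of firing is usually identified with a temporal-difference prediction error. A standard textbook form, offered here only as a reference point rather than a claim about the exact quantity encoded by SNc/VTA neurons, is

delta_t = r_t + gamma V(s_{t+1}) - V(s_t)

where r_t is the reward received, V(s) is the current estimate of the value of state s, and gamma is a discount factor. The error delta_t is positive when an unexpected reward arrives and negative when an expected reward is omitted, matching the firing changes just described.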
Conveniently, we can simply augment the previous basal ganglia model (figure
5.1) to include these additional dopamine neurons, and their relevant connections
to striatal neurons (see figure 6.12). Because dopamine is largely modulatory, the
general functioning of the model is not affected in any way, but we can now ex-
plore various means of exploiting the dopamine signal (by using our rule), to learn
new actions given environmental contingencies. Because it is a modulatory neu-
rotransmitter, we can think of dopamine as providing a channel of input separate
from excitatory cortical input, which does not directly cause neurons to fire more
or less, but rather informs the synapse how to change given recent rewards (and
recent cortical and striatal activity).
For present purposes, we can explore simple but common reinforcement learn-
ing tasks used in animal experiments called “bandit tasks”. These tasks are so
named because the rewards provided are similar to those given by slot machines
(i.e., “one-armed bandits”). For example, in a 2-armed bandit task, a rat enters a
simple maze like that shown in figure 6.13a, and must decide whether to take its
chances getting rewarded at the left or right reward sites. The reward sites have
different probabilities of providing a reward, so the animal is faced with a fairly
complex task. It is never guaranteed a reward, but instead must be sensitive to the
probability of reward.
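Before looking at the neural model, it may help to see the abstract structure of the problem. The sketch below is a minimal, non-neural bandit learner: it keeps a value for each arm, updates that value with a reward prediction error, and chooses probabilistically between arms. It is meant only to illustrate the kind of behavior the spiking model must reproduce; the model itself does this by adjusting cortico-striatal connection weights, and the reward probabilities and parameters below are arbitrary.

# A minimal, non-neural two-armed bandit learner: action values are updated
# by a reward prediction error and actions are chosen probabilistically.
# Reward probabilities and parameters are arbitrary illustrative choices.
import numpy as np

rng = np.random.RandomState(3)
reward_prob = [0.21, 0.63]          # chance of reward at the left and right sites
values = np.zeros(2)                # learned value of turning left or right
alpha, temperature = 0.1, 0.2

choices = []
for trial in range(200):
    p_right = 1.0 / (1.0 + np.exp(-(values[1] - values[0]) / temperature))
    choice = 1 if rng.rand() < p_right else 0
    reward = 1.0 if rng.rand() < reward_prob[choice] else 0.0
    delta = reward - values[choice]              # reward prediction error
    values[choice] += alpha * delta              # value update
    choices.append(choice)

# After learning, the more rewarded (right) site should be chosen most often.
print(np.mean(choices[-50:]))

Running this for a few hundred trials, the more frequently rewarded arm comes to dominate the choices, which is the behavioral signature examined below.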
Figure 6.12: The basal ganglia with dopaminergic projections included. The substantia nigra pars compacta (SNc) and ventral tegmental area (VTA), which contain efferent dopamine neurons, have been added to the basal ganglia circuit from figure 5.1. The modulatory dopamine connections are shown with dotted lines. These are typically taken to carry information about reward prediction error, based on current sensory information.
Figure 6.13: The two-armed bandit task. a) The maze used to run the task. The rat
waits at D during a delay phase. A bridge between G and A then lowers allowing
the animal to move (the go phase). The animal reaches the decision point A during
the approach phase, and turns either left or right. Reward is randomly delivered
at Rw during the reward phase. Finally, at Rt the animal returns to the delay area
during the return phase. b) Experimental data from a single trial (black line),
and a mathematical reward model (light gray). The values in brackets above the
graph indicate the probability of getting a reward at the left and right reward sites.
The animal clearly learns to favor the more rewarded site as it switches over the
course of the trial. c) The behavioral response of the spiking basal ganglia model
with learning applied in the striatum. The model also learns to favor the more
rewarded location as reward contingencies change. (Figures a and b adapted from Kim et al. (2009) with permission.)
To examine the animal’s ability to track these reward probabilities, the like-
lihood of reward is manipulated during the task. As shown in figure 6.13b, the
animal adjusts its behavior to more often select the site that is rewarded with the
greatest frequency. The model does the same thing, slowly adjusting its choices
of which way to turn to reflect learned reward contingencies. This occurs in the
model because the model adjusts which cortical state matches the precondition for
the actions by changing the connection weights between cortex and striatum (i.e.,
those that implement the Mb matrix in the model).
Because the striatum is implicated in action selection, and it receives a dopamine
signal that indicates prediction errors, it is a natural structure to record single cell
responses from during such tasks. As shown in figure 6.14a, single neurons in the
ventral striatum have responses that vary depending on which phase of the task the
animal is currently in. The same kinds of responses are evident in the striatal cells
of the model, as demonstrated in figure 6.14b.
While the task considered here is simple, the model is clearly consistent with
both behavioral and neural data from this task. Given that the model is identical
to that used for implementing much more sophisticated action selection in other
contexts (e.g. the Tower of Hanoi), these results are an encouraging indication that
the same learning rule employed here can support the learning of more sophisti-
cated state/action mappings as well. So, I take this model to show that the SPA can
naturally incorporate reinforcement signals to learn new actions in a biologically
realistic manner.
However, this is only one of many kinds of learning accomplished by the brain.
As well, this example does not show how low-level learning rules might relate
to high-level cognitive tasks. In the next section, I consider a more cognitive
application of the same rule.
Figure 6.14: A comparison of spiking data from the rat striatum to the model. a)
Spike train data over several trials recorded from a single neuron in the ventral
striatum. b) A comparison to the spike trains of several neurons in a single trial
from the ventral striatum of the extended basal ganglia model. The model and data
share several properties, such as a slight decrease in firing during the delay phase,
a small increase in firing during the end of the approach phase, and vigorous firing
during the reward phase, with a decrease to delay phase levels during the return
phase. (Figure a) from Kim et al. (2009) with permission)
Recall that in section 4.6 I described a system that could learn a new rule based on past examples. I emphasized there
that no connection weights were changed in the system, despite this performance.
Instead, the system relied on a working memory to store a rule over the presen-
tation of several examples. This has the advantage of being both very rapid, and
immediately accessible for further induction. However, it has the disadvantage
that once the task is complete, the induced rule is essentially forgotten. In people,
however, effective reasoning strategies can be stored in long-term memory and
used in the very distant future.
In this section, I consider a similar kind of learning to that in the RPM, but with
an emphasis on addressing this longer-term behavior (Eliasmith, 2004, 2005c).
For variety, I consider a different task: the Wason card selection task (Wason,
1966). This task is challenging because it requires symbolic reasoning and strat-
egy changes across contexts. In the Wason task, participants are given a condi-
tional rule of the form “if P, then Q” (see figure 6.15). They are then presented
with four cards, each of which has either P or not-P on one side, and either Q or
not-Q on the other. The visible sides of each card show P, not-P, Q, and not-Q. The
task is for the participant to indicate all of the cards that would have to be flipped
over to determine whether the conditional rule is true (i.e., is being obeyed). The
logically correct answer is to select the cards showing P and not-Q.
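Since the logic of the task matters for what follows, here is a short sketch that checks which visible card faces could possibly reveal a violation of the material conditional "if P, then Q". It simply enumerates the hidden sides each card might have; the representation of cards as strings is purely illustrative.

# Which visible card faces could reveal a violation of "if P then Q"?
# A violation requires P on one side and not-Q on the other, so only the
# P card and the not-Q card are worth turning over.
def must_flip(visible):
    # The hidden side can only carry the other attribute.
    hidden_options = ["Q", "not-Q"] if visible in ("P", "not-P") else ["P", "not-P"]
    # Flip only if some possible hidden side, together with the visible face,
    # would form a P / not-Q pair (the only way to violate the rule).
    return any({visible, hidden} == {"P", "not-Q"} for hidden in hidden_options)

for face in ["P", "not-P", "Q", "not-Q"]:
    print(face, must_flip(face))

Only the P card and the not-Q card can possibly expose a P/not-Q pairing, which is why they are the logically correct selections.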
There are interesting psychological effects in such a task. If the task is given
in an abstract form, such as “if a card has a vowel on one side, then it has an even
number on the other”, the majority of participants choose P (the vowel) and Q (the
even number) (Cheng and Holyoak, 1985; Oaksford and Chater, 1994). However,
if the task is presented using familiar content (remember that most participants
are undergraduates), such as “if a person is drinking beer, then that person must
be over 21 years old”, the majority choose P (drinking beer) and not-Q (under 21)
(Cox and Griggs, 1982). The change in mean performance of subjects is huge, for
example going from 15% correct on an abstract task to 81% correct on a familiar
version (Johnson-Laird et al., 1972). In other words, structurally identical tasks
with different contents lead to differing performance. Explaining this “content
effect” requires understanding how symbolic manipulation strategies may change
based on the content of the task.
Initial explanations of Wason task performance were based on just the famil-
iarity of the contexts. However, later explanations suggested that the distinction
between deontic and non-deontic situations was of greater importance. In deon-
tic situations the rule expresses a social or contractual obligation. In such situa-
tions, human performance often matches the logically correct choices, regardless
of familiarity or abstractness (Sperber et al., 1995). To explain this, Cosmides
Figure 6.15: The Wason selection task. Four cards are presented to the participant. Each card has information on both sides. A rule is stated that relates the information on both sides of the cards (e.g., if there is a vowel on one side, then there is an even number on the other side). The participant must select the cards to flip over that confirm whether the rule is being followed. The top row shows the general pattern of cards used in the task (P, not-P, Q, not-Q); the example faces are A, B, 4, and 3 for the abstract version, and drinking, not drinking, 25, and 17 for the familiar version. Because the rules are always material conditionals, the correct response is always to flip over the first and last cards. However, people respond differently given different contents (showing a "content effect"). Darkened cards indicate the most common selections.
Figure 6.16: The Wason task network. Circles indicate neural populations (N is
the number of neurons, D is the dimensionality of the value represented by the
neurons). In this network a represents the input rule, b represents the context
(based on the semantics of a), c associates the context with a transformation (i.e.,
a reasoning strategy), d and e apply the strategy (i.e., compute the convolution of
the rule and the transformation), f is the result of the transformation (i.e., the an-
swer), g is feedback indicating the correct answer, h and i induce the relationship
between the correct answer and the rule (i.e., convolve the inverse of the answer
with the rule), and j represents the induced transformation. Arrows indicate neural
connections. Population c is recurrently connected. The dotted rectangle indicates
the population where associative learning occurs.
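The induction step performed by populations h, i, and j can be illustrated with a small non-spiking calculation. The sketch below unbinds an example rule from its taught answer to obtain a transformation vector, and then applies that transformation to a structurally similar new rule. The exact placement of the approximate inverse in the model's induction populations may differ slightly from this sketch; the point is only that unbinding recovers a reusable transformation. The vocabulary items are random vectors with illustrative names.

# A schematic (non-spiking) version of the induction step depicted in figure
# 6.16: infer a transformation from an example rule/answer pair by unbinding,
# then apply it to a new rule. Vocabulary names are illustrative only.
import numpy as np

D = 512
rng = np.random.RandomState(4)
v = lambda: rng.randn(D) / np.sqrt(D)

def cconv(a, b):                               # circular convolution (binding)
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inverse(a):                                # approximate inverse for unbinding
    return np.concatenate(([a[0]], a[:0:-1]))

antecedent, consequent, vowel, even, letter, odd = (v() for _ in range(6))

rule1 = cconv(antecedent, vowel) + cconv(consequent, even)
answer1 = vowel + even                         # the taught response for rule1

# Induce the transformation relating rule1 to answer1 (population j in fig. 6.16).
T = cconv(inverse(rule1), answer1)

# Apply the induced transformation to a structurally similar new rule.
rule2 = cconv(antecedent, letter) + cconv(consequent, odd)
guess = cconv(rule2, T)

for name, vec in [("letter", letter), ("odd", odd), ("vowel", vowel), ("even", even)]:
    print(name, round(float(np.dot(guess, vec)), 2))
# The largest similarities should be to 'letter' and 'odd'.

With a few hundred dimensions, the similarity of the guess to the structurally appropriate items is near one, while its similarity to the other vocabulary items stays near zero.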
Figure 6.17: Associative learning using Trevor’s rule in a spiking network for two
6D vectors. a) The thick black line indicates the input context signal. There are
three contexts, +1, -1, and 0. The grayscale lines each indicate the represented
value of one element of the 6D vector. The dotted lines indicate the ideal answers.
During the first half of the simulation, two random vectors (coming from j in
figure 6.16) are being learned under two different contexts. During the second
half of the simulation, only the context signal is applied to the population, which
then responds with the vector encoded in that context. The network successfully
associates the context signal with a close approximation to the original random
vector. b) The spike trains of the neurons in the associative memory population
(for the first half of the run, and every tenth neuron, for legibility). There are 1000
neurons in this population. The synaptic weights are initialized to random values.
The input signal to this associative memory is the context from b. Trevor's rule performs exactly this kind of learning for
arbitrary vector transformations. Consequently, we should not be surprised that
the model can successfully learn the mapping from context to reasoning strategy
using a biologically realistic rule. The results of performing the context-strategy
association alone are shown in figure 6.17. However, the context signal in this case
is 1-dimensional and the vector it is mapped to is only 6-dimensional for sim-
plicity. In the full model, the context is a vector derived from the semantics of
the rule to be manipulated, and the representations and transformations are 100-
dimensional.
The representations used in the model are of the form:
rule = relation ~ implies + antecedent ~ vowel + consequent ~ even
for abstract rules, and an analogous representation, with the antecedent and consequent bound to the facilitated content (e.g., drinking and being over 21), for facilitated rules. Base semantic pointers are chosen randomly, and are bound
for combinatorial concepts (e.g. notOver21 = not ~ over ~ 21).
To perform the basic task, the model must learn the appropriate transformation
for each of the two contexts. Figure 6.18 shows the entire simulation. The first
half of the simulation is a training period during which the model is presented
with a rule and the appropriate response. The context is determined by examining
the content of the rule, and the ‘correct response’ is determined by direct feedback
from the environment. The model is then tested by presenting one rule at a time,
without learning or feedback. As shown, the model learns the appropriate, distinct
responses under the abstract and facilitated contexts.3
It may seem odd that the model is taught the ‘logically incorrect’ response
during the abstract context. However, I am assuming that this is plausible because
the ‘logically incorrect’ answer is learned as a result of past experiences in which a
bi-conditional interpretation of the “if-then” locution has been reinforced (Feeney
and Handley, 2000). The supposition is that when parents say, for example, “If
you eat your peas, then you get some ice cream”, they actually mean (and children
learn) a bi-conditional relation (i.e., if they don’t eat their peas, they don’t get ice
cream). In the Wason task, the ‘correct’ answer is always assumed to be a material
conditional (i.e., if they don’t eat their peas, they may or may not get ice cream).
I am assuming that the bi-conditional interpretation is a default that is used in the
abstract, unfamiliar context. In the facilitated context, the expected response is the
‘logically correct’ answer alcohol and not ~ over21, because this is assumed to
be learned through experiences where the content specific cues indicate that this
specific rule needs a material conditional interpretation. This characterization is
consistent with PRS.
Nevertheless, this kind of result is not especially exciting, since it could be
taken to merely show that we can teach a neural net-like model to memorize past
responses and reproduce them in different contexts. But, much more is going on
in this model. Specifically, we can show that here, as in the RPM model (see
section 4.6), the system has the ability to syntactically generalize within a con-
text. Syntactic generalization, as I discussed earlier, has often been identified as a
hallmark of cognition.
3 This performance is 95% reliable (N=20) over independent simulations. The variability is due
to the fact that the 25 different symbol vectors are randomly chosen at the beginning of each run.
Figure 6.18: Model results for learning and applying inferences across the two
contexts. The time-varying solid lines indicate the similarity of the model's decoded spiking output from population f to all possible responses. The vector
names and numerical similarity values of the top two results (over the last 100ms
of that part of the task) are shown above the similarity plot: with learning on, vowel (0.957) and even (0.928) for the abstract context and not over21 (0.957) and drink (0.771) for the facilitated context; with learning off, vowel (0.965) and even (0.941) for the abstract context and not over21 (0.970) and drink (0.783) for the facilitated context. After learning, the
system reliably performs the correct, context-dependent transformation, producing the appropriate response.
Figure 6.19: Model results for the syntactic generalization simulation described in the text. As in figure 6.18, the plot shows the similarity of the model's decoded output to all possible responses over time, with learning on during the presentation of the example facilitated rules and off during the test on a novel rule.
For the second simulation, we can define several rules, at least three of which
are of the facilitated form (e.g., “if you can vote then you are over 18”). During
the learning period, two different examples of a facilitated rule are presented to
the model. Learning is then turned off, and a novel rule is presented to the model.
This rule has similar but unfamiliar content, and the same structure as the previous
rules. The results of the simulation are shown in figure 6.19. As shown, the
model is able to correctly reason about this new rule, providing the P and not-Q
response. So, the model clearly has not memorized past responses, because it has
never produced this response (i.e., not ~ over16 + drive) in the past. This is a clear
case of syntactic generalization, since it is the syntax that is consistent between
the examples and the new rule, not the semantics.
In essence, the semantics is still being used, because it is determining that the
context of the new rule is similar to the already encountered rules. However, the
inference itself is driven by the syntactic structure of the rule. Multiple exam-
ples of the combination of semantics and syntax are needed to allow these to be
appropriately separated. This is consistent with the fact that performance on this
task improves with instruction that emphasizes syntax over content (Rinella et al.,
2001).
Because this model is implementing a psychological theory at the neural level,
there are several ways in which we can examine the model. For example, one key
feature of models developed with the SPA approach is robustness to both back-
ground firing noise and neuron death. Since vector representations are distributed
across a group of neurons, and since this distribution does not require a direct
mapping between particular neurons and particular values in the vector, perfor-
mance degrades gracefully with damage. To demonstrate this in the current model,
we can randomly remove neurons from the “rule and transformation” population
(population d in figure 6.16). The neurons were removed after training, but before
testing on the same task as shown in figure 6.18; the results are shown in figure 6.20.
On average, performance remains accurate until 1,221 of the population's 3,168 neurons
have been removed (variance = 600, N = 500).
Figure 6.20: Performance of the Wason model under random destruction of neu-
rons in population (d) in figure 6.16. A. The effect of continually running the
model while random neurons are destroyed until none are left, showing the three
symbols whose VSA representation most closely match the generated output. An
error is made if either one of the original two top responses falls below the third.
B. A histogram of the number of neurons destroyed before the first error for 500
experiments (mean=1221).
We can also explicitly test the robustness of the model to intrinsic noise, in-
stead of neuron death. In this case, the model functions properly with up to 42%
Gaussian noise used to randomly vary the connection weights.4 Random vari-
ation of the weights can be thought of as reflecting a combination of impreci-
sion in weight maintenance in the synapse, as well as random jitter in incoming
spikes. Eliasmith and Anderson (2003) have explored the general effects of noise
in the NEF in some detail, and shown these representations to be robust (see also
Conklin and Eliasmith (2005) for a demonstration of robustness in a model of
rat subiculum). In short, these results demonstrate a general robustness found
in SPA models, which is present largely because these models are based on the
NEF, and hence deploy distributed representations. Conveniently, such biologi-
cally plausible robustness allows close comparison of NEF and SPA models to either micro-lesion or large-scale lesion data, if it is available.
In many ways, the robustness of the model is a result of its low-level neural im-
plementation. But, as mentioned earlier, the model is also implementing a cogni-
tive hypothesis (i.e., that domain general mechanisms can learn context-sensitive
reasoning). As a result, the model can also be examined in a very different way,
relating it to more traditionally “cognitive” data. For example, we can use the
model to gain some insight into how humans are expected to perform given exam-
ples of novel syntactic transformations. Qualitatively, the model is consistent with
evidence of practice effects in reasoning tasks (Rinella et al., 2001). However, we
can also more specifically examine performance of the model as a function of the
number of examples. In table 6.1 we can see that generalization improves rapidly
with the first three examples, and then improves less quickly. We can thus predict
that this would translate behaviorally as more rapid and more consistent responses
after three examples compared to one example.
Beyond these behavioural predictions, the model also embodies a specific pre-
diction for cognitive neuroscience: namely, that the central difference between
the abstract and content facilitated versions of the Wason task is the context signal
being provided from the VMPFC, not the mechanism of reasoning itself. This
is very different from the SCT suggestion that evolutionarily distinct reasoning
mechanisms are necessary to account for performance differences on these two
versions of the task (Cosmides, 1989). Experiments performed after the presen-
tation of an early version of this model (Eliasmith, 2004) have highlighted the
4 Each synaptic connection weight was independently scaled by a factor chosen from N(1, σ²). The model continued to function correctly up to σ² = 0.42.
• Add a termination to the ‘error’ ensemble. Name the termination ‘post’, set
the input dimension of this termination to 1, and set the connection weight
to -1.
• Project from the ‘pre’ ensemble to the ‘pre’ termination on the ‘error’ en-
semble and likewise project from the ‘post’ ensemble to the ‘post’ termina-
tion.
The two terminations added to the ‘error’ ensemble will subtract the value of the
‘post’ ensemble from the value of the ‘pre’ ensemble, resulting in a representation
of the difference being stored in ‘error’. Minimizing this difference in values
will push the ‘post’ ensemble towards representing the same value as the ‘pre’
ensemble, creating a communication channel.
To test this channel, we need to add a few extra elements to the network. First,
we’ll create an input function to generate data to send through the communication
channel. Then we’ll add a gate to the ‘error’ network that will allow us to switch
learning on and off and, finally, we’ll add a second error population that will
calculate error directly (not using neural computations).
• Create a new function input. Name the input ‘input’, set Output Dimensions
to 1 and click the ‘Set Functions’ button.
• From the drop-down list, select ‘Fourier Function’ and click Set.
• Set Fundamental to 0.1, Cutoff to 10, RMS to 0.5, and Seed to an integer.
• Right-click the created function and select ‘Plot function’. Set the index to
0, start to 0, increment to 0.01, and end to 10.
The function that is graphed will provide the randomly varying input for our communication channel. The function is defined in terms of its frequency components,
but the details of these aren’t important as the function only needs to provide inter-
esting and variable output. The Seed parameter can be changed to draw different
random values.
• Close the graph of the input function and add a decoded termination to the
‘pre’ ensemble. The termination should have one dimension and a connec-
tion weight of 1.
• Add a gate to the network. Name it ‘gate’, set the gated ensemble to ‘error’,
set the number of neurons to 100, and set the time constant to 0.01.
• Add a constant function named ‘switch’, with a default value of 0. You will
need to click the ‘Set Functions’ button again to switch from using a Fourier
Function to a Constant Function.
• Create a new ensemble named ‘actual error’. Set the number of nodes,
dimensions, and radius each to 1.
• Click the ‘actual error’ ensemble and, with this ensemble selected, use the
drop-down menu at the center of Nengo’s top menu bar to change the sim-
ulation mode from spiking to direct.
• Add a ‘pre’ termination to the ‘actual error’ population that has a connection
weight of 1 and a ‘post’ termination that has a connection weight of -1 (as
before, with the ‘error’ population).
• Project from the ‘pre’ and ‘post’ populations to their respective terminations
on the ‘actual error’ population.
The interactive plots viewer should display value plots for each of the ensembles
in the network, including our single-neuron ‘actual error’ ensemble that is running
in direct mode. The label for the ‘actual error’ ensemble itself isn’t shown in the
default layout since this ensemble isn’t really part of the network but has been
added to help analyze the system. The connection weight grid, located in the
bottom-right corner of the viewer, is a visualization of connections between the
‘pre’ and ‘post’ ensembles. Red squares represent excitatory connections and blue
squares represent inhibitory ones; the intensity of the colour indicates the strength
of the connection. The slider control in the top-left corner can be used to toggle
the gate on the error population. Since we set the value of the ‘switch’ function
to zero, the gate is initially closed, no error signal will be created, and thus no
learning will occur.
• Run the simulation.
The ‘actual error’ plot should show an unsurprisingly large amount of error between the two populations, since no learning is occurring. The connection weight
values should also not change significantly in the grid visualization.
• Reset the simulation.
• Set the value of the ‘switch’ input to 1 using the slider.
• Run the simulation.
After a short initial period of learning, the error reported in this simulation should
be noticeably smaller and more stable than the error without learning. Looking
at the graphs of the ‘pre’ and ‘post’ values should also reveal that the value of
‘post’ does roughly track the value of ‘pre’. There’s a good chance that you will
still not be able to see the connection weights change in the grid. Most of the
large weight changes occur in the first second of the simulation and the gradual
changes are difficult to discern visually. To see the differences in weights, rewind
the simulation to the beginning and drag the time slider quickly from the start
of the simulation (after the initial connection weights are displayed) to the last
recorded time. You should be able to notice some differences, although they may
still be subtle.
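For readers using a version of Nengo with a Python scripting interface, the same network can be built in a few lines of script rather than through the GUI. The sketch below is an approximation of the tutorial network, written against the scripting API of later Nengo releases; names, defaults, and the availability of the PES learning rule (used here as the error-driven component of the learning discussed in this chapter) may differ between versions, so treat it as a guide rather than a drop-in replacement for the steps above.

# A scripted approximation of the tutorial network: a plastic connection from
# 'pre' to 'post' is shaped into a communication channel by an error ensemble
# that computes post - pre and modulates the PES learning rule.
import numpy as np
import nengo

model = nengo.Network(seed=1)
with model:
    stim = nengo.Node(nengo.processes.WhiteSignal(10.0, high=10, rms=0.5),
                      size_out=1)
    pre = nengo.Ensemble(100, dimensions=1)
    post = nengo.Ensemble(100, dimensions=1)
    error = nengo.Ensemble(100, dimensions=1)

    nengo.Connection(stim, pre)
    # Start from a connection that computes nothing useful (zero function),
    # and let the learning rule shape it into a communication channel.
    conn = nengo.Connection(pre, post, function=lambda x: [0],
                            learning_rule_type=nengo.PES(learning_rate=1e-4))

    # error = post - pre, projected into the learning rule.
    nengo.Connection(post, error)
    nengo.Connection(pre, error, transform=-1)
    nengo.Connection(error, conn.learning_rule)

    probe_pre = nengo.Probe(pre, synapse=0.02)
    probe_post = nengo.Probe(post, synapse=0.02)

with nengo.Simulator(model) as sim:
    sim.run(10.0)

# After a second or so of learning, the decoded 'post' value should begin to
# track the decoded 'pre' value, so the late-run error should be small.
print(np.mean(np.abs(sim.data[probe_pre][-1000:] - sim.data[probe_post][-1000:])))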
Extending the learning network to learn functions only requires changing the
error signal used to modulate the plastic connection.
• Create a new decoded origin on the ‘pre’ ensemble. Set its name to ‘square’
and the number of output dimensions to 1, then press ‘Set Functions’.
• Set the function to ‘User-defined Function’, click Set, and type ‘x0*x0’ as
the expression. Complete the origin with default values.
• Remove the projections from the ‘X’ origin of ‘pre’ to the two error ensem-
bles, and replace them with projections from the ‘square’ origin.
• Run the simulation in Interactive Plots, with the ‘switch’ input set to 1.
The ‘square’ origin is computing the squared value of the variable represented by
the ‘pre’ ensemble. This transformation is identical to the example we tested in
the section 3.8 tutorial, but we’re using it here to generate an error signal that will
direct learning between the ‘pre’ and ‘post’ ensembles. We could certainly have
projected directly from the ‘pre’ ensemble to a decoded termination on the ‘post’
ensemble instead of setting up the learned connection, so this example is a bit of
a ‘toy’ demonstration of the capabilities of the learning technique as opposed to
a model of real biological processes. Nonetheless, you will likely encounter one
interesting effect while running the simulation that is a direct result of learning
with synaptic weights: namely, it may take the network a rather long time to cor-
rect its earlier training. The weights on the ‘post’ network were not re-initialized
to random values before we ran the latest simulation, so they will still retain the
learned weights from the communication channel. Negative values in particular
are mapped to very different values under the communication channel and squar-
ing training schemes, so expect these values to have noticeable errors until they
are retrained.
Chapter 7
The Semantic Pointer Architecture (SPA)
Figure 7.2: A schema of Semantic Pointer Architecture models. This figure con-
sists of several interacting subsystems depicted in figure 7.1. Dark black lines
represent projections between parts of the system, while thinner lines indicate
control or error signals. An ‘internal subsystem’ may be concerned with functions
like working memory, encoding a conceptual hierarchy, etc. Not all information
flow is captured by this schema (e.g., many error signals are not present).
Not all models will have all of the same components, though ideally no models
will have conflicting components. My intent is, after all, to provide a means of
constructing unified models of biological cognition. Filling in such a schema is
essentially how I think we ought to build a brain.
That being said, it should be evident that my description of the SPA does not
make this a simple matter. The SPA is clearly not yet detailed enough to specify
a complete (or even stable) set of subsystems and their functions. It may even
be helpful to think of the SPA as providing something more like a protocol de-
scription than an architecture – it describes how neural systems can communicate
and control the flow of information. However, the previous chapters also include
specific commitments to particular processes occurring in identified parts of the
system, which is why I have chosen to call it an architecture.
It is, however, an architecture in development (see section 10.2 for further dis-
cussion). It is thus more useful to provide example applications of the architecture
than an application independent specification, as I have been doing throughout
the book. And, in the next section I provide the final example in the book. Here,
unlike with past examples, my purpose is to bring together the parts of the archi-
Figure 7.3: Input and output with examples for the Spaun model. a) The complete
set of possible inputs with hand-drawn examples for each symbol. Numbers are
used as the conceptual domain. Letters on the second row indicate which task is
being tested: W for copy drawing; R for recognition; L for reinforcement learn-
ing; M for serial memory; C for counting; A for question answering; V for rapid
variable creation; F for fluid reasoning. The symbols on the third row are for task
control: the brackets are used to indicate grouping; the question mark is used to
indicate that a response is expected; and the letters are question modifiers where
P is for ‘position’ and K is for ‘kind’. b) The complete set of output with Spaun
drawn examples of each kind of symbol. The thin solid lines indicate prepara-
tory movements for which the ‘pen’ is not down. Large dots indicate the starting
position of the pen-down movements.
Table 7.1: A description and justification for the eight example tasks performed by Spaun.

Copy drawing. Description: The system must reproduce the visual details of the input (e.g., different 3s might look different). Justification: Demonstrates that low-level visual information is preserved and accessible to drive the motor response.

Recognition. Description: The system must identify the variable input as a member of a category. Justification: Demonstrates that input variability can be accounted for. Allows the generation of stable concepts.

Reinforcement learning. Description: The system must learn environmental reward contingencies. Justification: Demonstrates reinforcement learning in the context of a cognitive model.

Serial memory. Description: The system must memorize and reproduce an ordered set of items. Justification: Demonstrates the integration of serial memory into a cognitive model. Needed for several other more complex tasks.

Counting. Description: The system must be able to count from a presented starting position for a given number of steps. Justification: Demonstrates flexible action selection. Demonstrates knowledge of the conceptual domain.

Question answering. Description: The system must answer questions about the contents of structured representations in working memory. Justification: Demonstrates the ability to construct (bind), manipulate, and extract various information from internal representations.

Rapid variable creation. Description: The system must identify syntactic patterns, and respond to them without connection weight changes. Justification: Demonstrates a neural architecture able to meet these cognitive demands.

Fluid reasoning. Description: The system must solve problems analogous to those on the Raven's Progressive Matrices. Justification: Demonstrates internal generation of a solution to a challenging fluid reasoning task.
Figure 7.4: The Spaun architecture. Thick lines indicate possible information
flow between elements of cortex, thin lines indicate flow of information between
the action selection mechanism and cortex, and rounded boxes indicate control
states that can be manipulated to control the flow of information within and between
subsystems. See text for details.
and context. The action selection mechanism is the basal ganglia model described
throughout chapter 5. The five subsystems, from left to right, are used to: 1) map
the visual hierarchy output to a conceptual representation; 2) encode structure in
the input through binding; 3) transform the input to generate a response appropri-
ate to the task; 4) decode the result of model processing to a temporally ordered
output; 5) map each output item to a semantic pointer that can be used to drive
the motor hierarchy. Several of these subsystems have multiple components able
to perform the identified function. For instance, the working memory subsystem
includes six independently manipulable memories, each of which can store a se-
mantic pointer. This is essential for independently tracking input, output, task
state, and so on. Additional details necessary to fully re-implement the architecture are available in Choo and Eliasmith (2011). The model can also be downloaded from ???.
Notice that the architecture is not determined by the set of tasks being im-
plemented, but rather captures more general information processing constraints.
This helps make the model extensible to a wide variety of tasks, beyond those
considered here. As a result, I believe Spaun demonstrates that the SPA provides
a general method for building flexible, adaptive and biologically realistic models
of the brain. Before returning to this point, let me turn to consideration of the
specific tasks identified earlier.
7.3 Tasks
Each of the following sections describes one of the tasks in more detail, showing
input and output examples, and providing summary performance information as
appropriate.
7.3.1 Copy drawing
Spaun categorizes the digit, and then uses a map for that category to determine
how semantic pointers of that category result in an ordered set of coordinates in
2D space that drive the motor system. The mapping for each category is linear,
and is learned based on example digits/motor command pairs (we use 5 pairs for
each digit).
Notably, the reason the mapping from the visual semantic pointer to the motor
semantic pointer is linear is because of the sophistication of the compression and
decompression algorithms built into the perceptual and motor systems. In general,
it would be incredibly difficult to attempt to map directly from pixel-space to
muscle-space.
Spaun’s ability to perform the copy drawing task demonstrates that it has ac-
cess to the low-level perceptual (i.e. deep semantic) features of the representations
it employs, and that it can use them to drive behavior.
7.3.2 Recognition
The recognition task requires Spaun to classify its input, using the classification
to generate a motor response that reproduces Spaun’s default written digit. To
perform classification, Spaun takes the 50D semantic pointer produced by the vi-
sual hierarchy, and passes it through an associative, clean-up-like structure which
maps it to a canonical semantic pointer. This pointer is then routed to the motor
system by the basal ganglia, which then maps it to a control signal appropriate for
drawing the relevant number.
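Abstractly, the clean-up operation amounts to comparing the incoming pointer against a set of stored canonical pointers and selecting the best match. The sketch below shows this in non-neural form; in Spaun the mapping is implemented with spiking neurons, and the dimensionality, noise level, and threshold used here are illustrative only.

# A non-neural sketch of an associative clean-up: compare a (noisy) semantic
# pointer against stored canonical pointers and return the best match above a
# threshold.
import numpy as np

D = 50
rng = np.random.RandomState(5)
canonical = {str(d): rng.randn(D) / np.sqrt(D) for d in range(10)}

def clean_up(pointer, threshold=0.3):
    sims = {label: float(np.dot(pointer, vec)) for label, vec in canonical.items()}
    label, best = max(sims.items(), key=lambda kv: kv[1])
    return label if best > threshold else None      # None ~ no confident match

# A noisy view of the digit 7, as might come out of the visual hierarchy.
noisy_seven = canonical["7"] + 0.5 * rng.randn(D) / np.sqrt(D)
print(clean_up(noisy_seven))                          # expected: '7'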
Successful recognition of hand-written input demonstrates Spaun’s ability to ‘see
through’ the variability often encountered in naturalistic input. In short, it shows
that variable input can be categorized, and that those categories can effectively
drive action.
the form:
where the γ and ρ parameters are set to the same values as previously (see section
6.3).
Typical input for this task would be ‘3 2 4 2 3 <variable pause> K ?’, where
the letter at the end of the list indicates if the list should be reproduced backwards
(‘K’) or forwards (‘P’).
The generation of human-like memory response curves in Spaun shows that its
working memory performance maps well to known behavior. Because working
memory is central to several subsequent tasks, this shows that Spaun can perform
these tasks with appropriate limitations.
7.3.5 Counting
The counting task demonstrates simple learned sequences of action, similar to
the examples in section 5.4 and 5.6. One interesting difference is that the internal
representations themselves contain more structure. Specifically, the representations
for numbers are constructed through recursive binding. This proceeds as follows:
1. Represent the first number in the sequence, one, with a randomly chosen unitary1 vector.
2. Construct the next number in the sequence through self-binding, i.e. two = one ~ one.
3. Repeat.
This process will construct distinct but related vectors for numbers. Crucially, this
method of constructing number vectors encodes the sequence information asso-
ciated with numerical concepts. The relationship of those conceptual properties
to visual and motor representations is captured by the clean-up-like associative
1 A unitary vector is one which maintains a length of 1 when convolved with itself.
memory introduced earlier that maps these vectors onto their visual and motor
counter-parts (and vice versa). As a result, visual semantic pointers are associated
with conceptual semantic pointers, which are conceptually inter-related through
recursive binding. This representation serves to capture both the conceptual se-
mantic space and the visual semantic space, and link them appropriately. I should
note that these are the representations of numbers used in all previous tasks as
well.
The input presented to the model for the counting task consists of a starting
number and a number of positions to count, e.g. ‘2 3 ?’. The model then generates
its response, e.g. ‘5’. This kind of silent counting could also be thought of as an
‘adding’ task. In any case, given the representations described above, counting is
a matter of recognizing the input and then having the basal ganglia bind the base
unitary vector with that input the given number of times to generate the final num-
ber in the sequence. Given a functioning recognition system, working memory,
and action selection mechanism, the task is relatively straightforward.
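The number representation itself is easy to illustrate outside the spiking model. In the sketch below, a base vector for one is constructed to be unitary (here by giving it unit-magnitude Fourier coefficients, one standard construction that may differ from the one used in Spaun), and subsequent numbers are generated by repeated self-binding; counting then amounts to further bindings with the base vector.

# A sketch of the recursive number construction described above: a unitary
# base vector (one) is repeatedly bound with itself, so counting n steps is
# just n further bindings.
import numpy as np

D = 512
rng = np.random.RandomState(6)

def make_unitary(D, rng):
    # Random phases with conjugate symmetry, so the inverse FFT is real and
    # every Fourier coefficient has magnitude 1 (hence length 1 is preserved
    # under self-convolution).
    phases = rng.uniform(-np.pi, np.pi, D // 2 - 1)
    spectrum = np.ones(D, dtype=complex)
    spectrum[1:D // 2] = np.exp(1j * phases)
    spectrum[D // 2 + 1:] = np.conj(spectrum[1:D // 2][::-1])
    return np.real(np.fft.ifft(spectrum))

def bind(a, b):                                   # circular convolution
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

one = make_unitary(D, rng)
numbers = {1: one}
for n in range(2, 10):
    numbers[n] = bind(numbers[n - 1], one)        # e.g., two = one (*) one

# Counting three steps up from 'two' should land on the vector for 'five',
# while remaining dissimilar to the neighbouring number 'four'.
current = numbers[2]
for _ in range(3):
    current = bind(current, one)
print(round(float(np.dot(current, numbers[5])), 2),
      round(float(np.dot(current, numbers[4])), 2))

Because binding with a unitary vector preserves length, the number vectors stay well-formed no matter how far the count proceeds, while remaining nearly orthogonal to their neighbours.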
Successful counting demonstrates the flexible action selection that is a central part
of Spaun’s behavior. It also shows that the model has an understanding of order
relations over numbers, and can exploit that knowledge to produce appropriate
responses.
7.3.6 Question answering
In this task, all of the possible inputs must be recognizable. After recognition,
the items are sent to the serial working memory for encoding into an internal
representation. The basal ganglia controls the processing, based on currently available
visual input, and so knows that input after ‘]’ indicates the kind of question to be
asked. This results in setting up the appropriate routing within cortex for applying
the next number input to the memorized representation appropriately.
The ability of Spaun to answer questions about structured input shows that it has
various kinds of access to the information it encodes.
7.3.7 Rapid variable creation
Hadley suggests that this task requires rapid variable creation because the second
last item in the list can take on any form, but human cognizers can nevertheless
identify the overall syntactic structure and identify ‘quoggie zarple’ as the appro-
priate response. So it seems that a variable has been created, which can receive
any particular content, and that will not disrupt generalization performance.
Hadley argues that learning rules cannot be used to solve this problem, given
the known constraints on the speed of biological learning. Weight changes do
not happen quickly enough to explain how these brief inputs can be successfully
generalized within seconds. He also argues that approaches such as neural black-
board architectures (NBAs; see section 9.1.3) would need to pre-wire the appro-
priate syntactic structures. Finally, he suggests as well that VSAs cannot solve
this problem. This is because, as he correctly notes, the binding operation in
VSAs will not provide a syntactic generalization. However, the SPA has additional resources; most importantly, it has inductive mechanisms, employed in both the Raven's example (see section 4.6) and the Wason example (section 6.6).
The Raven’s example, in particular, did not rely on changing neural connection
weights to perform inductive inference. As a result, we can use this same mecha-
nism to perform the rapid variable creation task.
To test rapid variable creation, we can employ the same representations used
in the previous task. For example, an input might be ‘[ 1 1 2 3 ] [ 2 3 ] [1 1 5
3 ] [ 5 3 ] [1 1 6 3 ] [ 6 3 ] [1 1 1 3 ] ?’. The expected response in this case is
‘1 3’. The parallel to the previous example with words should be clear. The first
item is a constant repeated twice, the third item is a variable, and the last item is a
constant. Spaun is able to solve this task by performing induction on the presented
input/output pairs to infer the transformation rule that is being applied to the input,
just as in the Raven’s task.
Spaun’s ability to successfully perform the rapid variable creation task demon-
strates that the non-classical assumptions behind the model do not prevent it from
reproducing this important cognitive behavior. In other words, it shows that bio-
logically realistic, connectionist-like approaches can solve this outstanding prob-
lem.
7.3.9 Discussion
While the above set of tasks is defined over limited input and a simple seman-
tic space, I believe they demonstrate the unique potential of the SPA to provide
a general method for understanding biological cognition. Notably, it is straight-
forward to add other tasks into the Spaun architecture that can be defined over
structured representations in the same semantic space. For instance, including ad-
ditional forms of recall, backwards counting, n-back tasks, pattern induction, and
so on would result from minor additions to the state-action mappings in the basal
ganglia.
More importantly, the methods used to construct this simple model are not
restricted to this particular semantic or conceptual space, and hence increases in
the complexity of the input structure, conceptual structure, and output modality
are direct extensions of the same principles used in this simple case. Overall, the
purpose of Spaun is to show how general purpose SPA models can be generated
using the principles described throughout the book.
• figure showing the effects of asking for answers at weird times, interrupting
the task and going on to a new one, etc.?
Of course, Spaun is also a meagre beginning. There are many ways it could be
made more unified, flexible, and robust. I discuss several of these in chapter 10,
which addresses challenges for the SPA. Nevertheless, it is a reasonable beginning and,
as I explore in subsequent chapters, I believe it has several advantages over past
approaches. However, one of its advantages is somewhat subtle, and does not
fit naturally into discussions of current approaches to cognitive modelling. So, I
consider it next.
[Figure 7.5: The cortex-basal ganglia-thalamus loop. a) The vector-based view, with a cortical state x, mappings Mb and Mc into the basal ganglia, a selection operation max(y) over the resulting values y, and a return path through thalamus. b) The same loop under the statistical interpretation described below.]
vector-based view of the SPA, we would say that the semantic pointer (qua vector)
is measured against familiar states in the basal ganglia, resulting in the selection
of a new cortical state most appropriate under the given measure.
Interestingly, because the SPA employs high-dimensional vectors as its cen-
tral representations, this exact same process has a very natural statistical interpre-
tation, as shown in figure 7.5b. Mathematically, all objects (scalars, functions,
fields, etc.) can be treated as vectors. So, we could simply stipulate that
semantic pointer vectors are probability distributions. But this does not provide
us with an actual interpretation, mapping specific distributions onto brain states.
Consequently, we must be careful about our probabilistic interpretation, ensuring
the already identified neural computations are still employed, and that the distri-
butions themselves make biological sense.
Perhaps surprisingly, the interpretation provided here lets us understand the
cortical-basal ganglia-thalamus loop as implementing a standard kind of Bayesian
inference algorithm known as Empirical Bayes. Specifically, this algorithm re-
lates perceptual observations (X), parameters of a statistical model of that input
(θ ), and so-called “hyperparameters” (α) that help pick out an appropriate set
of model parameters. Essentially, identification of the hyperparameters allows the
brain to determine a posterior distribution on the model parameters based on the
observation. We can think of this distribution as acting like a prior when relating
a constant, and because it is very difficult to compute. In short, it does not change the answer we
compute except by a scaling factor, which is typically not important.
5 I am assuming α and X are conditionally independent given θ.
where c is the motor command to execute, and so ρ(c|θ ) would encode the previ-
ously learned mapping between a model of the world and motor commands. This
characterization preserves a natural role for the basal ganglia in action selection.
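For readers who want the scheme spelled out, one standard way to write the Empirical Bayes relations being appealed to, in the notation used here and under the conditional independence assumption of footnote 5, is the following; this is the textbook form of the algorithm, not a derivation specific to the SPA:

α̂ = argmax_α ρ(X|α) = argmax_α ∫ ρ(X|θ) ρ(θ|α) dθ
ρ(θ|X, α̂) ∝ ρ(X|θ) ρ(θ|α̂)
ρ(c|X) ≈ ∫ ρ(c|θ) ρ(θ|X, α̂) dθ

The first line identifies the hyperparameters from the observation, the second gives the resulting posterior over the model parameters, and the third combines that posterior with the learned mapping ρ(c|θ) to weight candidate motor commands.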
More generally, the probabilistic interpretation of the basal ganglia processing
loop as an iterative inference process is consistent with all aspects of my previous
discussions of the basal ganglia.6
6 Three brief technical points for those who might be concerned that a probabilistic character-
ization assumes very different cortical transformations than before. First, the convolution of two
probability distributions is the distribution of the sum of the original variables, so the previous
binding operation is well characterized probabilistically. Second, the superposition and binding
operations do not significantly alter the length of the processed vectors. Specifically, the expected
length of the binding of two unit-length vectors is one, and the superposition is kept well-normalized
by the saturation of neurons. Third, the multiplication and integration operation I have used several
times is identical to matrix-vector multiplication.
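The second of these points, that binding two unit-length vectors yields a vector whose expected length is one, is easy to check numerically. A minimal sketch, assuming randomly chosen unit-length vectors (which is how semantic pointers are typically generated):

import numpy as np

rng = np.random.RandomState(1)
D = 512

def unit(v):
    return v / np.linalg.norm(v)

def cconv(a, b):
    # Circular convolution (the binding operation).
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

norms = [np.linalg.norm(cconv(unit(rng.randn(D)), unit(rng.randn(D))))
         for _ in range(1000)]
print(np.mean(norms))  # approximately 1: binding barely changes vector length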
Let me step away from the formalism for a moment to reiterate what I take
to be the point of this exercise in re-interpretation of the cortex-basal ganglia-
thalamus loop. Recall that a central distinction between symbolicism and con-
nectionism is that the former has strengths in language-like rule processing and
the latter has strengths in probabilistic inference. I have spent most of the book
showing how semantic pointers can be understood as implementing language-
like rule processing, and am here reiterating that they can also be understood as
implementing statistical inference (I had suggested this only for perception and
motor control earlier, in chapter 3). As a consequence, I believe that these two
consistent and simultaneously applicable interpretations of the SPA highlight a
fundamental unification of symbolic and connectionist approaches in this archi-
tecture. And, it begins to suggest that the SPA might be in a position to combine
the strengths of past approaches. Furthermore, this kind of unification is not avail-
able to approaches, like ACT-R, which do not preserve the internal structure of the
representations employed.
This, however, is only one preliminary comparison of the SPA to past work.
In the second part of the book, I provide a more thorough comparison.
ing Nengo with Matlab, please see the online tutorials at http://compneuro.
uwaterloo.ca/cnrglab/?q=node/2.
• Open a blank Nengo workspace and press Ctrl+P to open the scripting
console.
 7     def B(state='B'):
 8         set(state='C')
 9     def C(state='C'):
10         set(state='D')
11     def D(state='D'):
12         set(state='E')
13     def E(state='E'):
14         set(state='A')
15
16 class Sequence(SPA):
17     dimensions=16
18     state=Buffer()
19     BG=BasalGanglia(Rules())
20     thal=Thalamus(BG)
21     input=Input(0.1, state='D')
22
23 seq=Sequence()
The first line of the file ties the demo script to other script files that handle
interactions with Nengo and enable basic scripting functions. The second line
uses the variable D to store the number of dimensions we’d like each ensemble
in the network to represent. The next block of code defines the ‘Rules’ class
which sets the rules that the basal ganglia and thalamus will instantiate. The rule
definitions all have the following format:
def RuleName(inputState='InputSP'):
    set(outputState='OutputSP')
RuleName sets the name of the rule that will appear in the interactive graphs,
InputSP is the name of a semantic pointer in the vocabulary that an input from
the ensemble named inputState must match for the rule to fire, and OutputSP is
the name of a semantic pointer in the vocabulary that is sent to the ensemble
outputState when the rule fires. Note that in the spa_sequence.py demo, both
inputState and outputState are set to the same ensemble, which is simply named
‘state’. The rules given in the demo specify a chain of rules that loop through
the states; when ‘A’ is the dominant state, ‘B’ is sent as input; when ‘B’ is the
dominant state, ‘C’ is sent; and so on.
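For concreteness, the first rule of the chain, which sits just above the excerpted lines and is implied by the description above, would under this format read:

def A(state='A'):
    set(state='B')

Nothing in the format requires inputState and outputState to name the same ensemble, so a rule can equally well match one buffer and drive another.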
The ‘Sequence’ class, which is defined below the ‘Rules’ class, declares four
objects: a buffer, a basal ganglia component, a thalamus component, and an input.
The buffer is an integrator that serves as a generic memory component and it
is assigned to a variable named ‘state’. The basal ganglia and thalamus act as
previously described, but note that the class ‘Rules’ is invoked within the brackets
of the basal ganglia constructor on line 19 of the listing and the name of the basal
ganglia variable is given on the following line when the thalamus is created. These
steps are required to create basal ganglia and thalamus components that conform
to the specified rules. The last object is an input function. The syntax for creating
an input function is as follows:
Input(duration, target='SP')
The first parameter, duration, specifies how many seconds the input will be
presented for, the parameter target gives the population that will receive input
from the function, and SP is the name of the semantic pointer that will be given as
input. The four objects created in the ‘Sequence’ class are assigned to variables
named ‘state’, ‘BG’, ‘thal’, and ‘input’ and these variable names are used to name
the items created in the Nengo workspace. The final line of the script runs the
‘Sequence’ class, which generates all the network objects described above and
connects them together to form the ‘Sequence’ network.
Since we have a script that will generate the network automatically, it is easy
to make adjustments to the network and test them quickly. For example, we can
replicate the result shown in figure 5.7 of section 5.4 in which the sequence is
interrupted by constant input.
• Edit the line declaring the input object to read ‘input=Input(10, state='A')’.
This is line 21 of the program listing above.
• Open the network with Interactive Plots and run the simulation.
The input to the state buffer is now presented for ten seconds instead of the pre-
vious 0.1 seconds. As reported in chapter 5, this prevents the activation of the
second rule in the sequence despite the high utility of the rule because the input
value drives the network back to its initial state. The solution described in the
Chapter 8
Evaluating cognitive theories
8.1 Introduction
To this point, I have completed my description of the functional and method-
ological ideas behind the Semantic Pointer Architecture. Along the way, I have
mentioned several unique or particularly crucial features of the SPA and provided
a variety of examples to highlight these attributes. However, I have not had much
opportunity to contextualize the SPA in the terrain of cognitive theories more gen-
erally. This is the task I turn to in the next three chapters.
As you may recall from chapter 1, cognitive science has been dominated by
three main views of cognitive function: symbolicism, connectionism, and dynam-
icism. In describing and comparing these views, I argued for the surprising find-
ing that despite occasionally harsh exchanges between proponents of these views,
there is a core consensus on criteria for identifying cognitive systems. I dubbed
this core the Quintessential Cognitive Criteria (QCC; see table 8.1), and briefly
enumerated them at the end of section 1.3. At the time, I made no attempt to jus-
tify this particular choice of criteria, so I could turn directly to the presentation of
the SPA.
In this chapter, I return to the QCC to make it clear why each is crucial to
evaluating cognitive theories. In the next chapter, I briefly describe several other
architectures for constructing cognitive models to both introduce the state-of-the-
art, and to provide an appropriate context for evaluating the SPA. This allows me
to highlight similarities and differences between the SPA and past approaches.
In the final chapter, I briefly address several ramifications of the SPA regarding
central cognitive notions including ‘representation’, ‘dynamics’, ‘inference’, and
‘concept’.
8.2.1.1 Systematicity
The fact that cognition is systematic – that there is a necessary connection between
some thoughts and others – has been recognized in several different ways. Gareth
Evans (1982), for instance, identified what he called the Generality Constraint. In
short, this is the idea that if we can ascribe a property to an object, then we can
ascribe that same property to other objects, and other properties to the original
object (e.g., anything ‘left of’ something can also be ‘right of’ something). For
Fodor and Pylyshyn (1988a), the same observation is captured by their system-
aticity argument. This argument is, in brief, that any representational capacities
we ascribe to a cognitive agent must be able to explain why our thoughts are sys-
tematic. They must explain, in other words, why if we can think the thought “dogs
chase cats”, then we can necessarily also think the thought that “cats chase dogs”.
They argue that a syntactically and semantically combinatorial language is able
to satisfy this constraint. A combinatorial language is one which constructs sen-
tences like “dogs chase cats” by combining atomic symbols like “dogs”, “chase”,
and “cats”. As a result, if one of those symbols is removed from the language,
a whole range of sentence-level representations are systematically affected. For
instance, if we remove the symbol “cats” then we can think neither the thought
“dogs chase cats” nor the thought that “cats chase dogs”.
While many non-classicist researchers disagree with the assumption that only
a combinatorial language with atomic symbols can explain human systematicity,
the observation that our thoughts are systematic is widely accepted. This is likely
because systematicity is so clearly evident in human natural language. As a result,
whatever representational commitments our cognitive architecture has, it must be
able to describe systems that are appropriately systematic.
8.2.1.2 Compositionality
cognitive systems sometimes draw on a wealth of experience about how the world
works when interpreting compositional structures. Although we have much to
learn about human semantic processing, what we do know suggests that such pro-
cessing can involve most of cortex (Aziz-Zadeh and Damasio, 2008), and can rely
on information about an object’s typical real-world spatial, temporal, and rela-
tional properties (Barsalou, 2009).
I want to be clear about what I am arguing here: my contention is simply that
ideal compositionality is not always satisfied by cognitive systems. There are, of
course, many cases where the semantics of a locution, e.g. “brown cow”, is best
explained by the simple, idealized notion of compositionality suggested by Fodor
and Pylyshyn. The problem is that the idealization misses a lot of data (such as
the examples provided). Just how many examples are missed can be debated, but
in the end an understanding of compositionality that misses fewer should clearly
be preferred.
Thus, determining how well an architecture can define systems that meet the
(non-idealized) compositionality criteria will be closely related to determining
how the architecture describes the processing of novel, complex representations.
We should be impressed by an architecture which defines systems that can provide
appropriate interpretations that draw on sophisticated models of how the world
works – but only when necessary. That same theory needs to capture the simple
cases as well. No doubt, this is one of the more challenging, yet important, criteria
for any architecture to address.
In sum, compositionality is clearly important, but it is not only the simple
compositionality attributed to formal languages that we must capture with our
cognitive models. Instead, the subtle, complex compositionality displayed by real
cognitive systems must be accounted for. The complexity of compositionality thus
comes in degrees, none of which should be idealized away. An architecture that
provides a unified description spanning the observed degrees of compositional
complexity will satisfy this criterion best.
8.2.1.3 Productivity
third central feature of cognition. Similarly, Jackendoff (2002) identifies the third
challenge for cognitive theories as one of explaining the “problem of variables”.
This is the problem of having grammatical templates, in which any of a variety of
words can play a valid role. Typically, these variables are constrained to certain
classes of words (e.g. noun phrases, verbs, etc.). Despite such constraints, there
remains a huge, possibly infinite, set of valid fillers for some
such templates. As a result, Jackendoff sees the existence of these variables as
giving rise to the observed productivity of natural language.
The claim that productivity is central for characterizing cognition has received
widespread support. Productivity is, however, clearly an idealization of the per-
formance of any actual cognitive system. No real cognitive system can truly re-
alize an infinite variety of sentences. Typically, the productivity of the formalism
used to describe a grammar outstrips the actual productivity of a realized system.
For instance, there are sentences which are “grammatically” valid that are neither
produced nor understood by speakers of natural language. Examples of such sen-
tences can be generated using a grammatical construction called “recursive center
embedding”. An easily understood sentence with one such embedding is “The
dog the girl nuzzled chased the boy.” A difficult to understand example is “The
dog the girl the boy bit nuzzled chased the boy”, which has two embeddings. As
we continue to increase the number of embeddings, all meaning becomes lost to
natural language speakers. The reason, of course, is that real systems have finite
resources.
The two most obvious resource limitations are time and memory. These are,
in fact, intimately linked. As we try to cram more things into memory, two things
happen: we run out of space; and what is in memory is forgotten more quickly.1
So, when we are trying to process a complex sentence in “real-time”, as we keep
adding items to memory (such as embedded clauses), we eventually run out of
resources and are not able to make sense of the input. Cognitive systems are
limited in their capacity to produce and comprehend input.
In short, real-world cognitive systems have a limited form of productivity: one
which allows for a high degree of representational flexibility, and supports the gen-
eration and comprehension of fairly complex syntactic structures. While cognitive
systems are representationally powerful, perfect productivity is not a reasonable
expectation, and hence is not a true constraint on cognitive architectures. It is,
1 I’m thinking here mainly of working memory, which is most relevant for processing informa-
tion on the time scale of seconds. However, similar resource constraints exist for both briefer and
more extended forms of memory.
Jackendoff (2002) identifies as his first challenge for cognitive modelling “the
massiveness of the binding problem”. He argues that the binding problem, well-
known from the literature on perceptual binding, is much more severe in the case
of cognitive representations. This is because to construct a complex syntactic
structure, many “parts” must be combined to produce the “whole”. Jackendoff
provides simple examples which demand the real-time generation of several types
of structure, including phonological, syntactic, and semantic structure. He also
notes that each of these structural parts must be inter-related, compounding the
amount of binding that must be done. It is clear to Jackendoff that traditional
accounts of perceptual binding (e.g. synchrony) have not been designed for this
degree of structural complexity (even if, as he acknowledges, such complexity
could be found in a complex visual scene).
Given that there is a massive amount of binding in cognitive systems, it is not
surprising that in some circumstances the same item may be bound more than
once. Jackendoff identifies this as a separate challenge for connectionist imple-
mentations of cognition: “the problem of two”. Jackendoff’s example of a repre-
sentation that highlights the problem of two is “The little star’s beside a big star”
(p.5). Identifying this as a problem is inspired by the nature of past connectionist
suggestions for how to represent language that are familiar to Jackendoff. Such
suggestions typically ascribe the representation of a particular word, or concept,
to a group of neurons. Thus, reasoned Jackendoff, if the same concept appears
twice in a single sentence, it will not be possible to distinguish between those two
occurrences, since both will be trying to activate the same group of neurons.
While the problem of two poses certain challenges to understanding cognition
as implemented in neurons, it is not obviously separate from the more general
considerations of systematicity and binding. After all, if an approach can bind
a representation for “star” to two different sentential roles, it should solve this
problem. Consequently, I have placed the problem of two under this criterion, as
it relates to “binding” more generally.
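To see why role-filler binding addresses the problem of two, here is a minimal NumPy sketch, not tied to any particular architecture, in which the same ‘star’ vector is bound to two different role vectors and the two occurrences remain separately recoverable; the vocabulary and dimensionality are purely illustrative.

import numpy as np

rng = np.random.RandomState(2)
D = 512

def unit(v):
    return v / np.linalg.norm(v)

def cconv(a, b):
    # Circular convolution: a generic vector binding operation.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(a):
    # Approximate inverse, used to unbind.
    return np.concatenate(([a[0]], a[:0:-1]))

vocab = {w: unit(rng.randn(D))
         for w in ['star', 'little', 'big', 'subject', 'object']}

# "The little star's beside a big star": 'star' is bound into two roles.
sentence = (cconv(vocab['subject'], vocab['star'] + vocab['little'])
            + cconv(vocab['object'], vocab['star'] + vocab['big']))

# Unbinding each role recovers a noisy version of that role's filler only.
subj = cconv(sentence, inv(vocab['subject']))
obj = cconv(sentence, inv(vocab['object']))
print(np.dot(unit(subj), unit(vocab['star'] + vocab['little'])))  # well above chance
print(np.dot(unit(obj), vocab['big']) > np.dot(unit(obj), vocab['little']))  # True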
To summarize, Jackendoff sees the scalability of proposed solutions to binding
as being essential to their plausibility. The degree to which a proposed architec-
ture is cognitive is directly related to how well it can generate appropriately large
structures, in the same length of time and with the same kind of complexity as
observed in real cognitive systems. So, we can see the “massiveness of bind-
ing” challenge as emphasizing the practical difficulties involved in constructing
a suitably productive system. This problem thus straddles the distinction I have
made between representational concerns and performance concerns. Regardless
of where it is placed on the list of cognitive criteria, it highlights that a good
cognitive architecture needs to identify a binding mechanism that scales well.
8.2.2 Performance concerns
While criteria related to representational structure focus on the theoretical com-
mitments that underwrite cognitive architectures, criteria related to performance
concerns are directed towards actually implementing a cognitive system.
• “The square being red triggles the square being big” implies that the square
is big and the square is not red; also
• “The dog being fuzzy triggles the dog being loud” implies that the dog is
loud and the dog is not fuzzy.
Now I ask the question: What does “the chair being soft triggles the chair being
flat” imply? If you figured out that this implies that the chair is flat and not soft,
you have just performed syntactic generalization. That is, based on the syntactic
structure presented in the two examples, you determined what the appropriate
transformation to that structure is in order to answer the question. Evidently, the
problem of rapid variable creation (see section 7.3.7) is just a simpler form of
syntactic generalization as well. While the emphasis in that case is on speed,
the problem being solved is essentially one of identifying syntactic structure, and
using it in a novel circumstance.
There are, in fact, even simpler examples of syntactic generalization, such
as determining when contractions are appropriate (Maratsos and Kuczaj, 1976).
However, the example I provided above shows the true power of syntactic gener-
alization. That is, it demonstrates how reasoning can be sensitive to the structure,
not the meaning, of the terms in a sentence. Taken to an extreme, this is the basis
of modern logic, and it is also why computers can be used to perform reasonably
sophisticated reasoning.
It should not be surprising, then, that if we adopt the representational com-
mitments of computers, as symbolicism does, syntactic generalization becomes
straightforward. However, if we opt for different representational commitments,
as do both dynamicism and connectionism, we must still be able to explain the
kind of syntactic generalization observed in human behavior. In short, syntactic
generalization acts as a test that your representational commitments do not lead
you astray in your attempt to explain cognitive performance.
I should note here, as I did for several of the criteria in the previous section,
that human behavior does not always conform to ideal syntactic generalization.
As described in section 6.6, there are well-known effects of content on the rea-
soning strategies employed by people (Wason, 1966). In the preceding example,
this would mean that people might reason differently in the “dog” case than in
the “chair” case, despite the fact that they are syntactically identical. So again,
we must be careful to evaluate theories with respect to this criterion insofar as it
captures human performance, not with respect to the idealized characterization of
the criterion. In short, if people syntactically generalize in a certain circumstance,
a good cognitive architecture must be able to capture that behavior. If people do
not syntactically generalize, the architecture must be able to capture that as well.
8.2.2.2 Robustness
tous sort of cognitive task. Robustness, in contrast, deals with changes of perfor-
mance across many cognitive tasks. “Robustness” is a notion which most naturally
finds a home in engineering. This is because when we build something, we want
to design it such that it will continue to perform the desired function regardless of
unforeseen circumstances or problems. Such problems might be changes in the
nature of the material we used to construct the object (for instance the fatiguing of
metal). Or, these problems might arise from the fact that the environment in which
the object is built to perform is somewhat unpredictable. In either case, a robust
system is one which can continue to function properly in spite of these kinds of
problems, and without interference from the engineer.
In some respects, the hardware of modern digital computers is extremely ro-
bust. It has been specifically built such that each of the millions of transistors on
a chip continues to perform properly after many millions of uses. This robustness
stems partially from the fact that the voltage states in a transistor are interpreted
to be only on or off, despite the fact that the voltage varies between ±5 V. As a
result, if an aging transistor no longer reaches 5 V, but is above zero, it will be act-
ing properly. The cost of this robustness, however, is that these machines use a lot
of power: the human brain consumes about 20 W, while far less impressive
digital computers use tens or hundreds of times more.
Nevertheless, computers are well-designed to resist problems of the first kind
– degradation of the system itself. In fact, these same design features help re-
sist some of the second kind of problem – environmental variability. Specifically,
electromagnetic fluctuation, or heat fluctuations, can be partially accommodated
in virtue of the interpretation of the transistor states. However, “interesting” envi-
ronmental variability is not accounted for by the hardware design at all. That is,
the hardware cannot run a poorly written program. It cannot use past experience
to place a “reasonable” interpretation on noisy or ambiguous input. This limita-
tion is not solely the fault of the hardware designers. It is as much a consequence
of how software languages have been designed and mapped onto the hardware
states.
All of this is relevant to cognitive science because symbolicism adopted an
approach which shared these central design features used to construct and pro-
gram digital computers. But, it seems that biology has made a different kind of
trade-off than human engineers in building functional, flexible devices. Brains
use components which, unlike transistors, are highly unreliable, often completely
breaking down, but which use very little power. Consequently, the kinds of ro-
bustness we see in biological systems are not like those of computers. So, when
connectionists adopted a more brain-like structure in their theoretical characteri-
zation of cognition, much was made of the improved robustness of their models
over their symbolicist competitors. Indeed, robustness was not of obvious concern
to a symbolicist, since the hardware and software on which their simulations ran
(and whose theoretical assumptions they had taken on board) essentially hid such
concerns from view.
However, robustness concerns could not be ignored for long when connection-
ists began explaining certain kinds of phenomena, such as the “graceful degrada-
tion” of function after damage to the brain, that seemed beyond the reach of sym-
bolicism (Plaut and Shallice, 1994). As well, adopting more brain-like architec-
tures made it possible for connectionists to explain the performance of cognitive
systems on tasks such as pattern completion, recognition of noisy and ambiguous
input, and other kinds of statistical (as opposed to logical) inference problems.
But, connectionist models continued to make “un-biological” assumptions
about the nature of the implementation hardware. Most obvious, perhaps, was
that connectionist nodes were still largely noise free. Nevertheless, early connec-
tionist models made the important point that matching the observed robustness
of cognitive systems can be closely tied to the specific implementation architec-
ture that they were built on. More generally, these models established that the
robustness of cognitive systems could be used as a means to justify aspects of
an underlying cognitive theory. In short, they established that robustness was a
relevant criterion for characterizing cognitive architectures.
In sum, as a criterion for a good cognitive architecture, robustness demands
that an architecture supports models that continue to function appropriately given
a variety of sources of variability, such as noisy or damaged component parts,
imprecision in input, and unpredictability in the environment.
8.2.2.3 Adaptability
Adaptability has long been one of the most admired features of cognitive sys-
tems. It is, in many ways, what sets apart systems we consider “cognitive” from
those that we do not. Simply put, adaptability is exhibited by a system when it
can update its future performance on the basis of past experience. This, of course,
sounds a lot like learning, and indeed the two terms can often be used interchange-
ably. However, as discussed earlier, in cognitive science (especially connection-
ism) learning often refers specifically to the changing of parameters in a model in
response to input. Adaptability, however, goes beyond this definition. There are,
for instance, nonlinear dynamical systems which can exhibit adaptable behavior
without changing any of the parameters in the system (the fluid intelligence model
in section 4.6 is one example). Indeed, many dynamicist models rely on precisely
this property.
Notice also that adaptability can be provided through sophisticated represen-
tational structures. A prototypical symbolicist model can respond to input it has
never seen before based solely on the syntactic structure of that input, often pro-
ducing reasonable results. As long as the rules it has been programmed with, or
has learned, can use structure to generalize, it will be able to exhibit adaptability:
one example, of course, is syntactic generalization.
So, adaptability has been identified and explored by all past approaches to cog-
nition. While the kinds of adaptability considered are often quite different – cho-
sen to demonstrate the strengths of the approach – all of these types of adaptability
are exhibited by real cognitive systems to varying degrees. That is, exploitation
of nonlinear dynamics, tuning of the system parameters to reflect statistical reg-
ularities, and generalization over syntactic structure are all evident in cognitive
systems. Consequently, a theory which accounts for adaptability in as many of its
various guises as possible will do well on this criterion – the more the better.
8.2.2.4 Memory
In order for a system to learn from its experience, it must have some kind of
memory. Despite the obviousness of this observation, it is by no means obvious
what the precise nature of memory itself is. In the behavioral sciences, many
varieties of memory have been identified, including long-term, short-term, work-
ing, declarative, procedural and so on. Not only are there many different kinds of
memory, but most of these different types of memory have been identified primar-
ily through behavioral markers. Consequently, it is unclear what specific compo-
nents an architecture needs, or how such components might internally function, in
order to explain the varieties of memory.
Past approaches have treated the issue of memory in very different ways: as
symbolic databases; as connection weights between nodes; as slowly varying
model parameters; etc. Ultimately, the adequacy of a cognitive architecture on
this criterion is going to be determined by its ability to address the wide variety
of data that relates to memory. There are a vast number of possible phenomena to
account for when considering memory, so allow me to simply focus my discussion
on aspects of long-term and working memory, which seem crucial for cognitive
behaviors.
good cognitive architecture must account for the functions of, and relationships
between, working and long-term memory, so as to not overload one, and not un-
derestimate the depth of the other. These considerations, perhaps, are among the
many reasons why researchers typically include “memory” in the list of important
capacities to consider when building cognitive models.
8.2.2.5 Scalability
As I noted previously in section 1.3, all three approaches to cognitive theoriz-
ing have a large and complex system as their target of explanation. As a result,
many of the simple cognitive models proposed in contemporary research must be
“scaled up” to truly become the kinds of cognitive explanations we would like.
The reason that scalability is a criterion for cognitive architectures is that there are
many difficulties hidden in the innocuous-sounding phrase “scaling up”.
The reason we need to take scalability seriously is that it is notoriously difficult
to predict the consequences of scaling. This is tragically illustrated by the case of
Tusko the elephant. In 1962, Louis West and his colleagues decided to determine
the effects of LSD on a male elephant, to see if it would explain the phenomenon
of “musth”, a condition in which elephants become uncontrollably violent (West
et al., 1962). Because no one had injected an elephant with LSD before, they
were faced with the problem of determining an appropriately sized dosage. The
experimenters decided to scale the dosage to Tusko by body weight, based on the
known effects of LSD on cats and monkeys.
Unfortunately, five minutes after the injection of LSD the elephant trumpeted,
collapsed and went into a state resembling a seizure. He died about an hour and
a half later. Evidently, the chosen means of scaling the dosage had the effects
of a massive overdose that killed the 7000 pound animal. If the dose had been
scaled based on a human baseline, it would have been 30 times lower. If it had
been scaled based on metabolic rate, it would have been about 60 times lower.
If it had been scaled based on brain size, it would have been about 300 times
lower. Clearly, the dimension along which you characterize an expected scaling is
crucial to determining expected effects. The lesson here for cognitive theories is
that scaling can seldom be accurately characterized as “more of the same”, since
we may not know which “same” is most relevant until we scale.
A second reason to take scaling seriously, which is more specific to functional
architectures, is the result of considering the complexity of potential interactions
in large systems. As Bechtel and Richardson have forcefully argued, decompo-
sition and simplification is an essential strategy for explaining complex systems
(Bechtel and Richardson, 1993). It is not surprising, then, that most cognitive
theorists take exactly this kind of approach. They attempt to identify a few basic
functional components, or principles of operation, that are hypothesized to char-
acterize “full-blown” cognition. Because we are not in a position to build models
of “full-blown” cognition, the principles are applied in a much more limited man-
ner. Limitations are typically imposed by abstracting away parts of the system,
and/or highly simplifying the task of interest. When faced with a different task
to explain, a new model employing related principles and methodologies is often
constructed and a new comparison is made. Sometimes, different models employ-
ing the same architecture will use few or no overlapping components; sometimes
there will be significant overlap.
In either case, such a strategy is problematic because it skirts the main chal-
lenge of building complex systems. As any engineer of a complex real-world sys-
tem will tell you, many of the challenges involved in building large systems come
from characterizing the interactions, not the internal functions, of components of
the system. As has been well established by disciplines such as chaos theory and
dynamical systems theory, the interactions of even simple components can give
rise to complex overall behavior. In order to ensure that the hypothesized cogni-
tive principles and components can truly underwrite a general purpose, cognitive
architecture, simple, task specific models must be integrated, and hence scaled up,
to simultaneously account for a wide variety of tasks.
In the context of constructing cognitive architectures, these two considerations
are closely related: the first suggests it is difficult to know in advance how to
predict the effects of scaling, and the second suggests that the true challenges of
scaling are found in the interactions of parts of large systems. Both considerations
point to the importance of actually constructing large, scaled-up models to test a
proposed architecture. Noticing this relationship can help us to apply the principle
of scaling.
Since scalability is difficult to predict from simpler instances, the weakest
form of scalability is scalability in principle. Scalability in principle amounts to
demonstrating that there is nothing in your assumed characterization of cognition
that makes it unlikely that you could scale the theory to account for full-blown
cognition. This form of scalability is weak because it could well be the case that
a crucial dimension for predicting scalability has been missed in such an analysis.
A far more significant form of scalability is scalability in practice. That is, ac-
tually building large models of large portions of the brain that are able to account
for behaviors on many tasks, without intervention or task-specific tuning. Such
scaling can be extremely demanding, both from a design standpoint and compu-
tationally. However, this simply means that being able to construct such models
makes it that much more convincing that the relevant architecture is appropriate
for characterizing real cognitive systems.
The behavioral sciences are replete with a variety of methods. Some methods
characterize the opening and closing of a single channel on a neural membrane,
while others characterize activity of an entire functioning brain. In some sense,
all of these methods are telling us about the functioning of the brain. So, if we
take our cognitive theory to be a theory of brain function, then information from
all such methods should be consistent with and relatable to our theory.
Unfortunately, this ideal seems not only to be largely unrealized, but seldom
even treated as a central goal by cognitive theorists. Perhaps this is because cog-
nitive science researchers have generally been placed into traditional academic
disciplines like psychology, neuroscience, computer science, and so on. As a re-
sult, the conventional methods of one discipline become dominant for a given
researcher and so his or her work becomes framed with respect to that specific
discipline. In some sense, the identification of “cognitive science” as a multidis-
ciplinary but unified enterprise is an attempt to overcome such a tendency. Nev-
ertheless, cognitive theories seem to often have a “home” in only one or perhaps
two of the sub-disciplines of cognitive science.
A truly unified theory of cognitive function, in contrast, would have clear re-
lationships to the many disciplines of relevance for understanding brain function.
Most obviously this would be evident from such a theory being able to predict
the results of experiments in any relevant discipline. In short, the more kinds of
data that can be used to constrain and test a theory, the better. We should, after
all, be impressed by a model that not only predicts a behavioral result, but also
tells us which neurons are active during that task, what kind of blood flow we
should expect in the relevant brain areas, what sort of electrical activity we will
record during the task, how we might disrupt or improve performance on the task
by manipulating neurotransmitters within the system, and so on. While the in-
terdisciplinary nature of the behavioral sciences can prove to be one of the most
daunting aspects of performing research in the area, it also provides one of the
most stringent tests of the quality of a purported theory of cognition.
8.2.3.2 Compactness
While I have called this criterion “compactness”, it often goes by the name of
“simplicity”. In fact, I think the former better indicates what is important about
theories that are deemed good. It is not the fact that they are “easy” or “simple”
theories that makes them good – they may indeed be quite difficult to understand.
Instead, it is that they can be stated comprehensively, in a highly succinct manner,
that makes them good. Often, this kind of succinctness is possible because the
theory employs mathematics, a language whose terms and relationships are well-
defined, and which can succinctly express complex structure. While mathematics
itself does not supply theories of the world (since the terms in mathematical ex-
pressions must be mapped onto the world), it is good for clearly expressing the
relationships between those terms.
One reason compact expressions of a theory are highly valued is because they
make it difficult to introduce unnoticed or arbitrary changes in a theory when em-
ploying it in different contexts. If a cognitive theory changes when moving from
simple cognitive tasks to more complex ones, there is little sense to be made of
it being a single, compact theory. For instance, if we must introduce a parameter
in our theory for no reason other than to fit new data, the additional complexity
introduced by that parameter is poorly motivated from a theoretical standpoint.
Similarly, if we must change our mapping between our theory and the world de-
pending on the circumstances (e.g., a “time step” in a model is not always the
same amount of real time), the statement of our theory should include a descrip-
tion of how to determine what the appropriate mapping is, making the theory less
compact. In short, any unprincipled “fitting” of a model to data pays a price in
cally, we might think that we are at least in a better position to evaluate cognitive
theories than we were 50 years ago. In either case, the last 50 years should make
us humble.
Chapter 9
Theories of cognition
tecture, the General Problem Solver (GPS; Newell and Simon, 1976). Conse-
quently, ACT-R shares central representational and processing commitments with
most symbolic architectures, and thus is a natural choice as a representative of the
paradigm. While ACT-R contains some “neurally inspired” components, making
it something of a hybrid approach, the core representational assumptions are sym-
bolicist. Interestingly, it has recently been used to explore the biological plausibil-
ity of some of its central assumptions, and is unique amongst symbolic approaches
in its ability to map to both behavioral and neural data. Because I am centrally
interested in biological cognition, these features of ACT-R make it an important
point of comparison for the SPA. In many ways, the ACT-R architecture embodies
the best of the state-of-the-art in the field.
Nevertheless, there are several other architectures which are more biologically
inspired, connectionist approaches to cognitive modelling, and hence are also nat-
ural to compare to the SPA. These include architectures that use various mech-
anisms for constructing structured representations in a connectionist network,
such as Neural Blackboard Architectures (van der Velde and de Kamps, 2006),
which use local integrators, and SHRUTI (Shastri and Ajjanagadde, 1993) and
LISA (Hummel and Holyoak, 2003), which use synchrony. Another influential
connectionist-based approach to structure representations focuses less on specific
implementations, and more on proposing a broad theoretical characterization of
how to capture symbolic (specifically linguistic) processing with distributed rep-
resentations and operations (Smolensky and Legendre, 2006a,b). The associated
architecture is known as ICS, and shares central commitments with the SPA re-
garding structured representations. Other connectionist approaches focus less on
representational problems, and deal more with issues of control and learning in
cognitive tasks. Leabra is an excellent example of such an approach, which has
had much success mapping to reasonably detailed neural data (O’Reilly and Mu-
nakata, 2000). In addition, there has been a recent effort to combine Leabra with
ACT-R (Jilk et al., 2008), as a means of simultaneously exploiting the strengths
of each. This, again, provides an excellent comparison to the SPA.
Finally, there are some dynamicist approaches to cognitive modeling that have
been gaining prominence in recent years (Schöner, 2008). Perhaps the best known
amongst these is Dynamic Field Theory (DFT), which employs a combination
of dynamicist and neurally-inspired methods to model cognitive behaviors (e.g.,
Schöner and Thelen, 2006). The focus of DFT on time, continuity, and neural
modeling provides a useful and unique comparison for the SPA.
In the remainder of this chapter, I consider each of these approaches in more
detail, describing their strengths and some challenges each faces. This discussion
will provide background for a subsequent evaluation of these theories with respect
to the QCC. And, it sets the stage for an explicit comparison between this past
work and the SPA.
Before proceeding, it is worth explicitly noting two things. First, there is
important work in cognitive science that has been very influential on the architec-
tures described here, including the SPA, which I do not discuss in detail for lack of
space and/or lack of a full-fledged architecture specification (e.g., Newell, 1990;
Rogers and McClelland, 2004; Barsalou, 2003). Second, because I am consid-
ering six architectures, each receives a relatively short examination. This means
that my discussion is somewhat superficial, though hopefully not inaccurate. As
a consequence the reader is encouraged to follow up with the provided citations
for a more detailed and nuanced understanding of each of the architectures under
consideration.
9.1.1 ACT-R
Adaptive control of thought-rational, or ACT-R, is perhaps the best developed
cognitive architecture. It boasts a venerable history with the first expression of
the architecture as early as 1976 (Anderson, 1976). It has been very broadly ap-
plied to characterizing everything from child language development (Taatgen and
Anderson, 2002) to driving (Salvucci, 2006), and from the learning of algebra
(Anderson et al., 1995b) to categorization (Anderson and Betz, 2001).
The many successes and freely available tools to build ACT-R models have re-
sulted in a large user community developing around this architecture (see http:
//act-r.psy.cmu.edu/). This, in turn, has led to many
“sub-versions” of ACT-R being developed, some of which include central com-
ponents from originally competing architectures. For example, a system called
ACT-R/PM includes “perceptual-motor” modules taken from EPIC (Kieras and
Meyer, 1997). This has led to proponents claiming that the ACT-R framework
allows the creation of "embodied" cognitive models (Anderson, 2007, p. 42).
Since a central goal of ACT-R is to provide descriptions of “end-to-end behavior”
(Anderson, 2007, p. 22), this is an important extension of the methods.
The basic ACT-R architecture consists of a central procedural module, which
implements a production system, and is bidirectionally connected to seven other
modules (i.e., the goal, declarative, aural, vocal, manual, visual, and imaginal
buffers). Each of these modules has been mapped to an area of the brain, with the
procedural module being the basal ganglia, and hence responsible for controlling
communication between buffers and selecting appropriate rules to apply based on
the contents of the buffers. A central constraint in ACT-R is that only a single
production rule can be executed at a time, and that it takes about 50 ms for a
production rule to fire (Anderson, 2007, p. 54).
Because of its commitment to a symbolic specification of representations and
rules, most researchers take ACT-R to be a largely symbolicist cognitive archi-
tecture. However, its main proponent, John Anderson, feels no particular affinity
for this label. Indeed, he notes that it “stuck in [his] craw” when he was awarded
the prestigious Rumelhart prize in 2005 as the “leading proponent of the symbolic
modeling framework” (Anderson, 2007, p. 30). In his view, ACT-R is equally
committed to the importance of subsymbolic computation in virtue of at least two
central computational commitments of the architecture. Specifically, “utilities”
and “activations” are continuously varying quantities in the architecture which
account for some central properties of its processing. For example, continuous-
valued utilities are associated with production rules to determine their likelihood
of being applied. As well, utilities slowly change over time with learning, deter-
mining the speed of acquisition of new productions. Anderson takes such subsym-
bolic commitments of the architecture to be at least as important as its symbolic
representational commitments.
As well, these subsymbolic properties are important for the recent push to map
ACT-R models to fMRI data, because they play a crucial role in determining the
timing of processing in the architecture. In his most recent description of ACT-R,
John Anderson provides several examples of the mapping of model responses to
the BOLD signal in fMRI (Anderson, 2007, pp. 74-89, 119-121, 151). This mapping
is based on the amount of time each module is active during one brief fMRI scan
(usually about 2s). The idea is that an increase in the length of time a module
is active results in increased metabolic demand, and hence an increased BOLD
signal, picked up by the scanner. While the quality of the match between the
model and data varies widely, there are some very good matches for certain tasks.
More importantly, the inclusion of neural constraints, even if at a very high level,
significantly improves the plausibility of the architecture as a characterization of
the functions being computed by the brain.
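To make the general idea concrete, the following schematic sketch convolves a module’s “time active” profile with a generic gamma-shaped hemodynamic response and samples the result once per scan; the activity profile, the response function, and its parameters are purely illustrative, and this is not Anderson’s published procedure.

import numpy as np

dt = 0.05                      # simulation time step (s)
TR = 2.0                       # duration of one fMRI scan (s)
t = np.arange(0, 30, dt)

# Schematic module activity: 1 while the module is active, 0 otherwise.
demand = (((t > 1.0) & (t < 1.6)) | ((t > 8.0) & (t < 9.2))).astype(float)

# A generic gamma-shaped hemodynamic response (illustrative parameters).
h = (t / 5.0) ** 4 * np.exp(-t / 1.2)
h /= h.sum()

# Predicted BOLD: longer active periods produce larger, delayed responses.
bold = np.convolve(demand, h)[:len(t)]
scans = bold[::int(TR / dt)]   # one predicted value per scan
print(np.round(scans, 3))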
Several critiques have been levelled against the ACT-R architecture as a theory
of cognition. One is that the architecture is so general that it does not really pro-
vide practical constraints on specific models of specific tasks (Schultheis, 2009).
Before the explicit mappings to brain areas of the modules, this criticism was
more severe. Now that there is a more explicit implementational story, however,
there are at least some constraints provided regarding which areas, and hence
which modules, must be active during a given task. However, these constraints
are not especially stringent. As Anderson notes, the mapping to brain areas is
both coarse, and largely a reiteration of standard views regarding what the various
areas do (Anderson, 2007, p. 29).
For instance, the ACT-R mapping of the declarative module to prefrontal areas
is neither surprising, nor likely to be generally accurate. The prefrontal cortex is
extremely large, and is differentially involved in a huge variety of tasks, sometimes
in ways not suggestive of providing declarative memories (Quirk and Beer, 2006).
So, it is not likely informative to map it to only one (i.e. declarative memory) of
many functions ascribed to the area. As well, it would be surprising if all retrieval
and storage of declarative memory could be associated with prefrontal areas, given
the prominent role of other structures (such as hippocampus) in such functions
(Squire, 1992).
Similarly, the visual module is mapped to the fusiform gyrus, despite the ac-
knowledgement that there are many more areas of the brain involved in vision.
While fusiform no doubt plays a role in some visual tasks, it is somewhat mis-
leading to associate its activation with the “visual module” given the amount of
visual processing that does not activate fusiform.
In addition, the mapping between an ACT-R model and fMRI data often has
many degrees of freedom. In at least some instances, there are several parameters
for such a mapping, each of which is tuned differently for different areas, and
differently for different versions of the task to realize the reported fits (Anderson
et al., 2005). This degree of tuning makes it less likely that there is a straight-
forward correspondence between the model’s processing time and the activity
of the brain regions as measured by BOLD. Thus, claims that ACT-R is well-
constrained by neural data should be taken with some skepticism. In fairness,
Anderson (2007) does note that these mappings are preliminary, acknowledges
some of the concerns voiced here, and clearly sees the fMRI work as only the
beginning of a more informative mapping (p. 77, 248).
Another criticism of the ACT-R approach is its largely non-embodied ap-
proach to cognition. For instance, the claim that ACT-R/PM provides for “em-
bodied” models seems to overreach the actual application of this version of ACT-
R. The perceptual and motor areas of this architecture do not attempt to capture
the processing of natural images, or the control of high-degree-of-freedom limbs.
Instead, they provide lengths of time that such processing might take, and capture
some high-level timing data related to processing such as attentional shifts. Appli-
cations of such models are to tasks such as the effect of icon placement and design
on reaction times. Visual representations in such models are specified as symbolic
lists of features: e.g., “gray circle” (Fleetwood and Byrne, 2006). It seems disin-
Figure 9.1: The LISA architecture. Boxes represent neural groups and lines repre-
sent synaptic connections. Shown are just those neural groups needed to represent
“dogs chase cats” and “Tom thinks that dogs chase cats”. Based on Hummel &
Holyoak (2003), figure 1.
phisticated neural expression in the recent LISA architecture from Hummel and
Holyoak (2003).
In LISA, synchrony is used as a direct method for neurally representing struc-
tured relations (Hummel et al., 1994; Hummel and Holyoak, 1997, 2003). In
this architecture, a structured representation is constructed out of four levels of
distributed and localist representations. The first level consists of localist “sub-
symbols” (e.g. mammal, furry, male, etc.). The second level consists of localist
units connected to a distributed network of subsymbols relevant to defining the
semantics of the second level term (e.g., dog is connected to furry, mammal, etc.).
The third level consists of localist “subproposition” nodes that bind roles to ob-
jects (e.g., dog+verb-agent, or dog+verb-theme, etc.). The fourth and final level
consists of localist proposition nodes that bind subpropositions to form whole
propositions (e.g., dog+chase-agent and cat+chase-theme). LISA is an interesting
case because it is clearly a connectionist network, but it implements a classical
representational scheme since all elements of the structures are explicitly tokened
whenever a structure is tokened. In this sense, it is precisely the kind of imple-
mentation that could underwrite the symbolic representations of ACT-R.
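The following toy sketch lays out the four levels just described for "dogs chase cats", using plain Python dictionaries as stand-ins for localist units and their connections. The particular subsymbol assignments loosely follow Figure 9.1 and are purely illustrative; nothing here is meant to reproduce LISA's actual implementation.

```python
# Toy rendering of LISA's four levels for "dogs chase cats", with dictionaries
# standing in for localist units and their excitatory connections.
subsymbols = {"furry", "canine", "animal", "group-oriented", "solitary",
              "motion", "initiator"}

# Level 2: localist object/role units, each wired to its defining subsymbols
semantic_units = {
    "dog":         {"furry", "canine", "animal", "group-oriented"},
    "cat":         {"furry", "animal", "solitary"},
    "chase-agent": {"motion", "initiator"},
    "chase-theme": {"motion"},
}

# Level 3: localist subproposition units binding an object to a role
subpropositions = {
    "dog+chase-agent": ("dog", "chase-agent"),
    "cat+chase-theme": ("cat", "chase-theme"),
}

# Level 4: localist proposition units binding subpropositions together
propositions = {"dogs-chase-cats": ("dog+chase-agent", "cat+chase-theme")}

# Every element of the structure is explicitly tokened whenever the structure
# is tokened, which is the sense in which LISA implements a classical scheme.
for prop, subs in propositions.items():
    for sp in subs:
        obj, role = subpropositions[sp]
        print(f"{prop} -> {sp} -> {obj}/{role} -> {sorted(semantic_units[obj])}")
```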
Hummel and Holyoak demonstrate how LISA can perform relational inference
and generalization. However, the neural plausibility of its reliance on synchrony has
been questioned. Critics have argued that the available “evidence does not
establish that the observed synchrony of firing is actually used
for binding, instead of being an epiphenomenon” (p. 221). Similarly, from a more
biological perspective, Shadlen and Movshon (1999) present several reasons why
synchrony has not been established as a binding mechanism, and why it may not
serve the functional role assigned by proponents.
Furthermore, synchrony binding in LISA is accomplished by increasing real-
valued spike rates together at the same time. Synchrony in LISA is thus very
clean, and easy to read off the co-activation of nodes. However, in a biolog-
ical system, communication is performed using noisy, individual spikes, leading
to messy synchronization, with detailed spectral analyses necessary to extract ev-
idence of synchronization in neural data (Quyen et al., 2001). Thus it is far from
clear that using more neurally realistic, spiking, noisy nodes would allow LISA to
exploit the mechanism it assumes.
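A small simulation makes the contrast vivid. In the hedged sketch below, two "bound" units share a clean, real-valued oscillation, so their binding can be read directly off the correlation of their activities; once the same rates drive independent noisy spike trains, the shared rhythm is no longer obvious without smoothing or spectral analysis. All rates, durations, and filter widths are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.001, 2.0
t = np.arange(0, T, dt)

# LISA-style synchrony: two bound units share a clean, real-valued oscillation
rate_a = 40 * (1 + np.sin(2 * np.pi * 5 * t)) / 2    # 5 Hz modulation, 0-40 spikes/s
rate_b = rate_a.copy()                               # perfectly co-active
print("rate correlation:", np.corrcoef(rate_a, rate_b)[0, 1])          # exactly 1.0

# A more biological version: independent noisy spike trains driven by those rates
spikes_a = (rng.random(t.size) < rate_a * dt).astype(float)
spikes_b = (rng.random(t.size) < rate_b * dt).astype(float)
print("raw spike-train correlation:", np.corrcoef(spikes_a, spikes_b)[0, 1])

# Recovering the shared 5 Hz rhythm now requires smoothing (or spectral analysis)
kernel = np.ones(50) / 50                            # 50 ms boxcar filter
smooth_a = np.convolve(spikes_a, kernel, mode="same")
smooth_b = np.convolve(spikes_b, kernel, mode="same")
print("smoothed correlation:", np.corrcoef(smooth_a, smooth_b)[0, 1])
```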
However, there is an even deeper problem for LISA. The kind of synchroniza-
tion exploited in LISA is nothing like that which has been hypothesized to exist in
biological brains. In LISA, synchronization occurs because there are inhibitory
nodes connected to each subproposition that cause oscillatory behavior when the
subproposition is given a constant input. That oscillation is then reflected in all
units that are excitatorily connected to these subpropositions (i.e., propositions
and themes/verbs/agents). Therefore, in LISA binding is established by construct-
ing appropriately excitatorily connected nodes, and the oscillations serve to high-
light one such set of nodes at a time. So, the representation of the binding results
in synchronization patterns: this essentially puts the bound cart before the syn-
chronous horse. Synchronization in the neurobiological literature is supposed to
result in binding (see, e.g., Engel et al., 2001). For instance, if “red” and “circle”
are co-occurring features in the world, they are expected to be synchronously rep-
resented in the brain, allowing subsequent areas to treat these features as bound.
Thus the synchronization patterns result in binding, not the other way around.
Consequently, the neural plausibility of LISA is not supported by current work on
synchronization in the neurosciences.
without being “wired up” differently for each optimization (e.g. to parse each sen-
tence). Relatedly, there is no suggestion for what kind of control structures could
“load” and “manipulate” language as quickly as it happens in the human system.
So despite the isomorphic mapping between localist and distributed networks, it
remains a major challenge for the ICS to show how a distributed network can ac-
tually perform all the steps needed to process language in a biologically relevant
way.
As with past approaches, biological plausibility is also related to scaling is-
sues, a known challenge for the ICS’s preferred method of binding: the tensor
product. As mentioned earlier, tensor products scale badly because the product
of two N dimensional vectors results in an N × N length vector. So, embedded
structures become exponentially large and unwieldy. For example, to represent
structures like “Bill believes John loves Mary” using 500-dimensional vectors (as
in the SPA) would require 12 billion neurons, or about half the area of available
cortex (see appendix B.5) – as with the NBA, this is only for representation. In
their book, Smolensky and Legendre explicitly note the utility of a compression
(or “contraction”) function (like that employed by all other VSAs) to address this
scaling issue: “the experimental evidence claimed as support for these models
suggests that contracted tensor product representations of some kind may well be
on the right track as accounts of how people actually represent structured infor-
mation” (p. 263). However, they do not subsequently use such representations.
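The dimensional arithmetic is easy to verify. The sketch below contrasts tensor-product binding, whose dimensionality multiplies with every level of embedding, with a compressive binding operator (circular convolution, a VSA-style operation of the kind the SPA uses) that keeps the dimensionality fixed at 500. The role vectors and the encoding of the example sentence are illustrative; the neuron counts cited above come from appendix B.5, not from this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 500

def unit_vector(d=D):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

bill, believes, john, loves, mary = (unit_vector() for _ in range(5))

# Tensor-product binding: dimensionality multiplies at each level of embedding
inner = np.outer(john, loves)
print(inner.size)                  # 500 x 500 = 250,000 dimensions
print(bill.size * inner.size)      # one more level of embedding: 125,000,000

# A compression ("contraction") operator keeps the dimensionality fixed.
def cconv(a, b):
    """Circular convolution, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

agent, verb, theme = (unit_vector() for _ in range(3))
clause = cconv(agent, john) + cconv(verb, loves) + cconv(theme, mary)
sentence = cconv(agent, bill) + cconv(verb, believes) + cconv(theme, clause)
print(sentence.size)               # still 500, however deeply the structure is embedded
```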
Staying with the assumption of standard tensor products seems to be moti-
vated by Smolensky and Legendre’s explicit concern to connect a description of
the brain as a numerical computer with the description of mind as a symbolic
computer. As a result, their approach has been praised by Anderson (2007) for its
focus on how to do complex processing with the proposed representation: “The
above criticism is not a criticism of connectionist modeling per se, but ... of mod-
eling efforts that ignore the overall architecture... in the Smolensky and Legendre
case, [their approach] reflects a conscious decision not to ignore function” (p. 14-
15). The approach appeals to Anderson because Smolensky and Legendre hold a
view of cognitive levels that is compatible with Anderson’s suggestion that sym-
bolicism often provides the “best level of abstraction” for understanding central
aspects of cognition (2007, p. 38). In their discussion of the ICS, Smolensky and
Legendre explicitly note their commitment to an isomorphism between their rep-
resentational characterization in terms of vectors and a characterization in terms
of symbols (Smolensky and Legendre, 2006a, p. 515). If compressed representations are used, however, the iso-
morphism no longer holds. Compressed representations lose information about
the symbolic structures they are taken to encode, so they can quickly diverge from
a strictly symbolic description of processing.
9.1.5 Leabra
The last connectionist cognitive architecture I will consider has taken a very dif-
ferent approach to characterizing cognition. Rather than focusing on representa-
tion, Leabra focuses on control and learning (O’Reilly and Munakata, 2000).
In attempts to combine Leabra with symbolic architectures such as ACT-R, the two
sets of methods have currently not been integrated, but rather “attached.” Consequently,
it is difficult to argue that the resulting models solve the kinds of problems they
were attempting to address. I believe it is fair to say that the desired unification of
biological realism and symbolic processing has so far remained elusive.
9.1.6 Dynamic field theory
Dynamic field theory (DFT) characterizes behavior in terms of continuous neural
fields evolving over time. Early DFT models, for example, simulate the known
features of habituation, including familiarity and novelty ef-
fects, stimulus intensity effects, and age and individual differences. Other DFT
models capture the development of perseverative reaching in infants (Thelen et al.,
2001), or the improvement in working memory through development (Schutte
et al., 2003).
More recent work has addressed more cognitive tasks, such as object recog-
nition (Faubel and Schöner, 2010), spatial memory mapping to spatial language (Lip-
inski et al., 2006), and speech motor planning (Brady, 2009). The speech motor
task model, for example, is constructed out of two neural fields by mapping the
input/output space to a first sheet of neural activity, and then having a second-
order field learn how to map the current first-order state into a desired next first-
order state. This is a nice example of a simple hierarchical dynamical system,
which is able to capture how a static control signal can aid the mapping between
appropriate dynamic states.
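For readers unfamiliar with the formalism, the sketch below simulates a minimal one-dimensional dynamic neural field of the sort DFT models are built from: local excitation and broader inhibition let a localized input form a peak of activation. All kernel widths, time constants, and the resting level are illustrative values, not parameters taken from any published DFT model.

```python
import numpy as np

# A one-dimensional dynamic neural field (Amari-style), the kind of building
# block DFT models are made of.  All parameter values here are illustrative.
N = 101
x = np.linspace(-1, 1, N)                 # the feature dimension the field spans
dx = x[1] - x[0]
tau, h = 0.01, -0.5                       # time constant (s) and resting level

def kernel(d, w_exc=1.0, s_exc=0.1, w_inh=0.5, s_inh=0.4):
    """Local excitation with broader inhibition."""
    return (w_exc * np.exp(-d**2 / (2 * s_exc**2))
            - w_inh * np.exp(-d**2 / (2 * s_inh**2)))

W = kernel(x[:, None] - x[None, :]) * dx  # interaction weights (discretized integral)
f = lambda u: 1.0 / (1.0 + np.exp(-10 * u))   # sigmoidal output nonlinearity

u = np.full(N, h)                             # field activation
stim = 2.0 * np.exp(-(x - 0.3)**2 / (2 * 0.05**2))   # localized input at x = 0.3

dt = 0.001
for step in range(2000):
    inp = stim if step < 1000 else 0.0        # remove the input after 1 s
    u = u + dt * (-u + h + inp + W @ f(u)) / tau
    if step == 999:
        print("peak location during input:", round(float(x[np.argmax(u)]), 2))

# With suitably tuned kernels a peak can remain self-sustained after the input
# is removed (the DFT account of working memory); with these toy parameters it
# simply decays back toward the resting level.
print("activation at the stimulated site afterwards:", round(float(u[np.argmax(stim)]), 2))
```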
We can see in such examples that, like Leabra, the DFT approach has used
working memory models as an important component of building up more cogni-
tive models. Unlike Leabra, DFT is focused on ensuring that the low-level dy-
namics are not compromised by introducing algorithmic methods into their simu-
lations. Also unlike Leabra, however, DFT is often not mapped to specific anatomical
structures or types of neurons.
As with Leabra, there are reasons to be concerned with the neural plausibility
of DFT. DFT again relies on non-spiking single cell models. Spikes can introduce
important fluctuations into the dynamics of recurrent networks, and so ignoring
them may provide misleading results regarding network dynamics. As well, the
assumption in DFT that all neurons are physiologically the same is not borne out
by neural data. As described earlier, neurons possess a wide variety of responses,
and are generally heterogeneous along a variety of dimensions. Because the DFT
models are homogeneous, they are difficult to compare in their details to the kinds
of data available from single neuron recordings.
But again, as with Leabra, there are more important concerns. Perhaps the
most crucial drawback is that the kinds of representations, and hence the com-
plexity of the tasks DFT models address, are very limited. Specifically, there are
no methods for introducing complex structured representations into the approach.
Hence, most cognitive applications address fairly minimal cognitive tasks, not the
kind of tasks addressed by more traditional cognitive models, such as ACT-R. It
is somewhat difficult to argue that these last two approaches are general cognitive
architectures given their lack of ability to address high-level cognitive function.
While these approaches have not made unscalable assumptions in trying to incor-
porate symbolic representations, they have also not provided an alternative that
will allow their methods to be applied to the kinds of tasks for which symbolic
representations are needed.
9.2 An evaluation
Now that I have provided a brief description of several past approaches to under-
standing biological cognition, describing some strengths and limitations of each,
I now return to the QCC and consider how each criterion applies to the current
state-of-the-art. I want to be clear that this evaluation is in no way intended to un-
derstate the important contributions of this past work, but rather it is intended to
highlight points of difference, and to see the relative strengths and weaknesses of
the various approaches that have been taken to understanding biological cognition
to date.
While the implementations of classical representation account for compositionality,
it is less clear that they capture the “right kind” of compositionality. Recall from my dis-
cussion of this criterion earlier that compositionality seems to be an idealization of
the way in which meaning actually maps to linguistic structure. None of these ap-
proaches provides an especially detailed account of the semantics of the symbols
which they employ. However, they all equally account for a purely compositional
semantics, given their ability to implement a classical representation architecture.
It may seem that the ICS, because it employs highly distributed representa-
tions, may have more to say about the semantics of the employed representations.
Smolensky and Legendre (2006a) do consider some cases of more grounded spa-
tial representations, but in the end they note that the “semantic problem... will not
be directly addressed in this book” (p. 163). The NBA and ACT-R do not discuss
semantics in much detail. Though LISA does, the semantics typically consist only
in sharing a few subsymbolic features which are chosen explicitly by the modeler
(e.g. the “Bob” and “John” symbol nodes are both linked to a “male” subsymbol
node and hence are semantically similar).
Again, Leabra and DFT do not capture classical compositional semantics be-
cause they do not implement structured representations. Nevertheless, proponents
of Leabra consider linguistic processing (O’Reilly and Munakata, 2000). How-
ever, none of these linguistic models employ Leabra-based architectures. Instead,
they are largely demonstrations that the underlying software can implement past
connectionist models of language. DFT models that directly address semantics are
not common, though some characterize the mapping between simple spatial words
and perceptual representations (Lipinski et al., 2006). More generally, however,
the DFT commitment to embodiment brings with it the suggestion that representa-
tional states in these models will have a more grounded semantics than are found
in other approaches to cognition. The claim that language is highly influenced by
the embodied nature of the language user is a common one in some approaches to
linguistics (Gibbs Jr. (2006); though see Weiskopf (2010) for reservations).
Overall, current approaches seem to either account for classical composi-
tional semantics, or to have the potential to ground semantics without relating
this grounding to language-like representations. Of the past approaches, the ICS
seems to have the representational resources to do both, but the grounding of its
distributed representations is not considered. Clearly, much remains to be done
to integrate simple classical compositional semantics with more sophisticated se-
mantics grounded in detailed perceptual and motor representations.
d. The massive binding problem (the problem of two) The “massive binding
problem” is obviously a scaled up version of the “binding problem”. All of the
implementations of classical representation that I have discussed have provided
solutions to the binding problem. In each case, I considered their ability to scale
in the context of linguistic representations. And, in each case, I demonstrated
that the proposed representational and binding assumptions were not likely to be
implemented in a biological system (see appendix B.5 for details).
ACT-R, in contrast, is silent on the question of what the neural mechanism
for binding is. While this allows it to address this particular criterion well (as
binding is essentially “free”), it will do poorly on subsequent criteria because of
this agnosticism.
Again, DFT and Leabra have little to say about this particular issue because
they do not describe methods for performing language-like binding in any detail.
So, while all of the neural implementations of classical representation are able
to solve the problem of two (because they all posit a viable mechanism for small-
scale binding), none of them scale up to satisfactorily address the massive binding
problem. The approaches to biological cognition that do not account for binding
also do poorly with respect to this criterion. Only ACT-R successfully addresses
it, with the qualification that it does not do so for perceptual binding.
In the NBA, for instance, the removal of nodes dedicated to binding
will rapidly reduce its ability to bind large structures. Similarly, binding in LISA
will be highly sensitive to the removal of nodes, since many individual nodes are
associated with specific bindings. As well, for LISA, a reliance on synchrony
is subject to serious concerns about the effects of noise on the ability to detect
or maintain synchronous states (section 9.1.2). Noting that each of the nodes
of these models is intended to map to many actual neurons will not solve this
problem. There is little evidence that cortex is highly sensitive to the destruction
of hundreds or even several thousand neurons. Noticeable deficits usually occur
only when there are very large lesions, taking out several square centimeters of
cortex (one square centimeter is about 10 million neurons). If such considerations
are not persuasive to proponents of these models, the best way to demonstrate
that the models are robust is to run them while randomly destroying parts of the
network, and introducing reasonable levels of noise.
ACT-R’s commitment to symbolic representations suggests that it too will not
be especially robust. Of course, it is difficult to evaluate this claim by considering
how the destruction of neurons will affect representation in ACT-R. But there
are other kinds of robustness to consider as well. As I have mentioned, it is a
concern of proponents of the approach that it does not support partial matching
(section 9.1.1). This means, essentially, that there cannot be subtle differences in
representations – the kinds of differences introduced by noise for instance – which
are fixed by the production matching process. Or, put another way, there cannot
be “close” production matches which could make up for noise or uncertainty in
the system. So, ACT-R is robust in neither its processes nor its representations.
The robustness of the ICS is difficult to judge. If the system were consis-
tently implementable in a fully distributed manner, it should be quite robust. But,
because control structures are not provided, and most of the actual processing in
current models is done in localist networks, it is not clear how robust an ICS archi-
tecture would be. So, while the distributed representations used in tensor products
should be highly robust to noise, the effects of noise on the ability of the system to
solve its optimization problems in distributed networks has not been adequately
considered. As a consequence, we might expect that an ICS system could be
quite robust, but little to no evidence is available that supports that expectation,
especially in the context of a reconfigurable, distributed system.
We can be more certain about the robustness of DFT. Attractor networks are
well known to be highly robust to noise, and reasonably insensitive to the dis-
ruption of individual nodes participating in the network (Conklin and Eliasmith,
2005). It is likely, then, that the dynamics intrinsic to most
DFT models will help improve the robustness of the system overall. One concern
with robustness in DFT models is that it is well known that controlling dynam-
ical systems can be difficult. Because there are no “control principles” available
for DFT, it would be premature to claim that any DFT model will be robust and
stable. So, while the representations seem likely to be robust, it is difficult to
determine, in general, if the processes will be. Essentially the architecture is not
well-specified enough to draw general conclusions.
Similarly, for Leabra, the reliance on attractor-like working memories will
help address robustness issues. In many Leabra models, however, the represen-
tations of particular states are localized. Because the models are seldom tested
with noise and node removal, it is again difficult to be certain either way about
the robustness of a scaled-up version of these models. The current theoretical
commitments, at least, do not preclude the possibility that such models could be
robust.
Overall, then, state-of-the-art approaches to biological cognition do not do
especially well on this criterion. It seems that systems are either highly unlikely to
be robust, or there is little positive evidence that generated models would continue
to function under reasonable disturbances due to neuronal death, input variability,
or other sources of noise.
However, what the ACT-R account of declarative memory misses is how the symbols used in such descriptions
relate to the perceived and manipulated world of the system. That is, there is no
representational substrata which describes, for example, a model of visual prop-
erties that capture the variability and subtlety of perceptual experience. In other
words, the account of long-term memory provided by ACT-R is partial and se-
mantically superficial.
LISA and the NBA both have working memory-like mechanisms. In the NBA
these mechanisms are used to support temporary binding of symbols. In LISA,
they account for the capacity limitations of working memory observed in people.
However, both have poor accounts of long-term memory. LISA’s minimal seman-
tic networks could be considered a kind of long-term memory. However, they do
not capture either retrievable facts about the world, or more subtle perceptual rep-
resentations like those missing from ACT-R. The NBA does not discuss long-term
memory at all. The ICS discusses neither.
Leabra and DFT are both centrally concerned with working memory phe-
nomena. As a result, they have good characterizations of working memory in
a biologically-inspired substrate. Both have little to say about long-term mem-
ory. This, perhaps, should be qualified by the observation that Leabra is able to
learn procedures for manipulating its representations, and these may be consid-
ered long-term memories. As well, DFT considers the development of various
kinds of behavior over time that would presumably persist into the future. In both
cases, however, these are clearly very limited forms of long-term memory when
compared to the declarative system of ACT-R, or to the kinds of perceptual and
motor semantic representations that are crucial for central aspects of biological
cognition.
e. Scalability Geoff Hinton recently wrote: “In the Hitchhiker’s Guide to the
Galaxy, a fearsome intergalactic battle fleet is accidentally eaten by a small dog
due to a terrible miscalculation of scale. I think that a similar fate awaits most
of the models proposed by Cognitive Scientists” (Hinton, 2010, p. 7). In short, I
agree.
I highlighted several challenges relating to scalability in section 8.2.2.5. These
included knowing the dimension along which to evaluate scaling, knowing how
to account for interactions in functional systems, and being able to establish scal-
ability in practice, rather than only in principle. I have obliquely addressed some
of these issues in my discussion of previous criteria. For instance, the massive
binding problem clearly relates to the ability to scale binding assumptions up to
the scale of structures found in natural language.
The detailed structure of neural responses can tell us much about the function of a population of cells
(Eliasmith and Anderson, 2003; Machens et al., 2010, chp. 7). As well, Leabra
uses rate neurons, and so is unable to provide spike timing data. Relatedly, as I
noted earlier, the use of kWTA in place of actual neural dynamics
makes Leabra less able to address a variety of dynamical neural phenomena in any de-
tail. Nevertheless, Leabra is clearly more directly related to neural methods than
other approaches, save, perhaps, DFT.
For both DFT and Leabra, their stronger connection to low-level neural data
has been achieved at the price of being less able to explain higher-level behavior.
This can be seen in the desire of proponents of Leabra to connect their work to
ACT-R. The enviable breadth of application of ACT-R is not achievable within
Leabra because it does not provide theoretical principles that determine how the
low-level activity patterns in Leabra relate to the symbol manipulation exploited
in ACT-R. This is true despite the explicit attempts to marry these two approaches
I mentioned previously. Similarly, in DFT the “most cognitive” behaviors ad-
dressed are identification of simple spatial relations from visual stimuli, and the
tracking of occurrent objects. These are not comparable to the “most cognitive”
behaviors addressed by the likes of ACT-R.
In sum, there are no state-of-the-art approaches that convincingly connect to
most of the relevant kinds of data regarding brain function. Instead, most ap-
proaches have a preferred kind of data and, despite occasional appearances to the
contrary, are not well-constrained by other kinds of data. Not surprisingly, since I
am considering cognitive approaches, behavioral data is by far the most common
kind of data considered. However, Leabra and DFT stand out as being better con-
nected to neural data than the other approaches. Unfortunately, this seems to have
resulted in their being less well-connected to the high-level behavioral data.
I have suggested that ACT-R is more compact than Leabra largely because
judgements of compactness depend also on the breadth of application of a theory.
Because Leabra does not address the wide variety of cognitive behaviors accessible
to ACT-R, while ACT-R does heavily overlap with Leabra in its account of rein-
forcement learning (though Leabra’s account is more thorough), ACT-R seems to
be more compact overall. In addition, ACT-R is more systematic in its mapping
to temporal phenomena.
When turning to consideration of temporal phenomena, in general, most of the
considered approaches are disconcertingly arbitrary. That is, perhaps the greatest
degree of arbitrariness in all of the current accounts is found in their mappings
to time. There are no obvious temporal parameters at all in some accounts, such
as the ICS and the NBA. Other approaches, like LISA and Leabra, are prone to
use “epochs” in describing temporal phenomena. Unfortunately, different models
often map an “epoch” onto different lengths of “real” time, depending on the
context. As well, Leabra relies on basic computational mechanisms that avoid
time altogether (e.g., kWTA). In contrast, DFT is fundamentally committed to
the importance of time. The difficulty comes when we again consider how the time
parameters in DFT models and real time are related. As with LISA and Leabra, the
DFT does not specify principles for how time-constant parameters are related to real
time, and so different models are free to map model time to real time in different
ways. Finally, ACT-R is less temporally arbitrary because of its insistence that
firing a production always takes 50 ms. However, as
I hinted above, the complexity of productions may vary. This means that very
simple or very complex rules may be executed in the same length of time. Without
a further specification of some constraints on the complexity of productions, the
ACT-R mapping to time is not as systematic as it could be.
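The point can be made with trivial arithmetic. In the sketch below, every production firing is charged a flat 50 ms regardless of its complexity; the perceptual and motor constants are purely illustrative stand-ins, not official ACT-R parameter values.

```python
# The fixed-cost timing assumption: every production firing is charged 50 ms,
# however much work the production does.  The perceptual and motor constants
# below are illustrative placeholders, not official ACT-R parameter values.
PRODUCTION_TIME = 0.050   # seconds per production firing

def predicted_rt(n_productions, perceptual=0.085, motor=0.210):
    return perceptual + n_productions * PRODUCTION_TIME + motor

# A three-production strategy and a seven-production strategy for the same task
print(predicted_rt(3))    # 0.445 s
print(predicted_rt(7))    # 0.645 s
# A single, arbitrarily complex production and a trivial one both cost exactly
# 50 ms, which is the arbitrariness noted above.
```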
To conclude, while all of these approaches are to be commended for rigor-
ously specifying their models, ACT-R and Leabra are the most compact. This
stems from greater evidence of their providing a systematic mapping between
components of the theory and the system they are modelling. Nevertheless, all
approaches are somewhat arbitrary in their relation to real time.
9.2.4 Summary
Evaluating past approaches with respect to the QCC helps make clear the kinds
of trade-offs different approaches have made. It also helps highlight general
strengths and weaknesses that there appear to be in the field as a whole. I have at-
tempted to summarize my previous discussion in table 9.1, though much subtlety
is lost.
Table 9.1: Comparison of past approaches with respect to the QCC, rated on a
scale from 0 to 4.
                            ACT-R   LISA   NBA    ICS    DFT   Leabra
  Systematicity             ++++    ++++   ++++   ++++   ++    ++
  Compositionality          ++      ++     ++     ++     -     -
  Productivity              ++++    +++    +++    +++    -     -
  Massive binding           +++     +      +      +      -     -
  Syntactic Generalization  ++      ++     ++     ++     -     -
  Robustness                -       +      +      ++     ++    ++
  Adaptability              ++      +      -      -      +     ++
  Memory                    +++     ++     +      -      ++    ++
  Scalability               +       -      -      -      -     -
  Triangulation             +       +      +      +      ++    ++
  Compactness               ++      +      +      +      +     ++
Nevertheless, it is possible to see some general trends from this table. It is
clear for instance, that symbol-like representation and biological detail remain
somewhat orthogonal strengths, as they have been historically. It is also evident
that each approach would greatly benefit from pursuing larger-scale models, in
order to more effectively address several criteria including not only scalability,
but also adaptability, massive binding, and compactness. These same large-scale
models could serve to demonstrate the robustness of the approaches, another cri-
terion on which most approaches do poorly.
Finally, this analysis also suggests that the general criteria for good scientific
theories could be much better addressed by current approaches. This is an obser-
vation that, as I noted earlier, has been made by other theorists in the field. This
observation is also consistent with a sentiment sometimes encountered among
non-modelers, who suggest that there are “too many free parameters” in most
brain models. The more compact and data-driven theories of cognition become,
the more unreasonable such off-the-cuff dismissals will be.
One of my main motivations for providing this analysis, and the QCC in gen-
eral, is to help situate the SPA in the context of past work. While presenting the
SPA, I have attempted to highlight some similarities and differences between the
SPA and other approaches. In the next two sections, I have gathered and related
these observations to the QCC in an attempt to give a fuller picture of how the
SPA relates to this past work.
The approaches I have been considering for the last several pages span the
range of paradigms on offer over the last 50 years of cognitive science. While I
have picked the more biologically relevant approaches from each of symbolicism,
connectionism, and dynamicism, the main theoretical commitments of all three
approaches are well-represented in what I have surveyed. As I now turn to a con-
sideration of the approach on offer here, it may already be obvious that the SPA
is all and none of these past approaches. Indeed, my earlier plea to “move beyond
metaphors” (section 1.2) intentionally opened the door to the possibility that no one
paradigm would turn out to be the right one. In this section, I survey the many
similarities between each of these past approaches and the methods and commit-
ments of the SPA. This demonstrates that the SPA bears important similarities to
much past work, making clear that it is an extension of much of what has gone on
before. In the next section, I emphasize the differences.
With LISA, the NBA, and the ICS, the SPA shares a commitment to providing
a neurally plausible account that captures symbol-like representations. In all four
cases, there is an emphasis on the importance of binding to construct structured
representations that can then support a variety of syntactic and semantic manipu-
lations. In fact, the preferred mechanism for binding in the SPA comes from the
same family as that employed in the ICS. As a result, many of the mathematical
results available regarding the ICS are quite informative about representational
and transformational resources available to the SPA. Relatedly, the ICS offers a
sophisticated story regarding how such representations can be related to a wide
variety of linguistic processing. The representational commitments of the SPA
should be largely consistent with that characterization.
The SPA shares a central commitment to dynamics with DFT. It is no sur-
prise, then, that attractor networks form a basic computational component that is
re-used throughout both architectures. This concern for dynamics connects di-
rectly to interest in perception-action loops, and accounting for the integration be-
tween these aspects of behavior, and more traditionally cognitive aspects. There
is little in the SPA that could not be reformulated using terms more familiar to
dynamic systems theorists. Control theory and dynamic systems theory use many
of the same mathematical tools, concepts, and methods, after all. The common
use of temporally-laden terminology reflects this deeply shared commitment to un-
derstanding biological cognition through time.
Turning to Leabra, the terminology is drawn heavily from neuroscience.
This central interest in the biological basis of cognition is shared with the SPA.
The similarities between Leabra and the SPA go further, however. There is a gen-
eral agreement about some basic architectural features. Both consider the cortex-
basal ganglia-thalamus loop to play a central role in controlling cognitive function.
Both consider the biological details of this architecture when discussing learning
of control, as well as past learned behaviors. Not surprisingly, then, both have
similar relationships to neural data regarding the dopamine system. It is clear that
Leabra has much to recommend it, and the SPA has been heavily influenced by
this, and related approaches to basal ganglia function (e.g., Gurney et al., 2001).
Overall, the SPA and Leabra share a concern with the biological implementation
of adaptive control in cognition.
ACT-R and the SPA share a basic interest in high-level cognitive function.
Both propose ways that rule-like behavior, resulting from symbol-like represen-
tation, can occur. They share, with Leabra, a commitment to the cortex-basal
ganglia-thalamus loop as being the central control structure in explaining such
behavior. All three see a similar place for reinforcement learning, and hence re-
late to similar learning data in some respects. As well, both the SPA and ACT-R
have something like rule matching at the heart of that adaptive control structure.
A related interest of ACT-R and the SPA is found in their focus on the timing of
behaviors. Specifically, behavioral-level reaction times are relevant to the func-
tioning of both architectures. Finally, both approaches are very concerned with
understanding the mapping of the cognitive architecture to the underlying neural
structures. Hence, both are interested in relating their approaches to the biological
basis of cognition.
Given these similarities, there are many ways in which the SPA can be con-
sidered a combination of commitments of the ICS, DFT, ACT-R, and Leabra.
While ACT-R and Leabra alone do not seem to have been broadly integrated, de-
spite attempts to do so, adding the ICS to the attempt provides an important extra
ingredient to that integration. The challenge, after all, was to relate the represen-
tations of ACT-R to those of Leabra. The ICS provides a rigorous and general
mapping between the two. The basic control structure is similar between ACT-R
and Leabra, and the ICS and DFT are silent on that issue, so there are no direct
conceptual conflicts. The SPA can thus be seen as an attempt to build the neces-
sary bridge between these past approaches.
There are, of course, important differences as well. The SPA’s rejection of a strict
isomorphism between the neural and the higher-
level description means that the ICS and the SPA embody different conceptualiza-
tions of the relationship between levels of description in cognitive science. I have
described both conceptions in some detail (sections 9.1.4 and 2.4 respectively),
and the difference essentially lies in the “messy” mapping between levels allowed
by the SPA. This messiness means that clean isomorphisms are unlikely, and that
neural details will matter for certain kinds of behavioural accounts (e.g., recall the
example of serial working memory). Nevertheless, higher-level descriptions can
be helpful, if approximate, for answering certain kinds of questions about cogni-
tive function (e.g., what rule was induced in the Raven’s task?). Of course, the
neural level description is an approximation to a more detailed model in a similar
manner. The pragmatic aspects of “descriptive pragmatism” about levels (see sec-
tion 2.4) means that the questions we are asking of a model will help determine
what the appropriate level of analysis and simulation of the model is. Like the
ICS, the SPA supports descriptions across a range of levels, but not in virtue of an
isomorphism. And, I should add, the SPA covers a wider range of levels than does
the ICS.
While the non-classicism of the SPA approach helps distinguish it from the
NBA and LISA, there is perhaps a deeper difference. This is the reliance of the
SPA on the “moving around” of representations within the architecture. In the
NBA and LISA, the representation of specific items is fixed to a specific part of
the network. In a localist network, this describes the standard method of hav-
ing the “dog” representation be a single node. In the distributed case, this would
be some specific subset of nodes being the “dog” representation. In the SPA,
the representation of “dog” can be in many different networks at the same time,
or at different times. This is because that representation can (often, though not
necessarily exclusively) be found in the activity patterns of a population of cells.
Populations of cells represent vector spaces, and many different items can be rep-
resented in a given vector space. At any point in time, the specific representation
in a population depends on transient activity states.
In a way, the SPA characterizes neural representations as being somewhere be-
tween standard computer representations, and standard implementations of neural
network representations. Standard computers are powerful partly because they do
not commit a particular part of their hardware to specific representations. Thus,
the same hardware can process representations about many different things. It is
the function of the hardware, not its representations, that is often taken to define
its contribution to the overall system. If we adopt the NBA or LISA style methods
of representation, then specific populations or nodes are committed to particular
representations. The SPA commits hardware to a specific vector space, but not a
specific point in a specific labelling of a vector space. This helps it realize some of
the key flexibility of traditional computers in being able to represent many things
with the same hardware (and represent the same thing with many different bits of
hardware). Consequently, the same neural population can compute functions of a
wide variety of representations. This is a subtle yet important difference between
the SPA and many past connectionist approaches. It is because of this flexible use
of neural populations that the SPA can support control and routing of represen-
tations. This is an aspect of cognitive function conspicuously absent from LISA
and the NBA.
In the previous section I highlighted the shared emphasis on dynamics and
embodiment between DFT and the SPA. This is a conceptual similarity, but the
practical applications are quite different. The SPA is regularly compared to, and
constrained by, detailed single cell spiking dynamics. As well, the dynamic com-
ponents of the SPA are intended to be actual neurons, mapping well to the ob-
served heterogeneity, tuning, and response properties of these elements of the real
system. As such, the parameters of the nodes are measurable properties of the
brain. DFT, in contrast, does not provide such a specific mapping to such aspects
of the physics and functions of neurons.
The SPA also has a broader characterization of the dynamics and interaction
of the perceptual, motor, and cognitive systems. A central reason for this is that
the SPA characterizes the dynamics of much higher-dimensional spaces than DFT
typically does. Because the SPA distinguishes between the state space and the
neuron space in which the state space is embedded, it is possible to understand
the evolution of a very complex neuron space with a few degrees of freedom in
the state space. This mapping then makes it possible to extend the dimensionality
of the state space to even higher dimensions, while keeping the principles that
map dynamics to neural representation the same. In many ways, the SPA work-
ing memory of a 100-D semantic pointer is a straightforward extension of the
2-D working memory of a DFT sheet of neurons (Eliasmith, 2005b). However,
the computations and dynamics available in that higher-dimensional space can be
much more sophisticated. Specifically, the principles of the NEF allow explicit
nonlinear control to be introduced into these dynamical systems, which opens
a host of important possibilities for constructing networks with useful dynamics
(e.g., for routing, gain control, binding, controlled oscillation, etc.).
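The state-space view can be sketched independently of any particular neural implementation. Below, a 100-dimensional "memory" is loaded and held using a multiplicative gate, which is the kind of nonlinear control at issue; the dimensionality, time constants, and noise level are arbitrary, and a real SPA model would implement these dynamics in spiking neurons rather than directly on the state vector.

```python
import numpy as np

# A gated D-dimensional memory, written directly at the level of the state
# vector.  Dimensionality, time constants, and noise level are arbitrary.
rng = np.random.default_rng(2)
D, dt = 100, 0.001
tau_load = 0.05                       # how quickly the memory is loaded (s)

target = rng.standard_normal(D)
target /= np.linalg.norm(target)      # a unit-length, semantic-pointer-like state

x = np.zeros(D)                       # the remembered state
for step in range(3000):              # 3 s of simulated time
    t = step * dt
    gate = 1.0 if t < 0.5 else 0.0    # multiplicative control: load, then hold
    drift = gate * (target - x) / tau_load
    noise = 0.2 * rng.standard_normal(D)   # crude stand-in for spiking variability
    x = x + dt * (drift + noise)

print("similarity to the stored item:", round(float(x @ target), 3))   # close to 1.0
```

Nothing in this recipe changes as D grows, which is the sense in which a 100-D memory is a straightforward extension of a 2-D one.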
As well, the NEF provides the SPA with methods that allow such systems to be
constructed without the Hebbian learning typically relied on in DFT models. This
allows much more complicated systems to be constructed, and compared to data
from developed cognitive systems. To address developmental questions, learning
can, of course, be incorporated into such models as well.
Differences with ACT-R include how rule-like behavior is characterized (e.g., through learned
statistical mappings), and how available rules can be matched to current context
(i.e., partial matching). In applying the SPA, we have found that standard
variable-binding rules used in ACT-R are often not necessary in SPA models.
For instance, in section 5.6, we employ many rules that simply copy information
between elements of the architecture, rather than explicitly including variables
in the rules themselves. If this kind of approach proves generally useful, it may
mark an important change in how cognitive rules are characterized.
It is also interesting to notice how the SPA’s representational story relates to
a conceptual difference between how ACT-R and the SPA treat levels. ACT-R
proponents, unlike those of the ICS, suggest that there is a “best” level of de-
scription for cognitive systems: “In science, choosing the best level of abstraction
for developing a theory is a strategic decision...in both [symbolicist and connec-
tionist] cases, the units are a significant abstraction... I believe ACT-R has found
the best level of abstraction [for understanding the mind]” (Anderson, 2007, p.
38-39). While Anderson leaves open the possibility that the best level is not yet
available, it seems to me to simply be a mistake to expect that there is a best level
for understanding the mind. In contrast with the notion of descriptive pragma-
tism I introduced earlier, the notion that there is a single best level will always cry
out for an answer to the question: “The best for what purpose?” If this purpose
changes, even slightly, so will the most appropriate level of description. I believe
the SPA’s focus on relating levels, rather than picking levels, allows descriptions
to be systematically adjusted to meet the relevant purposes for a given question
about cognition. Sometimes the best description will demand reference to single
cell spike patterns, or a characterization of the effects of specific neurotransmit-
ters. Other times, the best description will demand the specification of which rule
was applied, or what sentence was generated by the system. A unified approach,
able to relate many such descriptions across many different levels strikes me as the
goal of most scientific enterprises. I am not alone: for instance, Craver (2007) has
argued in detail that good explanations in neuroscience should contact many spe-
cific mechanisms at multiple levels of descriptions, and across disciplinary fields
(see also Bechtel and Richardson, 1993; Bechtel, 2005; Thagard and Litt, 2008).
In general, I take it that the SPA spans the relevant levels of description in a
more convincing manner than past approaches, including ACT-R. A major reason
that the SPA can employ highly detailed neural representations as well as abstract
symbol-like representations, is that the methods for characterizing representation
are flexibly specified. For instance, the NEF mapping of vector spaces on to neu-
rons does not impose any specific form on the neural nonlinearity. While most
models I have presented use a simple spiking leaky integrate-and-fire model, the
complexity of useful single neuron models in the SPA is limited only by the re-
searcher’s knowledge and available computational resources. In previous work,
we have used adapting LIF neurons (Singh and Eliasmith, 2006), reduced burst-
ing neurons (Tripp and Eliasmith, 2007), and detailed conductance-based neurons
(Eliasmith and Anderson, 2003). The fundamental approach remains the same.
Similarly, the amount of detail in the synaptic model is not constrained by the
NEF approach. Consequently, if deemed relevant, a researcher can introduce uncer-
tain vesicle release, saturating synaptic conductances, axonal delays, etc. In most
of our models we introduce a generic noise term to account for these kinds of un-
certainties, but the precise mechanisms leading to such variability can be included
if necessary for a desired explanation.
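The sense in which the neuron model is a swappable component can be illustrated with a stripped-down version of the representational recipe: random encoders drive some rate nonlinearity, and least-squares decoders recover the represented value. The sketch below uses a standard LIF rate curve, but nothing in the recipe depends on that choice; the gains, biases, and population size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200                                   # neurons representing a scalar x

encoders = rng.choice([-1.0, 1.0], N)     # preferred directions in a 1-D space
gains = rng.uniform(1.0, 5.0, N)
biases = rng.uniform(-3.0, 3.0, N)

def lif_rate(J, tau_ref=0.002, tau_rc=0.02):
    """Steady-state rate of a leaky integrate-and-fire neuron given current J.
    Any other rate response could be substituted here without changing the
    rest of the recipe."""
    out = np.zeros_like(J)
    m = J > 1.0
    out[m] = 1.0 / (tau_ref - tau_rc * np.log(1.0 - 1.0 / J[m]))
    return out

x = np.linspace(-1, 1, 101)
J = gains[:, None] * encoders[:, None] * x[None, :] + biases[:, None]
A = lif_rate(J)                           # N x 101 matrix of firing rates

# Least-squares decoders recover x from the heterogeneous population response
decoders, *_ = np.linalg.lstsq(A.T, x, rcond=None)
rmse = np.sqrt(np.mean((A.T @ decoders - x) ** 2))
print("decoding RMSE:", round(float(rmse), 4))
```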
Much of this breadth stems from the SPA’s adoption of the NEF, which has been, and
continues to be, centrally concerned with neural plausibility. For instance, in pre-
vious work it has been shown that any network constructed with the NEF can be
made to be consistent with Dale’s Principle,3 hence capturing a major architectural
constraint ignored by much past modelling (Parisien et al., 2008). In addition, a
wide variety of single-cell dynamics, including adaptation, bursting, Poisson-like
firing, etc., has been directly incorporated into the methods of the NEF, helping to
make it generally applicable to cortical and subcortical systems (Tripp and Elia-
smith, 2007). The continuing development of a neurally responsible theoretical
substrate for the SPA means that the SPA characterization of biological cognition
continues to incorporate ever more subtle aspects of biology.
Each such added detail improves the realism with which the model’s mech-
anisms approximate those that have been characterized by neuroscientists. And,
each such added detail opens a field of neuroscience from which data can be drawn
to be directly compared to the model. Furthermore, many such details result in
changes in neural spike patterns, one of the most commonly measured properties
of neural systems. Thus, model results from the SPA approach are directly com-
parable to one of the most common sources of neural data. Furthermore, given our
understanding of the relationship between single cell activity and other methods
for recording neural signals, we can use these single cell models to predict other
kinds of data, such as that gathered from local field potentials (LFPs), electroen-
cephalograms (EEGs), fMRI, and so on. Essentially, different kinds of filters can
be used to process the model data and provide comparisons to other methods of
measuring brains (see, e.g., section 5.8).
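As a rough illustration of the filtering idea, the sketch below takes one simulated spike raster and passes it through two different causal filters: a fast one yielding an LFP-like trace and a much slower one yielding a BOLD-like trace. The kernel shapes and time constants are placeholders; they are not the filters used in the models described in section 5.8.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, T = 0.001, 10.0
t = np.arange(0, T, dt)

# A 50-neuron raster whose underlying rate is slowly modulated at 0.5 Hz
rate = 20 + 15 * np.sin(2 * np.pi * 0.5 * t)
spikes = (rng.random((50, t.size)) < rate * dt).sum(axis=0)

def exp_kernel(tau):
    """A causal exponential filter with time constant tau (seconds)."""
    k = np.exp(-np.arange(0, 5 * tau, dt) / tau)
    return k / k.sum()

lfp_like = np.convolve(spikes, exp_kernel(0.02))[:t.size]   # ~20 ms filter
bold_like = np.convolve(spikes, exp_kernel(2.0))[:t.size]   # seconds-scale filter

print("LFP-like correlation with the driving rate:", round(float(np.corrcoef(lfp_like, rate)[0, 1]), 2))
print("BOLD-like trace lags and smooths that rate:", round(float(np.corrcoef(bold_like, rate)[0, 1]), 2))
```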
3 Dale’s Principle states that the vast majority of neurons in the brain are either inhibitory or
excitatory, not both.
In sum, the SPA not only provides unique advantages, such as flexible bio-
logical plausibility, it does not suffer any of the most salient drawbacks of past
approaches. Of course, much remains to be done to truly demonstrate that the
SPA can underwrite large-scale biological cognition. In the next chapter I turn to
consideration of some of the central challenges for the SPA, and briefly consider
several conceptual consequences of thinking about cognition from the perspective
of the SPA.
Chapter 10
Consequences and challenges
In the brief discussions that follow, which touch on notions such as representation,
concepts, and inference, I cannot hope to do justice to the
wide variety and subtlety of views on any of these notions. Instead, I take the
short discussions I present to be food-for-thought more than carefully constructed
arguments. The philosopher in me feels guilty for giving such short shrift to such
important issues. Nevertheless, I think it is helpful to consider the variety of
ramifications adopting the SPA may have, even if only briefly.
10.1.1 Representation
In some quarters, the notion of representation has been closely identified with clas-
sical digital computers (e.g., van Gelder, 1995). However, behavioral scientists
typically have a much more liberal notion, freely talking about representations in
neural networks, analog computers, and even in the structure of the dynamical
system state spaces (Schöner, 2008). In the SPA I have adopted this more liberal
view.
This is at least partly because I believe the SPA can provide an account which
inter-relates the many kinds of representation identified in behavioral science re-
search. Like the NEF on which it is based, the SPA is quite flexible in what it
calls a representation. SPA representations can be scalar values, vectors, func-
tions, vector fields, symbols, images, and so on. In every case, however, mental
representations can be characterized as vectors implemented in neurons. The rep-
resentational commitments of the SPA are thus both flexible and unified.
One unique aspect of the SPA characterization of representation is that the
vectors identified as representations are typically not vectors of neural activity. In
many ways, the representational commitments of the SPA will strike many as be-
ing much like those of connectionists. However, the SPA provides two consistent
perspectives on the activities of neurons. One relates to directly measurable prop-
erties of the system, such as spiking patterns. The other relates to the underlying
vector space that is taken to be represented by those measurable properties. This
distinction is crucial for bolstering our understanding of neural function because
the mapping between the biological wetware and an abstract vector space provides
a means of relating brain function to a more general scientific understanding of
the world. For example, it allows us to understand certain neurons in area MT as
being sensitive to visual velocity. It also enables us to understand neurons in pri-
mary motor cortex as representing variables related to the dynamics of movement.
Essentially, this mapping gives us a way to connect concise, scientific, and low-
dimensional descriptions of such properties to the often very high-dimensional
neural activities that we record from brains interacting with those properties.
We may, in a sense, think of representations of vectors in the SPA as being
“doubly distributed”. That is, representations of symbols are not only distributed
over the vector space, but that vector space is also distributed over neural activities.
I take this as an important strength of the SPA because it allows us to relate high-
level characterizations of cognition in terms of symbols to detailed spiking data.
As well, it simultaneously allows us to characterize the internal structure of such
symbols, providing an increase in the variety and subtlety of manipulations we
can define over such representations (see section 7.4).
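The "doubly distributed" idea can be spelled out in a few lines. In the sketch below, a symbol is first assigned a random high-dimensional vector (standing in for a semantic pointer), and that vector is then spread across a population's activities through randomly chosen encoders; no single neuron stands for the symbol, and the same population carries a different pattern for a different symbol. All sizes and the rectified-linear response are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(5)
D, N = 256, 2000

dog = rng.standard_normal(D)
dog /= np.linalg.norm(dog)                     # level 1: symbol -> vector

encoders = rng.standard_normal((N, D))
encoders /= np.linalg.norm(encoders, axis=1, keepdims=True)
gains = rng.uniform(0.5, 2.0, N)
biases = rng.uniform(-0.5, 0.5, N)

# level 2: vector -> neural activities (a simple rectified response)
activity = np.maximum(0.0, gains * (encoders @ dog) + biases)

# No single neuron is "the dog neuron": the symbol lives in the whole pattern,
# and the same population carries a different pattern for a different vector.
cat = rng.standard_normal(D)
cat /= np.linalg.norm(cat)
activity_cat = np.maximum(0.0, gains * (encoders @ cat) + biases)
print("active neurons for 'dog':", int((activity > 0).sum()),
      "| overlap with 'cat' pattern:", round(float(np.corrcoef(activity, activity_cat)[0, 1]), 2))
```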
This way of understanding mental representations has consequences for both
neuroscientific and psychological understandings of representation. In neuro-
science, taking a single cell to have a preferred direction in a high-dimensional
state space can go a long way to helping us understand what is often considered
the perplexing variety of neural responses observed with respect to input stimuli.
For instance, in working memory, allowing for higher dimensional representations
provides for much simpler characterizations of the variety of dynamics observed
during working memory delay periods (Singh and Eliasmith, 2006; Machens et al.,
2010). As well, the seeming complexity of remapping in prefrontal cortex (Hoshi,
2006) can be easily understood in the SPA as routing different sources of informa-
tion into the same high-dimensional vector space (e.g., visual information in one
task, and auditory information in another task). Furthermore, the SPA characteri-
zation of representation emphasizes the importance of specifying not only encod-
ing, but also decoding assumptions when making representational claims about
particular neural systems. Often, the term “representation” has been misleadingly
used in neuroscience to identify only encodings (Eliasmith, 2005a), though more
recent work often addresses decoding as well.
In fact, semantic pointers can be seen as a natural generalization of well-
known neural representation schemes in neuroscience. Perhaps the most probed
representations in the brain are those in primary visual cortex (V1). In V1, in-
dividual cells are often characterized as sensitive to specifically oriented bars in
particular retinal locations, with their activity falling off as the stimulus either ro-
tates from the preferred orientation, or moves out of the specific retinal location
(Kandel et al., 2000, pp. 532-540). It is often fruitful to consider these individ-
ual cells as carrying specific content about input stimuli (e.g., oriented bars at a
location). In this very particular sense, we can identify each cell with a specific
label indicating what it means to the animal. While this could be interpreted as an
unusual type of localist representation, a more natural interpretation arises if we
consider a large population of such neurons, in which case they can be seen as a
distributed representation of an entire visual image.
The SPA uses representations that are localist and distributed in precisely this
same way, at many levels of description. For example, these “localist” single cells
in V1 are often taken to represent coefficients of Fourier-like decompositions of
images (Olshausen and Field, 1996).1 This same kind of decomposition forms the
basis of SPA visual representation (see section 3.5). More interestingly, the SPA
extends this kind of analysis to lexical representation. That is, at higher levels
“localist” single cells are taken to represent coefficients of Fourier-like decompo-
sitions of concepts. In both cases, the neural tuning curve is taken to be the basis
for the decomposition, and the neural activity is the currently relevant coefficient.
Furthermore, just as there is significant redundancy in V1 single cell tuning (sug-
gesting that many neurons participate to encode a given coefficient), there is also
significant redundancy in the SPA representation of semantic pointers. Applying
this representational scheme to symbolic concepts is thus a natural generalization
of existing neural evidence.
In a more psychological context, there are at least three major consequences of
SPA representations: 1) the internal structure of symbols is important for charac-
terizing a wide variety of cognitive behavior; 2) mental representations are better
characterized as “symbol-like” rather than symbols; and 3) mental representations
are best thought of as often temporary processing states of activity, rather than as
objects that reside in a specific location in the system. The first consequence al-
lows us to identify more sophisticated kinds of transformations of representations
(see section 7.4), as well as providing a clear relationship between perceptual and
lexical/conceptual representation.
The second consequence arises because of the poor scalability of attempts to
do otherwise. Characterizing mental representations as symbol-like carries with it
a reminder that many cognitive properties, such as systematicity, compositionality,
and productivity, are idealizations and hence not practical constraints on cognitive
architectures. In contrast, the hard constraints provided by the available physical
resources of the brain seem likely only to be respected if the system traffics in
symbol-like representations.
The third consequence helps to highlight the importance of vector processing,
in contrast to the mere “activation” of putative representational states. In essence,
the SPA suggests that we think about the brain as more of a processor of neural
signals rather than as a storehouse of neural representations. That is, it suggests
that we should think of a neural population as a temporary way-station for con-
stantly changing representational content, rather than a home-base for some small
subset of re-activating representations. Doing so provides an understanding of
how the brain can be a flexible, adaptive, and computationally powerful system.
This shift in metaphors also highlights the temporal aspects of neural representa-
tion in the SPA. Because the representational contents of a given neural population
can change quite dramatically from moment to moment, the dynamics of neural
representation must be considered when constructing SPA models. I return to the
importance of dynamics shortly.
1 Representations similar in these respects to those found in V1 have been identified in auditory
cortex, the hippocampal complex (including place cells and head direction cells), motor areas, and
many other parts of the brain.
All three consequences also help identify the fact that most representations in
the SPA are a kind of semantic promissory note. Semantic pointers carry with
them a means of accessing much more sophisticated semantics than they them-
selves directly encode. Thinking of central representational structures as not being
unified semantic wholes, like symbols, provides a subtler, and more biologically
plausible view of how mental representation might be organized. The idea that
semantics are encoded through compression and dereferencing provides a natural
explanation for the various amounts of time and effort it takes to extract semantics
of words from experimental subjects.
Notably, my use of “semantics” here refers largely to the relationship between
representational vehicles described by the SPA. The relationship of those vehi-
cles to the external world is also usually considered an important part of “seman-
tics”. While I have included such relationships in several SPA models, arguing that
this demonstrates the SPA’s ability to address the symbol grounding problem (see
chapter 3), I have not described a general semantic theory in this book. However,
I have described a theory of “neurosemantics” in detail in past work (Eliasmith,
2000, 2006), which I take to be fully compatible with the SPA. Encouragingly,
this theory has been used by others to account for semantic phenomena as well
(Parisien and Thagard, 2008). However, the details of that theory are beyond the
scope of this book.
10.1.2 Concepts
Recent theoretical work on concepts seems to have come to a kind of agreement
that “in short, concepts are a mess” (Murphy, 2002, p. 492). For some, this is
just an observation about the state of the field that means more work needs to
be done. For others, such as Machery (2009), this leads to the conclusion that
“the notion of concepts ought to be eliminated from the theoretical vocabulary of
psychology” (p. 4). Machery’s argument begins with the observation that there are
many different things concepts are used to explain (e.g., motor planning, analogy,
suggestive. Nevertheless, they are at least suggestive: that is, even these brief considerations make it plausible that the SPA can provide the resources needed to unify our understanding of concepts. Again, I will emphasize that the contents and behavioral efficacy of any semantic pointer depend on its being situated in the architecture. So, while we can identify a single vehicle with the concept for expediency, the contents of that concept are determined by semantics that are only fully defined by accessing representations spread throughout the system.
The SPA suggests, then, that we can come to a better understanding of concepts
by providing a computational definition that allows us to examine our simulations
in a way analogous to how we examine human subjects. Perhaps we can begin
to tackle the disarray in conceptual theorizing by adopting a unification through a
single functional architecture. Such an architecture will need to be shown to give
rise to the wide variety of complex results generated by psychologists in relation
to concepts. The examples I provided in this book go only a small way to realizing
this goal. But, I believe the SPA may provide some basic resources for tackling
the problem of systematizing our understanding of concepts.
10.1.3 Inference
Like “concept”, “inference” suffers from a very broad application in the behav-
ioral sciences. For instance, the term is used to refer to automatic, “sub-personal”
processes that lead us to have representations of the world that go beyond the in-
formation available to our senses. For example, when I assume that objects have a
back side even though I cannot see it, I am often said to be relying on mechanisms
of perceptual inference. In other instances, the term is used to characterize ef-
fortful reasoning that helps us determine regularities in our environment. In some
such cases, we do not explicitly know the rules we are using to reason in this way.
In other cases, we may know the rule explicitly, such as when we are performing
logical inference.
To determine if an inference has been made, experimenters typically observe
the choice behavior of a subject. More often than not, such behaviors can be nat-
urally characterized as a kind of decision-making. As a result, decision-making
and inference often go hand-in-hand in the behavioral sciences. This is unsurpris-
ing, since we might expect inference to result in a representation that can support
the choice of actions. Given the huge variety of decision-making and inference
phenomena, it is tempting to make an argument analogous to that which Machery
made about concepts: that the diffuseness of the phenomena suggests we can nei-
ther identify a specific phenomenon as being diagnostic of inference nor can we
ference” has a long way to go. While a model like Spaun goes some way to inte-
grating this variety of inferencing processes within the context of a single model,
it clearly does not fully capture the complexity of even these examples. Never-
theless, it does suggest that we ought to be able to understand the wide variety of
inference phenomena that we encounter in the behavioral sciences by employing
unified architectures like the SPA.
10.1.4 Dynamics
In past work, I have been quite critical of dynamicism as a cognitive approach
(Eliasmith, 1996, 1997, 1998, 2009b). However, I believe that the SPA provides a
way of addressing those criticisms, while still embracing the compelling insights
of the view. For instance, dynamicists generally do not map the parameters in
their proposed models (e.g. “motivation”) to the underlying physical system that
is being modeled (i.e., neurons, networks, brain areas, etc.) in any detail (see, e.g.,
Busemeyer and Townsend, 1993). Clearly, the SPA provides a mapping down to
the level of individual neurons, allowing us to understand the mechanisms which
give rise to the dynamics that we observe in brains and in behavior. As well, I
and others (e.g., Bechtel, 1998) have criticized the anti-representationalism that
is often espoused by dynamicists, something clearly not embraced by the SPA.
More generally, I have had serious concerns regarding the approach to dy-
namics taken by all three standard approaches. As I discussed to some extent in
section 9.2, none of the three approaches provides principles for mapping cogni-
tive models onto independently measurable, low-level system dynamics, such as
membrane or synaptic time constants, refractory periods, and so on. In short, un-
like past approaches, the SPA is concerned with real-time dynamics “all the way
through”. It is concerned with the dynamics inherent in the physical body which
must be controlled by the nervous system (section 3.6). It is concerned with the
dynamics of recurrently connected spiking attractor networks (section 6.2). And,
it is concerned with the dynamics of complex cognitive processes (section 5.8).
Above all, the SPA is concerned with how all of the various temporal measure-
ments we make of the system, at many levels of analysis, interact to give rise to
the variety of observed dynamics.
One main consequence of this focus on real-time and real physics, is that the
SPA is in a position to embrace the constructive insights of researchers who em-
phasize the embodiment and embeddedness of cognitive systems in their environ-
ment. I have argued elsewhere that a problematic tendency of these researchers is
to blur the lines between agents and environments (Eliasmith, 2009a). Indeed it
has been claimed that “nothing [other than the presence of skin] seems different”
(Clark and Chalmers, 2002, p. 644) between brain-brain and brain-world interac-
tions. The suggested conclusion is that our characterization of cognition cannot
stop at the boundaries of an agent. Ironically, I believe that the very dynamical
nature of these systems, which is often appealed to in drawing such conclusions,
suggests exactly why there is a difference between the inside and outside of the
nervous system. In short, the degree and speed of coupling inside the nervous
system is generally much greater than that between the nervous system and the
body, or the body and the world. One straightforward reason for this is that the
body has mass. Hence, the dynamics tend to be considerably different from those of the transmission of electrical signals, which have essentially no mass.
This suggestion, that we can rely on differences in dynamics to identify use-
ful system boundaries for scientific exploration, has a further consequence for
a long-held position in philosophy called “functionalism”. Functionalism is the
view that what makes a system the kind of system it is, for example a mind, is
determined by the functions that it computes. On a standard understanding of
what counts as computation (e.g., as characterized by Turing machines) function-
alism suggests that two systems operating at very different speeds, that is with
very different dynamics, could both be computing the same function (Turing ma-
chines ignore the amount of time taken to compute a function). As a consequence,
philosophers have sometimes argued that an interstellar gas cloud, or a disembod-
ied spirit, might have a mind because it might be computing the same function
as, or functionally isomorphic to, humans (Putnam, 1975). However, such con-
clusions clearly do not hold if we think that the length of time it takes to compute
a function matters for whether or not that function is being realized. Mathemati-
cally, all that I am suggesting is that functions describing cognition be written as
not only functions of state, but also functions of time. As I have suggested else-
where, such a “temporal functionalism” provides a much more fine-grained, and I
believe plausible, basis for a taxonomy of cognitive systems (Eliasmith, 2003).
The importance of time for understanding what cognitive functions a biolog-
ical system can actually compute is illustrated by the discussion of control in
chapter 5. Indeed, the obvious practical importance of control and routing make it
hard to dissociate the functions a system can perform from its dynamics: consider
the consequences of replacing digital telephone switches with human operators.
One interesting consequence of the SPA's focus on control is that it helps to expand
our concepts of adaptation and of learning in cognitive science. As mentioned in
section 6.4, learning is most often associated with changing weights, or construct-
ing new, permanent representations in long-term memory. However, as was made
evident by the Raven’s matrix task, the ability of past input to influence future
response can often be a consequence of controlled recursive dynamics. In some
cases, as emphasized by Bob Hadley’s rapid variable creation task (section 7.3.7),
such dynamics seem the only possible explanation for the observed cognitive phe-
nomena.
In sum, the SPA has conceptual consequences for embodiment, system bound-
aries, functionalism, and learning. However, the SPA also has practical conse-
quences for contemporary discussions of dynamics in experimental neuroscience.
In recent years, there has been an increasing emphasis on the kinds of frequen-
cies and temporal correlations observed during tasks in a wide variety of brain
structures (Fries, 2009). Consideration of such phenomena has not played a role
in the development of the SPA. Nevertheless, it is an important empirical con-
straint on SPA models.
For example, Pesaran et al. (2002) recorded from monkey parietal cortex during working memory experiments and performed a spectrographic analysis. This analysis has been taken to show that during working memory there is a shift in frequencies, with increasing power in the gamma band (i.e., 25-90 Hz) during the delay period. As shown in figure 10.1, the single cell and population frequency spectrograms for the model and the data are similar. Some have suggested that
such results indicate that gamma band synchronization is a fundamental mecha-
nism of neural computation (Fries, 2009). However, I would be hesitant to call
them “fundamental”, as there was no need to consider them in order to construct
models which give rise to the same patterns. Others have suggested that such
patterns are merely epiphenomenal (Tovée and Rolls, 1992). Such phenomena
are unlikely to be strictly epiphenomenal, since the presence or absence of such
synchronization will have consequences for the precise currents in the dendrites of neurons receiving such signals. I am inclined to think that such phenomena are
neither fundamental nor epiphenomenal, but instead they are the consequences
of the implementational constraints imposed by using neurons for implementing
cognitive systems. Not surprisingly, those implementational constraints will have
some functional implications themselves, though they do not seem to be so severe
as to dominate the information processing occurring within the brain.
[Figure 10.1: Single cell and population spectrograms for a) the experimental data and b) the model, plotting frequency (Hz) against time (s) over the 0-2 s trial.]
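To give a concrete sense of the kind of analysis involved (this is not the pipeline used by Pesaran et al. or to produce figure 10.1), the following sketch computes a spectrogram of a simulated spike train and the mean power in the 25-90 Hz gamma band; the simulated rates, oscillation frequency, and window settings are arbitrary choices made only for illustration.

```python
import numpy as np
from scipy.signal import spectrogram

# Simulate a 2 s spike train (1 ms bins) whose rate carries a 40 Hz modulation
# during a mock "delay period" from 0.5-1.5 s; all values are illustrative only.
fs, T = 1000, 2.0
t = np.arange(0, T, 1 / fs)
delay = (t > 0.5) & (t < 1.5)
rate = 20 + 30 * delay * (1 + np.sin(2 * np.pi * 40 * t)) / 2        # spikes/s
spikes = (np.random.default_rng(0).random(t.size) < rate / fs).astype(float)

f, tt, Sxx = spectrogram(spikes, fs=fs, nperseg=256, noverlap=192)
gamma = (f >= 25) & (f <= 90)
gamma_power = Sxx[gamma].mean(axis=0)     # mean gamma-band power over time
print(tt[np.argmax(gamma_power)])         # should fall within the 0.5-1.5 s delay
```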
10.2 Challenges

Many challenges remain for the development of the SPA. I have mentioned various issues through-
out the book, but attempt to gather and expand on some of them in the following
three sections.
10.2.1 Representation
The precise nature of time-dependent representation in a biological system de-
pends on the physical properties of the neurons that underwrite that representation.
Consequently, low-level biological properties have representational consequences
in the SPA. So, any simplifications that are made in simulations introduce the pos-
sibility of mischaracterizing representation. While the NEF goes a long way to
incorporating a wide variety of low-level biophysical properties of cells, there are
some assumptions in the software implementation in Nengo which may need to
be revisited as SPA models grow more sophisticated. For example, we typically
assume that all synapses in a particular projection between populations have the
same dynamics. However, we know that there is a distribution of time constants,
and the processes that determine the precise synaptic dynamics of each cell are
somewhat variable.
In general, we have so far assumed that all neurons are point neurons. That
is, they have no spatial extent. Consequently, we do not currently account for
the location of synapses on a receiving cell’s dendritic tree or soma. In addition,
Nengo neuron models do not capture some more detailed phenomena such as
dendritic spiking, back propagation of somatic potentials up dendritic trees, or
other space dependent dynamics. To summarize, many of our models could stand
to have more synaptic heterogeneity and could better account for spatial effects.
Both of these improvements, however, come at high computational costs.
In many ways, the SPA is more focused on representation in populations of
neurons than in single neurons. At this level of analysis, there are many challenges
specific to functional properties of a given part of the brain. For example, there is
always the challenge of determining the appropriate dimensionality of the space
to be represented by a population of cells. Relatedly, determining how much and
what kinds of compression need to occur through sequential neural processing of
semantic pointers may vary from area to area, and hence needs to be considered
carefully for each new model. In general, these observations simply boil down
to the fact that many specific representational properties are likely to depend on
the particular functions we are modelling in a given SPA model. Perhaps the
most challenging area where such questions must be answered is in the domain of
language.
10.2.2 Architecture
When pressed, I would be willing to concede that the SPA is less of an architec-
ture, and more of an architecture sketch combined with a protocol. That is, the
SPA, as it stands, is not overly committed to a specific arrangement and attribution
of functions to parts of the brain. As a result, a major challenge for the SPA is to
fill in the details of this schema. I believe a natural approach for doing so is to
adopt Michael Anderson’s “massive redeployment hypothesis” (Anderson, 2010).
This is the idea, consistent with the SPA, that neural areas compute generically
specifiable functions that are used in a wide variety of seemingly disparate tasks.
The SPA provides a means of understanding how the same neural resources can be
driven by very different inputs, i.e., by redeploying the resources of cortex using
routing strategies. Nevertheless, which areas of cortex perform which functions is
highly underspecified by the SPA as it stands.
One of the functions that the SPA is most explicit about, is that of the basal
ganglia. As initially described, the SPA employs the basal ganglia to perform ac-
tion selection. However, there is good evidence that if the output of the basal gan-
glia is removed, actions can still be selected, though less smoothly and quickly.
This is consistent with recent hypotheses, supported by anatomical and physiological experiments, which suggest that the basal ganglia is important for novel action sequences and for teaching the cortex, not for well-learned actions (Turner and Desmurget, 2010). The statistical reinterpretation that I provided in section 7.4 fits very naturally with such a view. After all, in that interpretation it is clear that cortical processing must precede basal ganglia processing, and can result in action selection even if the basal ganglia does not play its typical role. Specifically, this is because the role of the basal ganglia under that interpretation is more clearly one of refinement than one on which all subsequent cortical processing depends. Indeed, this is also true of the matrix-vector interpretation, but
is less explicit. In both cases, we can think of the architecture as having a kind
of “default routing” which will allow somewhat inflexible processing to result in
reasonably appropriate actions for given inputs. As a result, however, it becomes
clear that a major challenge for the SPA is to provide detailed hypotheses regard-
ing such default cortical interactions.
Thinking of the basal ganglia as more of a teacher than a “selector” also raises
aspects of learning that are not yet well integrated into the SPA. For example,
connections between thalamus and cortex are likely as sensitive to environmental
contingencies as those from cortex to striatum, but they are not accounted for by
learning in the SPA. Similarly, if we think of the basal ganglia as performing a corrective
role in tuning cortico-cortical connections, we should include such modulation
effects in projections to cortex.
Even many of those elements of cortical processing that we have included in
our architecture need to be significantly extended. For example, visual routing
and object recognition should be implemented in the same network, although I
have considered them separately in the models presented in sections 5.5 and 3.5.
As well, several challenges remain regarding clean-up memories (see section 4.5).
Not only do we need to better understand which anatomical areas act as clean-up
memories in cortex, we also need methods for rapidly constructing such memo-
ries, extending such memories as new information becomes available, and so on.
10.2.3 Scaling
Many of the architectural challenges in the previous section could be equally well
considered scaling challenges because many are demands for more elements in
the architecture. Undoubtedly, as the number of elements increases, scaling chal-
lenges related to integration can become critical. However, there is another kind
of scaling problem that we face even with a fixed number of anatomical elements.
Consider, for example, the Spaun model. Despite the fact that it has several
anatomical areas integrated in the model, a large number of them are quite simple:
reinforcement learning is applied to only a few simple actions; motor control is
applied to a single arm; perceptual input is in a fixed location; and semantics is
limited to the domain of numbers. These issues of scaling arise even without considering the addition of new elements to the model, and they are my focus here.
To begin, reinforcement learning (RL) should be demonstrated in more com-
plex tasks like the Tower of Hanoi, where the representations and set of pos-
sible actions are of much greater complexity. One of the major challenges for
RL is correctly representing state and action spaces. Consequently it is crucial
to demonstrate that the representational assumptions of the SPA are consistent
with RL algorithms that may be implemented in basal ganglia-cortex interactions.
Even more convincing would be the generation of the representations from sen-
sory inputs. This would necessitate scaling the kinds of adaptation and learning
employed in the SPA independently of RL, as well.
Such an extension would have implications for semantics in the SPA. Cur-
rently, the semantics of representations in most of the models presented are quite
limited. Scaling these to include additional modalities, dynamic transformations,
and more sophisticated lexical relationships remains a challenge. One potentially
fruitful strategy for addressing this problem is to attempt to build on successful
past approaches (e.g., Rogers and McClelland, 2004) that are consistent with
SPA commitments. This would also allow models to explicitly scale to adult-
sized vocabularies, as I have only made in-principle arguments that this will work. Such models would also allow the SPA to begin to address challenges related to scaling the sophistication of linguistic behaviors, as discussed in the section on representational challenges above.
A quite different scaling challenge is increasing the degrees of freedom in the
motor system. Not only do current SPA implementations account only for the arm, and not the rest of the body, but the models also have fewer degrees of freedom than the human arm and are constrained to move in a plane. One excellent test for
scaling the SPA motor system would be to embed the SPA into an actual physical
body. This would help explicitly realize the ability of the SPA to account for
both more immediate environmental interactions, as well as aspects of cognitive
performance. However, the computational demands of such real-time interaction
are severe.
In fact, such computational demands are a general practical consequence of
attempting to address the scaling challenges. To address these to some degree, we
have written code for Nengo to run on GPUs, taking advantage of their massively parallel computational resources, which significantly speeds up Nengo models.
However, ultimately, the low-power high-efficiency computing necessary to run
large simulations in real time will most likely be realized through hardware im-
plementation. For this reason it is an important, practical scaling challenge to im-
plement SPA models on hardware architectures, such as Neurogrid (Silver et al.,
2007) or SpiNNaker (Khan et al., 2008). Our lab is currently working in collabo-
ration with Kwabena Boahen’s group at Stanford on Neurogrid chips, which can
currently simulate up to a million neurons in real time. However, we are only
beginning to address the many challenges that remain for implementing arbitrary
SPA models in neuromorphic hardware.
10.3 Conclusion
Perhaps the greatest challenge for the SPA is for it to become any kind of seri-
ous contender in the behavioral sciences. The field is a vibrant and crowded one.
Consequently, I will be happy if reading this book convinces even a handful of
researchers that we ought to relate all of our cognitive theories to the biological
substrate. I will be very happy if this book plays a role in increasing the number
of cognitive models that are specified at the neural level. I will be ecstatic if the
SPA itself is deemed useful in constructing such models. And, I will be shocked
if major elements of the SPA turn out to be right. While I have attempted to show
how the SPA can address a wide variety of the QCC, often better than its com-
petitors, the many challenges that remain for the SPA make it unclear if anything
recognizable as the SPA itself will remain as such challenges are addressed.
One thing that has become clear to me is that addressing such challenges for
the SPA, or any other architecture, is going to be the work of a community of
researchers. This is not the observation that the behavioral sciences are interdis-
ciplinary. Instead, it is the observation that to make real progress on advancing
our understanding of biological cognition, groups with diverse expertise will have
to work together, not just alongside one another. I suspect that the only practical
way to do this is to have large-scale integrated theoretical constructs, likely in the
form of a computational model or a family of such models, that all of these ex-
perts can test and extend. As a result, there needs to be a degree of agreement on
modeling practices, software, databases, and so on. Consequently, we continue
to spend significant effort developing not only an architecture, but tools that go
beyond the architecture, for helping to distribute, test, and implement biologically
realistic large-scale models.
Despite the enormous challenges that remain, I am optimistic. If the field more
aggressively pursues the integration of results across disciplines by developing
coherent methods and tools, then it will be progressively able to satisfy the QCC.
In short, I think that unraveling the mysteries of biological cognition is only a
matter of time.
Appendix A

Mathematical Derivations for the NEF
A.1 Representation
A.1.1 Encoding
Consider a population of neurons whose activities ai (x) encode some vector, x.
These activities can be written

$$a_i(\mathbf{x}) = G_i\left[J_i(\mathbf{x})\right]$$

where $G_i$ is the nonlinear function describing the neuron's response function, and $J_i(\mathbf{x})$ is the current entering the soma. The somatic current is defined by

$$J_i(\mathbf{x}) = \alpha_i \left\langle \mathbf{e}_i \cdot \mathbf{x} \right\rangle + J_i^{bias}$$

where $J_i(\mathbf{x})$ is the current in the soma, $\alpha_i$ is a gain and conversion factor, $\mathbf{x}$ is the vector variable to be encoded, $\mathbf{e}_i$ is the encoding vector which picks out the 'preferred stimulus' of the neuron (see section 2.5), and $J_i^{bias}$ is a bias current that accounts for background activity.
The nonlinearity Gi which describes the neuron’s activity as a result of this
current is determined by physiological properties of the neuron(s) being modeled.
The most common model used in the book is the leaky integrate-and-fire (LIF)
neuron. For descriptions of many other neural models, some of which are in
Nengo, see Bower and Beeman (1998); Koch (1999); Carnevale and Hines (2006).
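To make the encoding concrete, here is a minimal NumPy sketch (not the Nengo implementation) that computes steady-state LIF tuning curves for a population encoding a scalar x; the particular gains, biases, and encoders are arbitrary values chosen for illustration.

```python
import numpy as np

def lif_rate(J, tau_ref=0.002, tau_rc=0.02):
    """Steady-state LIF firing rate G[J], with the spiking threshold at J = 1."""
    J = np.asarray(J, dtype=float)
    rate = np.zeros_like(J)
    active = J > 1.0
    rate[active] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[active] - 1.0)))
    return rate

rng = np.random.default_rng(0)
n_neurons = 50
encoders = rng.choice([-1.0, 1.0], size=n_neurons)   # e_i (scalar case: preferred direction)
gains = rng.uniform(0.5, 2.0, size=n_neurons)        # alpha_i
biases = rng.uniform(-1.0, 1.0, size=n_neurons)      # J_i^bias

x = np.linspace(-1, 1, 201)                          # values of the represented variable
J = gains[:, None] * (encoders[:, None] * x[None, :]) + biases[:, None]
activities = lif_rate(J)                             # a_i(x) = G_i[alpha_i <e_i . x> + J_i^bias]
print(activities.shape)                              # (50, 201): one tuning curve per neuron
```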
A.1.2 Decoding
To define the decoding, we need to determine the postsynaptic current (PSC) and
the optimal decoding weight. A simple model of the PSC is

$$h(t) = \frac{1}{\tau_{PSC}} e^{-t/\tau_{PSC}}$$

where $\tau_{PSC}$ is the time constant of decay of the PSC. This varies with the type of
neurotransmitter being used. Typical values are 5ms for AMPA, 10ms for GABA,
and 100ms for NMDA receptors.1
Given an input spike train δ (t − tm ) generated from the encoding above, the
‘filtering’ of the neural spikes by the PSC gives an ‘activity’ of2
$$a_i(\mathbf{x}) = \sum_{m}^{M} h(t - t_m)$$
1 A wide variety of time constants, with supporting empirical evidence, can be found at http://compneuro.uwaterloo.ca/cnrglab/?q=node/537.
2 It is important to keep in mind that the PSC filtering, and the weighting by a synaptic weight, happen at the same time, not in sequence as I am describing here. In addition, the decoders here constitute only part of a neural connection weight.
matrix of neural activities, where each element $\Gamma_{ij} = \int_R a_i(\mathbf{x})\, a_j(\mathbf{x})\, d\mathbf{x}$ and $\Gamma_{ii} = \Gamma_{ii} + \sigma^2$. The $\sigma^2$ is the variance of the noise from which the $\eta_i$ are picked.
The overall decoding equation is thus

$$\hat{\mathbf{x}} = \sum_{i,m}^{N,M} h_i(t - t_m)\, \mathbf{d}_i \qquad \text{(A.7)}$$

where N is the number of neurons, M is the number of spikes, i indexes the neurons, m indexes the spikes, $h(t)$ is the PSC of the neuron, $\hat{\mathbf{x}}$ is the estimate of the variable being represented, and $\mathbf{d}_i$ is the decoder for neuron i to estimate $\mathbf{x}$.
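As an illustration of how such decoders can be computed, the following sketch finds regularized least-squares decoders over rate-based tuning curves (a simplification that leaves out the spike-by-spike PSC filtering above); the population parameters and noise level are arbitrary values used only for the example.

```python
import numpy as np

def lif_rate(J, tau_ref=0.002, tau_rc=0.02):
    J = np.asarray(J, dtype=float)
    out = np.zeros_like(J)
    active = J > 1.0
    out[active] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[active] - 1.0)))
    return out

# A small population encoding a scalar x in [-1, 1] (same form as the encoding sketch above).
rng = np.random.default_rng(0)
n = 50
x = np.linspace(-1, 1, 201)
e = rng.choice([-1.0, 1.0], n)
alpha = rng.uniform(0.5, 2.0, n)
J_bias = rng.uniform(-1.0, 1.0, n)
A = lif_rate(alpha[:, None] * e[:, None] * x[None, :] + J_bias[:, None])  # a_i(x)

# Regularized least squares: D = Gamma^-1 Upsilon, with noise variance on the diagonal.
sigma = 0.1 * A.max()
Gamma = A @ A.T / len(x) + sigma**2 * np.eye(n)
Upsilon = A @ x / len(x)
d = np.linalg.solve(Gamma, Upsilon)        # decoders d_i

x_hat = A.T @ d                            # static (rate-based) estimate of x
print(np.sqrt(np.mean((x_hat - x) ** 2)))  # RMSE of the decoded estimate
```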
A.2 Transformation
To define a transformational decoder, we follow the same procedure as in section
A.1.2, but minimize a slightly different error to determine the population decoders,
namely:

$$E = \int_R \left[ f(\mathbf{x}) - \sum_{i}^{N} \left(a_i(\mathbf{x}) + \eta_i\right) d_i \right]^2 d\mathbf{x}\, d\eta$$

where we have simply substituted $f(\mathbf{x})$ for $\mathbf{x}$ in equation A.6. This results in the related solution:

$$D^f = \Gamma^{-1} \Upsilon^f$$
where $\Upsilon_i^f = \int_R f(\mathbf{x})\, a_i(\mathbf{x})\, d\mathbf{x}$, resulting in the matrix $D^f$ where the transformation decoders $d_i^f$ can be used to give an estimate of the original transformation $f(\mathbf{x})$ as:

$$\hat{f}(\mathbf{x}) = \sum_{i}^{N} a_i(\mathbf{x})\, d_i^f.$$
Thus, over time we have the overall decoding equation for transformations as:
$$\hat{f}(\mathbf{x}) = \sum_{i,m}^{N,M} h_i(t - t_m)\, d_i^f \qquad \text{(A.8)}$$

where N is the number of neurons, M is the number of spikes, i indexes the neurons, m indexes the spikes, $h(t)$ is the PSC of the neuron, $\hat{f}(\mathbf{x})$ is the estimate of the transformation being performed, $\mathbf{x}$ is the representation being transformed, and $d_i^f$ is the decoder for neuron i to compute $f$.
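Relative to the representational decoders, the only change is the target used to build $\Upsilon$. The following minimal sketch uses simplified rectified-linear tuning curves (an assumption made purely to keep the example short) and solves for the decoders of the transformation f(x) = x².

```python
import numpy as np

# Rectified-linear tuning curves stand in for a_i(x), just to keep the example brief.
rng = np.random.default_rng(1)
n = 40
x = np.linspace(-1, 1, 201)
gains = rng.uniform(0.5, 2.0, n)
biases = rng.uniform(-1.0, 1.0, n)
enc = rng.choice([-1.0, 1.0], n)
A = np.maximum(0, gains[:, None] * enc[:, None] * x[None, :] + biases[:, None])

f_x = x ** 2                                         # target transformation f(x) = x^2
Gamma = A @ A.T / len(x) + (0.1 * A.max()) ** 2 * np.eye(n)
d_f = np.linalg.solve(Gamma, A @ f_x / len(x))       # transformational decoders d_i^f
f_hat = A.T @ d_f                                    # decoded estimate of f(x)
print(np.abs(f_hat - f_x).max())                     # worst-case error over the sampled x
```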
A.3 Dynamics
The equation describing figure 2.11a is:

$$\dot{\mathbf{x}}(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t) \qquad \text{(A.9)}$$

Notably, the input matrix $\mathbf{B}$ and the dynamics matrix $\mathbf{A}$ completely describe the dynamics of a linear time-invariant (LTI) system, given the state variables $\mathbf{x}(t)$ and the input $\mathbf{u}(t)$. Taking the Laplace transform of (A.9) gives:

$$\mathbf{x}(s) = h(s)\left[\mathbf{A}\mathbf{x}(s) + \mathbf{B}\mathbf{u}(s)\right]$$

where $h(s) = \frac{1}{s}$.
In the case of the neural system, the transfer function $h(s)$ is not $\frac{1}{s}$, but is deter-
mined by the intrinsic properties of the component cells. Because it is reasonable
to assume that the dynamics of the synaptic PSC dominate the dynamics of the
cellular response as a whole (Eliasmith and Anderson, 2003), it is reasonable to
characterize the dynamics of neural populations based on their synaptic dynamics,
i.e. using h(t) from equation (A.5). The Laplace transform of this filter is:
$$h'(s) = \frac{1}{1 + s\tau}.$$
Given the change in filters from $h(s)$ to $h'(s)$, we need to determine how to change $\mathbf{A}$ and $\mathbf{B}$ in order to preserve the dynamics defined in the original system (i.e., the one using $h(s)$). In other words, letting the neural dynamics be defined by $\mathbf{A}'$ and $\mathbf{B}'$, we need to determine the relation between matrices $\mathbf{A}$ and $\mathbf{A}'$ and matrices $\mathbf{B}$ and $\mathbf{B}'$ given the differences between $h(s)$ and $h'(s)$. To do so, we can
solve for sx(s) in both cases and equate the resulting expressions:
$$\mathbf{x}(s) = \frac{1}{s}\left[\mathbf{A}\mathbf{x}(s) + \mathbf{B}\mathbf{u}(s)\right]$$
$$s\mathbf{x}(s) = \mathbf{A}\mathbf{x}(s) + \mathbf{B}\mathbf{u}(s)$$

and

$$\mathbf{x}(s) = \frac{1}{1 + s\tau}\left[\mathbf{A}'\mathbf{x}(s) + \mathbf{B}'\mathbf{u}(s)\right]$$
$$s\mathbf{x}(s) = \frac{1}{\tau}\left[\mathbf{A}'\mathbf{x}(s) - \mathbf{x}(s) + \mathbf{B}'\mathbf{u}(s)\right]$$

so

$$\frac{1}{\tau}\left[\mathbf{A}'\mathbf{x}(s) - \mathbf{x}(s) + \mathbf{B}'\mathbf{u}(s)\right] = \mathbf{A}\mathbf{x}(s) + \mathbf{B}\mathbf{u}(s).$$

Rearranging and solving gives:

$$\mathbf{A}' = \tau\mathbf{A} + \mathbf{I} \qquad \text{(A.10)}$$
$$\mathbf{B}' = \tau\mathbf{B} \qquad \text{(A.11)}$$
In short, the main difference between synaptic dynamics and integration is that
there is an exponential ‘forgetting’ of the current state with synaptic dynamics.
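As a concrete example of equations (A.10) and (A.11), this sketch maps a scalar integrator (A = 0, B = 1) onto a recurrent loop filtered by a first-order synapse and compares the result to direct integration; the time constant, time step, and input pulse are arbitrary choices for illustration.

```python
import numpy as np

# Desired LTI system: a perfect scalar integrator, xdot = A*x + B*u with A = 0, B = 1.
tau = 0.1                                  # synaptic time constant (s), chosen for illustration
A_des, B_des = 0.0, 1.0
A_rec, B_in = tau * A_des + 1.0, tau * B_des   # A' = tau*A + I, B' = tau*B

dt, T = 0.001, 2.0
steps = int(T / dt)
u = np.zeros(steps)
u[:int(0.5 / dt)] = 1.0                    # 0.5 s input pulse

x_syn = 0.0                                # state of the synaptically filtered recurrent loop
x_ref = 0.0                                # reference: direct Euler integration of the desired system
decay = np.exp(-dt / tau)
for k in range(steps):
    drive = A_rec * x_syn + B_in * u[k]
    x_syn = decay * x_syn + (1 - decay) * drive   # first-order synaptic filter h'(t)
    x_ref += dt * (A_des * x_ref + B_des * u[k])

print(x_syn, x_ref)                        # both should be close to 0.5 after the pulse
```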
Appendix B

Mathematical Derivations for the SPA

The convolution of two signals $x(t)$ and $y(t)$ is defined as $z(t) = \int x(\tau)\, y(t-\tau)\, d\tau$. It is standard to visualize convolution as flipping the filter around the ordinate (y-
axis), and sliding it over the signal, computing the integral of the product of the
overlap of the two functions at each position, which gives z(t). In the case where
y(t) and x(t) are discrete vectors, the length of the result is equal to the sum of
the lengths of the original vectors, minus one. So, if the vectors are the same size,
the result is approximately double the original length. This could lead to scaling
problems in the case of using the operation for binding.
Circular convolution is the same operation, but the as the elements of y(t)
slide off of the end of x(t), they are wrapped back onto the beginning of x(t).
$$z = x \circledast y$$

$$z_j = \sum_{k=0}^{D-1} x_k\, y_{j-k}$$
where subscripts are modulo D (the length of the filter vector). The result of this
form of convolution for two equal length vectors is equal to their original length
because of the wrapping.
A computationally efficient algorithm for computing the circular convolution
of two vectors takes advantage of the Discrete Fourier Transform (DFT) and
Inverse Discrete Fourier Transform (IDFT). In general, Fourier Transforms are
closely related to convolution because the Fourier Transform of any convolution
is a multiplication in the complementary domain. The same is true for circular convolution. So, the circular convolution of two finite-length vectors can be expressed in terms of the DFT and IDFT as follows:

$$z = x \circledast y = \mathrm{IDFT}\left(\mathrm{DFT}(x)\,.\,\mathrm{DFT}(y)\right)$$

where '.' indicates element-wise multiplication of the two vectors.
Unbinding can then be performed with circular correlation, denoted $\oplus$, which amounts to convolving with an inverse:

$$x = z \oplus y = z \circledast y^{-1}$$
As discussed by Plate (2003), while most vectors do have an exact inverse under
convolution, it is numerically advantageous to use an approximate inverse (called
the involution) of a vector in the unbinding process. This is because the exact
inverse y−1 is unstable when elements of DFT (y) are near zero.
Involution of a vector $x$ is defined as $x' = [x_0, x_{D-1}, x_{D-2}, \ldots, x_1]$. Notice that this is simply a flip of all the elements after the first element. Thus, it is a simple permutation of the original vector. Unbinding then becomes

$$x = z \circledast y'.$$
Notice that the involution is a linear transformation. That is, we can define a permutation matrix $S$ such that $Sx = x'$. As a result, the neural circuit for circular convolution is easily modified to compute circular correlation, by using the matrix $C_S = C_{DFT} S$ in place of the second DFT matrix in the original circuit (equation B.2). Specifically,

$$z \circledast y' = C_{IDFT}\left(\left(C_{DFT}\, z\right)\,.\,\left(C_S\, y\right)\right)$$
where again all matrices are constant for any vectors that need unbinding.
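A minimal NumPy sketch of binding and unbinding along these lines, using the FFT for the DFT/IDFT steps and the involution as the approximate inverse; the dimensionality and the random vocabulary vectors are arbitrary choices for the example.

```python
import numpy as np

def cconv(x, y):
    """Circular convolution computed via the DFT/IDFT route described above."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def involution(x):
    """Approximate inverse: keep x[0] and reverse the remaining elements."""
    return np.concatenate(([x[0]], x[1:][::-1]))

D = 500
rng = np.random.default_rng(0)
a, b = rng.normal(0, 1 / np.sqrt(D), (2, D))   # random vectors standing in for semantic pointers

z = cconv(a, b)                      # bind
a_hat = cconv(z, involution(b))      # unbind using the involution of b

# The noisy estimate should be far more similar to a than to the unrelated vector b.
cosine = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cosine(a_hat, a), cosine(a_hat, b))
```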
Encoding:

$$M_i^{in} = \gamma M_{i-1}^{in} + (P_i \circledast I_i) + I_i$$

$$M_i^{epis} = \rho M_{i-1}^{epis} + (P_i \circledast I_i) + I_i$$

$$M_i^{OSE} = M_i^{in} + M_i^{epis}$$

Decoding:

$$I_i = \mathrm{cleanup}\left(M^{OSE} \circledast P_i'\right)$$
where $M^{in}$ is the input working memory trace, $\gamma$ is the rate of decay of the old memory trace, $M^{epis}$ is the memory trace in the episodic memory, $\rho$ is a scaling factor related to primacy, $M^{OSE}$ is the overall encoded memory of the list, $P$ is a position vector, $I$ is an item vector, $i$ indexes the associated item number, and cleanup() indicates the application of the clean-up memory to the semantic pointer inside the brackets.
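To make these encoding and decoding steps concrete, here is a small NumPy sketch of the equations above; the dimensionality, list length, and the values chosen for $\gamma$ and $\rho$ are assumptions made only for illustration, not the parameters used in the models reported in the book.

```python
import numpy as np

def cconv(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def involution(x):
    return np.concatenate(([x[0]], x[1:][::-1]))

D, n_items = 512, 6
rng = np.random.default_rng(2)
P = rng.normal(0, 1 / np.sqrt(D), (n_items, D))   # position vectors P_i
I = rng.normal(0, 1 / np.sqrt(D), (n_items, D))   # item vectors I_i
gamma, rho = 0.9, 1.2                             # decay and primacy parameters (assumed values)

M_in = np.zeros(D)
M_epis = np.zeros(D)
for i in range(n_items):
    bound = cconv(P[i], I[i])
    M_in = gamma * M_in + bound + I[i]
    M_epis = rho * M_epis + bound + I[i]
M_ose = M_in + M_epis

# "Clean up" by comparing the unbound trace against the item vocabulary.
probe = cconv(M_ose, involution(P[2]))            # attempt to recall the third item
print(int(np.argmax(I @ probe)))                  # should print 2, the index of that item
```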
$$\Delta \omega_{ij} = \kappa\, \alpha_j\, \mathbf{e}_j\, E\, a_i$$
where ω is the connection weight, κ is the learning rate parameter, α is the gain of
the cell, e is the encoding vector of the cell, E is an error term, a is the activity of
the cell, and i and j index the pre and postsynaptic cells respectively. The gain and
encoding vectors are described in more detail in appendix A.1.1. The error term is
an input to the cell that captures the error in the performance of the network. This
term can be either a modulatory input, like dopamine, or another direct spiking
input carrying the relevant information.
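A minimal sketch of this error-modulated rule for a scalar representation is given below; the population parameters, the learning rate, and the way the error signal is computed here are assumptions made for illustration rather than details of the Nengo implementation.

```python
import numpy as np

# Error-modulated update: delta_omega_ij = kappa * alpha_j * e_j * E * a_i (scalar case).
rng = np.random.default_rng(3)
n_pre, n_post = 60, 30
enc_pre = rng.choice([-1.0, 1.0], n_pre)
gain_pre = rng.uniform(0.5, 2.0, n_pre)
bias_pre = rng.uniform(-1.0, 1.0, n_pre)

def a(x):
    """Rectified-linear pre-synaptic activities a_i(x), a simplification for the example."""
    return np.maximum(0, gain_pre * enc_pre * x + bias_pre)

alpha_post = rng.uniform(0.5, 2.0, n_post)        # alpha_j
e_post = rng.choice([-1.0, 1.0], n_post)          # e_j (scalar encoders)
omega = np.zeros((n_pre, n_post))                 # connection weights omega_ij
kappa = 1e-4                                      # learning rate (assumed value)

for _ in range(5000):
    x = rng.uniform(-1, 1)
    ai = a(x)
    # Read out the value the post population receives; this works because the learned
    # weights stay in the factored form alpha_j * e_j * d_i with e_j = +/-1.
    x_hat = (ai @ omega) @ (e_post / (alpha_post * n_post))
    E = x - x_hat                                 # externally supplied error signal
    omega += kappa * np.outer(ai, alpha_post * e_post * E)

print(abs(E))                                     # error on the final sample should be small
```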
The STDP rule from Pfister and Gerstner (2006) is in two parts, one describes
the effects of a pre-synaptic spike:
$$\Delta\omega_{ij}(t^{pre}) = -e^{-(t^{pre} - t^{post_1})/\tau_-}\left[A_2^- + A_3^-\, e^{-(t^{pre} - t^{pre_2})/\tau_x}\right].$$
where $\omega$ is the connection weight, $t^{pre/post}$ is the time that the current pre-/post-synaptic spike occurred, $t^{post_1/pre_1}$ is the time that the last post-/pre-synaptic spike occurred, $t^{pre_2/post_2}$ is the time that the last pre-/post-synaptic spike occurred, $A_2^{-/+}$ are the weights of the pair-based rule, and $A_3^{-/+}$ are the weights of the triplet-based rule, $\tau_{+/-}$ determine the shapes of the exponential weights given to the pair-based rule, and $\tau_{x/y}$ determine the shapes of the exponential weights given to the triplet-based rule.
The rule that results from combining these two previous rules follows from
substituting the STDP rule in for the activity variable of the NEF rule (i.e. ai ):
$$\Delta\omega_{ij}(t^{pre}) = \kappa\, \alpha_j\, \mathbf{e}_j\, E\left[-e^{-(t^{pre} - t^{post_1})/\tau_-}\left(A_2^- + A_3^-\, e^{-(t^{pre} - t^{pre_2})/\tau_x}\right)\right]$$
and
per grammatical role (van der Velde & de Kamps, 2006), and we assume at least
100 neurons per neural group are needed to maintain accurate representation.
groups = words × assemblies × 8 = 60,000 × 100 × 8 = 48,000,000
neurons = groups × 100 = 4,800,000,000
area = neurons / 10,000,000 = 480 cm²
For the SPA, the number of dimensions required for similar structures (60,000
words, 8 possible bindings, 99% correct) is about 500 as reported in the main text
(see figure 4.7). If, by analogy to the assumptions in LISA and the NBA, we allow
each dimension to be a population of 100 neurons, representing any such structure
requires:
neurons = dimensions × 100 = 50,000
area = neurons / 100,000 ≈ 1 mm²

This does not include the clean-up memory, which is crucial for decoding. As mentioned in the main text, this would add 3.5 mm² of cortical area for a total of approximately 5 mm² (see section 4.5).
For the ICS, making the same assumptions as for the SPA, the number of neurons required for the sentence "Bill believes John loves Mary" can be determined as follows. First, we allow the sentence to be represented the same as in the SPA:

$$S = \mathrm{relation} \otimes \mathrm{believes} + \mathrm{subject} \otimes \mathrm{Bill} + \mathrm{object} \otimes \left(\mathrm{relation} \otimes \mathrm{loves} + \mathrm{subject} \otimes \mathrm{John} + \mathrm{object} \otimes \mathrm{Mary}\right)$$

Each vector is assumed to have 500 dimensions, in order to represent 60,000 words as above, so the maximum number of dimensions is:

dimensions = 500 × 500 × 500 = 125,000,000
neurons = dimensions × 100 = 12,500,000,000
area = neurons / 10,000,000 = 1,250 cm²

This is about half of the 2,500 cm² of available cortex.
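For convenience, the three estimates above can be recomputed directly. This short sketch simply restates the arithmetic under the stated assumptions (100 neurons per group or dimension, and 10 million neurons per square centimeter of cortex).

```python
# Recompute the cortical-area estimates above under the stated assumptions.
neurons_per_cm2 = 10_000_000

# Neural Blackboard Architecture (NBA): 60,000 words, 100 assemblies, 8 bindings.
nba_neurons = 60_000 * 100 * 8 * 100
print(nba_neurons / neurons_per_cm2)            # 480.0 cm^2

# SPA: a single 500-dimensional semantic pointer at 100 neurons per dimension.
spa_neurons = 500 * 100
print(spa_neurons / (neurons_per_cm2 / 100))    # 0.5 mm^2, i.e. on the order of 1 mm^2

# ICS: tensor-product binding requires 500^3 dimensions for the same structure.
ics_neurons = 500 ** 3 * 100
print(ics_neurons / neurons_per_cm2)            # 1250.0 cm^2
```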
Appendix C

SPA Model Details
Table C.1: The model elements of the SPA Tower of Hanoi model found in cortex.
The model also includes a basal ganglia and thalamus.
Name (Type): Description

state (buffer): Used to control the different stages of the problem-solving algorithm.
focus (buffer): Stores the disk currently being attended to (D0, D1, D2, D3).
focuspeg (sensory): Automatically contains the location of the focus disk (A, B, C).
goal (buffer): Stores the disk we are trying to move (D0, D1, D2, D3).
goaltarget (buffer): Stores the location we want to move the goal disk to (A, B, C).
goalcurrent (sensory): Automatically contains the location of the goal disk (A, B, C).
goalfinal (sensory): Automatically contains the final desired location of the goal disk (A, B, C).
largest (sensory): Automatically contains the largest visible disk (D3).
mem1 (memory): Stores an association between mem1 and mem2 in working memory.
mem2 (memory): Stores an association between mem1 and mem2 in working memory.
request (memory): Indicates one element of a pair to attempt to recall from working memory.
recall (memory): The vector associated with the currently requested vector.
movedisk (motor): Tells the motor system which disk to move (D0, D1, D2, D3).
movepeg (motor): Tells the motor system where to move the disk to (A, B, C).
motor (sensory): Automatically contains DONE if the motor action is finished.
Table C.2: A list of the 16 rules used in the Tower of Hanoi simulation. For the matching, equality signs indicate the result of a dot product, where '=' is summed and '≠' is subtracted. For the execution, all statements refer to setting the lefthand model element to the value indicated on the righthand side.
Bibliography

Adolphs, R., Bechara, A., Tranel, D., Damasio, H., and Damasio, A. R. (1995). Neuropsychological approaches to reasoning and decision-making. Neurobiology of decision-making. Springer Verlag, New York.
Albin, R. L., Young, A. B., and Penney, J. B. (1989). The functional anatomy of
basal ganglia disorders. Trends in Neurosciences, 12:366–375.
Aldridge, J. W., Berridge, K., Herman, M., and Zimmer, L. (1993). Neuronal
coding of serial order: Syntax of grooming in the neostriatum. Psychological
Science, 4(6):391–395.
Allport, D. A. (1985). Distributed memory, modular subsystems and dysphasia.
In Newman, S. K. and Epstein, R., editors, Current Perspectives in Dysphasia,
pages 207–244. Churchill Livingstone, Edinburgh.
Almor, A. and Sloman, S. A. (1996). Is deontic reasoning special? Psychological
Review, 103:374–380.
Altmann, E. M. and Trafton, J. G. (2002). Memory for goals: An activation-based
model. Cognitive Science, 26:39–83.
Amari, S. (1975). Homogeneous nets of neuron-like elements. Biological Cyber-
netics, 17:211–220.
Amit, D. J. (1989). Modeling brain function: The world of attractor neural net-
works. Cambridge University Press, New York, NY.
Amit, D. J., Fusi, S., and Yakovlev, V. (1997). A paradigmatic working memory
(attractor) cell in IT cortex. Neural Computation, 9(5):1071–1092.
Andersen, R. A., Essick, G. K., and Siegel, R. M. (1985). The encoding of spatial
location by posterior parietal neurons. Science, 230:456–458.
Anderson, J., John, B. E., Just, M., Carpenter, P. A., Kieras, D. E., and Meyer,
D. E. (1995a). Production system models of complex cognition. In Moore, J. D.
and Lehman, J. F., editors, Proceedings of the Seventeenth Annual Conference
of the Cognitive Science ..., page 9. Routledge.
Anderson, J. R. (2007). How can the human mind occur in the physical universe?
Oxford University Press.
Anderson, J. R., Albert, M. V., and Fincham, J. M. (2005). Tracing problem solv-
ing in real time: fMRI analysis of the subject-paced Tower of Hanoi. Journal
of cognitive neuroscience, 17(8):1261–74.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., and Qin, Y.
(2004). An integrated theory of the mind. Psychological Review, 111(4):1036–
1060.
Anderson, J. R., Corbett, A., Koedinger, K. R., and Pelletier, R. (1995b). Cogni-
tive tutors: Lessons learned. Journal of the Learning Sciences, 4(2):167–207.
Anderson, J. R., Kushmerick, N., and Lebiere, C. (1993). Tower of Hanoi and
goal structures. Erlbaum Associates.
Aksay, E., Baker, R., Seung, H. S., and Tank, D. (2000). Anatomy and discharge
properties of pre-motor neurons in the goldfish medulla that have eye-position
signals during fixations. Journal of Neurophysiology, 84:1035–1049.
Buffalo, E. A., Fries, P., Landman, R., Liang, H., and Desimone, R. (2010). A
backward progression of attentional effects in the ventral stream. Proceed-
ings of the National Academy of Sciences of the United States of America,
107(1):361–5.
Burger, P., Mehlb, E., Cameron, P. L., Maycoxa, P. R., Baumert, M., Friedrich, L.,
De Camilli, P., and Jahn, R. (1989). Synaptic vesicles immunoisolated from rat
cerebral cortex contain high levels of glutamate. Neuron, 3(6):715–720.
Calabresi, P., Picconi, B., Tozzi, A., and Di Filippo, M. (2007). Dopamine-
mediated regulation of corticostriatal synaptic plasticity. Trends in neuro-
sciences, 30(5):211–9.
Canessa, N., Gorini, A., Cappa, S. F., Piattelli-Palmarini, M., Danna, M., Fazio,
F., and Perani, D. (2005). The effect of social content on deductive reasoning:
an fMRI study. Human brain mapping, 26(1):30–43.
Carpenter, P., Just, M., and Shell, P. (1990). What one intelligence test measures:
a theoretical account of the processing in the Raven Progressive Matrices Test.
Psychological Review, 97(3):404–31.
Choo, X. (2010). The Ordinal Serial Encoding Model: Serial Memory in Spiking
Neurons. Masters, University of Waterloo.
Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Ryu, S. I., and Shenoy,
K. V. (2010). Cortical Preparatory Activity: Representation of Movement or
First Cog in a Dynamical Machine? Neuron, 68(3):387–400.
Clark, A. (1997). Being there: Putting brain, body and world together again.
MIT Press, Cambridge, MA.
Collins, A. and Quillian, M. (1969). Retrieval time from semantic memory. Jour-
nal of Verbal Learning and Verbal Behavior, 8(2):240–247.
Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped
how humans reason? Studies with the Wason selection task. Cognition, 31:187–
276.
Craver, C. (2007). Explaining the brain. Oxford University Press, Oxford, UK.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman,
R. (1990a). Indexing By Latent Semantic Analysis. Journal of the American
Society For Information Science, 41:391–407.
Deerwester, S., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990b). In-
dexing by latent semantic analysis. Journal of the American Society For Infor-
mation Science, 41(6):391–407.
Dempster, A., Laird, N., and Rubin, D. (1977). Maximum Likelihood from In-
complete Data via the EM Algorithm. Journal of the Royal Statistical Society
Series B, 39(1):1–38.
Dennett, D. and Viger, C. (1999). Sort-of symbols? Behavioral and Brain Sci-
ences, 22(4):613.
Dewolf, T. and Eliasmith, C. The neural optimal control hierarchy for motor control. Neural Engineering.
Dretske, F. (1994). If you can’t make one, you don’t know how it works, chapter
XIX, pages 615–678. Midwest Studies in Philosophy. University of Minnesota
Press, Minneapolis.
Edelman, S. and Breen, E. (1999). On the virtues of going all the way. Behavioral
and Brain Sciences, 22(4):614.
Engel, A. K., Fries, P., and Singer, W. (2001). Dynamic predictions: oscilla-
tions and synchrony in top-down processing. Nature reviews.Neuroscience,
2(10):704–716.
Faubel, C. and Schöner, G. (2010). Learning Objects on the Fly: Object Recognition
for the Here and Now. In International Joint Conference on Neural Networks.
IEEE Press.
Fodor (1974). Special sciences (or: The disunity of science as a working hypoth-
esis). Synthese, 28(2):97.
Fodor, J. (1995). West coast fuzzy: Why we don’t know how brains work (review
of Paul Churchland’s The engine of reason, the seat of the soul). The Times
Literary Supplement, (August).
Fodor, J. (1998). Concepts: Where cognitive science went wrong. Oxford Univer-
sity Press, New York.
Georgopoulos, A. P., Kalasaka, J. F., Crutcher, M. D., Caminiti, R., and Massey,
J. T. (1984). The representation of movement direction in the motor cortex:
Single cell and population studies. In Edelman, G. M., Gail, W. E., and Cowan,
W. M., editors, Dynamic aspects of neocortical function. Neurosciences Re-
search Foundation.
Gershman, S., Cohen, J., and Niv, Y. (2010). Learning to selectively attend. In
Proceedings of the 32nd annual conference of the cognitive science society,
pages 1270–1275.
Gray, J. R., Chabris, C. F., and Braver, T. S. (2003). Neural mechanisms of general
fluid intelligence. Nature Neuroscience, 6(3):316–22.
Gupta, A., Wang, Y., and Markram, H. (2000). Organizing Principles for a Di-
versity of GABAergic Interneurons and Synapses in the Neocortex. Science,
287:273–278.
Hazy, T. E., Frank, M. J., and O’reilly, R. C. (2007). Towards an executive with-
out a homunculus: computational models of the prefrontal cortex/basal ganglia
system. Philosophical transactions of the Royal Society of London. Series B,
Biological sciences, 362(1485):1601–13.
Henson, R. (1998). Short-term memory for serial order: The start-end model.
Cognitive psychology, 36:73–137.
Henson, R., Norris, D., Page, M., and Baddeley, A. (1996). Unchained memory:
Error patterns rule out chaining models of immediate serial recall. The quarterly
journal of experimental psychology A, 49(1):80–115.
Hier, D., Yoon, W., Mohr, J., Price, T., and Wolf, P. (1994). Gender and aphasia
in the stroke data bank. Brain and Language, 47:155–167.
Holmgren, C., Harkany, T., Svennenfors, B., and Zilberter, Y. (2003). Pyramidal
cell communication within local networks in layer 2/3 of rat neocortex. The
Journal of physiology, 551(Pt 1):139–53.
Hummel, J. E., Burns, B., and Holyoak, K. J. (1994). Analogical mapping by dy-
namic binding: Preliminary investigations, chapter 2. Advances in connection-
ist and neural computation theory: Analogical connections. Ablex, Norwood,
NJ.
Jancke, D., Erlhagen, W., Dinse, H., Akhavan, A., Steinhage, A., Schöner, G.,
and Giese, M. (1999). Population representation of retinal position in cat
primary visual cortex: interaction and dynamics. Journal of Neuroscience,
19(20):9016–9028.
Jilk, D., Lebiere, C., O’Reilly, R., and Anderson, J. (2008). SAL: an explicitly
pluralistic cognitive architecture. Journal of Experimental & Theoretical Arti-
ficial Intelligence, 20(3):197–218.
Johnson-Laird, P. N., Legrenzi, P., and Legrenzi, S. (1972). Reasoning and a sense
of reality. British Journal of Psychology, 63:395–400.
Kalisch, R., Korenfeld, E., Stephan, K. E., Weiskopf, N., Seymour, B., and Dolan,
R. J. (2006). Context-dependent human extinction memory is mediated by a
ventromedial prefrontal and hippocampal network. Journal of Neuroscience,
26(37):9503–9511.
Kan, I. P., Barsalou, L. W., Solomon, K. O., Minor, J. K., and Thompson-Schill,
S. L. (2003). Role of mental imagery in a property verification task: fMRI
evidence for perceptual representations of conceptual knowledge. Cognitive
Neuropsychology, 20:525–540.
Kandel, E., Schwartz, J. H., and Jessell, T. M. (2000). Principles of neural science.
McGraw Hill, New York, NY.
Kanerva, P. (1994). The spatter code for encoding concepts at many levels, chap-
ter 1, pages 226–229. Proceedings of the International Conference on Artificial
Neural Networks. Springer-Verlag, Sorrento, Italy.
Kawato, M. (1995). Cerebellum and motor control. In Arbib, M., editor, The
handbook of brain theory and neural networks. MIT Press, Cambridge, MA.
Khan, M., Lester, D., Plana, L., Rast, A., Jin, X., Painkras, E., and Furber, S.
(2008). SpiNNaker: Mapping neural networks onto a massively-parallel chip
multiprocessor. IEEE.
Kim, H., Sul, J. H., Huh, N., Lee, D., and Jung, M. W. (2009). Role of striatum
in updating values of chosen actions. The Journal of neuroscience : the official
journal of the Society for Neuroscience, 29(47):14701–12.
Kim, R., Alterman, R., Kelly, P. J., Fazzini, E., Eidelberg, D., Beric, A., and
Sterio, D. (1997). Efficacy of bilateral pallidotomy. Neurosurgical FOCUS,
2(3):E10.
Koulakov, A. A., Raghavachari, S., Kepecs, A., and Lisman, J. E. (2002). Model
for a robust neural integrator. Nature neuroscience, 5(8):775–782.
Laubach, M., Caetano, M. S., Liu, B., Smith, N. J., Narayanan, N. S., and Elia-
smith, C. (2010). Neural circuits for persistent activity in medial prefrontal
cortex. In Society for Neuroscience Abstracts, page 200.18.
Lee, H., Ekanadham, C., and Ng, A. (2007). Sparse deep belief net model for
visual area V2. Advances in neural information processing systems, 20:1–8.
Legendre, G., Miyata, Y., and Smolensky, P. (1994). Principles for an integrated
connectionist/symbolic theory of higher cogntion. Lawrence Erlbaum Asso-
ciates, Hillsdale, NJ.
Lieberman, P. (2007). The Evolution of Human Speech: Its Anatomical and Neu-
ral Bases. Current Anthropology, 48(1).
Lipinski, J., Spencer, J. P., Samuelson, L. K., and Schöner, G. (2006). Spam-
ling: A dynamical model of spatial working memory and spatial language. In
Proceedings of the 28th Annual Conference of the Cognitive Science Society,
pages 768–773, Vancouver, Canada.
Litt, A., Eliasmith, C., and Thagard, P. (2008). Neural affective decision theory:
Choices, brains, and emotions. Cognitive Systems Research, 9:252–273.
Lund, J. S., Yoshioka, T., and Levitt, J. B. (1993). Comparison of intrinsic con-
nectivity in different areas of macaque monkey cerebral cortex. Cerebral cortex
(New York, N.Y. : 1991), 3(2):148–62.
Machamer, P., Darden, L., and Craver, C. (2000). Thinking about mechanisms.
Philosophy of Science, 67:1–25.
Machens, C. K., Romo, R., and Brody, C. D. (2010). Functional, but not anatom-
ical, separation of "what" and "when" in prefrontal cortex. The Journal of Neu-
roscience, 30(1):350–60.
Markram, H., Lübke, J., Frotscher, M., and Sakmann, B. (1997). Regulation of
synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science (New
York, N.Y.), 275(5297):213–5.
Miller, G. (1956). The magical number seven, plus or minus two: Some limits on
our capacity for processing information. Psychological review, 63:81–97.
Miller, P., Brody, C. D., Romo, R., and Wang, X. J. (2003). A Recurrent Network
Model of Somatosensory Parametric Working Memory in the Prefrontal Cortex.
Cerebral Cortex, 13:1208–1218.
Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition of com-
peting motor programs. Progress in Neurobiology, 50:381–425.
Murdock, B. B. (1993). Todam2: A model for the storage and retrieval of item,
associative and serial-order information. Psychological review, 100(2):183–
203.
Murphy, G. L. (2002). The Big Book of Concepts. MIT Press, Cambridge MA.
Newell, A., Shaw, C., and Simon, H. (1958). Elements of a theory of human
problem solving. Psychological review, 65:151–166.
Newell, A. and Simon, H. (1963). GPS: A Program that simulates human thought.
McGraw-Hill, New York.
O’Reilly, R. C., Frank, M. J., Hazy, T. E., and Watz, B. (2007). PVLV: the primary
value and learned value Pavlovian learning algorithm. Behavioral neuroscience,
121(1):31–49.
Oztop, E., Kawato, M., and Arbib, M. (2006). Mirror neurons and imitation : A
computationally guided review. Neural Networks, 19:254–271.
Paaß, G., Kindermann, J., and Leopold, E. (2004). Learning prototype ontologies by hierarchical latent semantic analysis. In Lernen, Wissensentdeckung und Adaptivität, pages 193–205.
Page, M. and Norris, D. (1998). The primacy model: A new model of immediate
serial recall. Psychological review, 105(4):761–781.
Paivio, A. (1971). Imagery and verbal processes. Holt, Rinehart and Winston.
Parisien, C., Anderson, C. H., and Eliasmith, C. (2008). Solving the problem of
negative synaptic weights in cortical models. Neural Computation, 20:1473–
1494.
Parsons, L. and Osherson, D. (2001). New evidence for distinct right and left
brain systems for deductive versus probabilistic reasoning. Cerebral Cortex,
11:954–965.
Parsons, L., Osherson, D., and Martinez, M. (1999). Distinct neural mechanisms
for propositional logic and probabilistic reasoning. In Proceedings of the Psy-
chonomic Society Meeting, pages 61–62.
Perfetti, B., Saggino, A., Ferretti, A., Caulo, M., Romani, G. L., and Onofrj,
M. (2009). Differential patterns of cortical activation as a function of fluid
reasoning complexity. Human Brain Mapping, 30(2):497–510.
Pesaran, B., Pezaris, J. S., Sahani, M., Mitra, P. P., and Andersen, R. A. (2002).
Temporal structure in neuronal activity during working memory in macaque
parietal cortex. Nature neuroscience, 5(8):805–11.
Petersen, S., Robinson, D., and Keys, W. (1985). Pulvinar nuclei of the behaving
rhesus monkey: visual responses and their modulation. Journal of Neurophysi-
ology, 54(4):867.
Petersen, S., Robinson, D., and Morris, J. (1987). Contributions of the pulvinar to
visual spatial attention. Neuropsychologia, 25:97–105.
Pew, R. W. and Mavor, A. S., editors (1998). Modeling human and organiza-
tional behavior: Application to military simulations. National Academy Press,
Washington, DC.
Pfister, J.-P. and Gerstner, W. (2006). Triplets of spikes in a model of spike timing-
dependent plasticity. Journal of neuroscience, 26(38):9673–82.
Poirazi, P., Brannon, T., and Mel, B. W. (2003). Pyramidal Neuron as Two-Layer
Neural Network. Neuron, 37(6):989–999.
Polsky, A., Mel, B. W., and Schiller, J. (2004). Computational subunits in thin
dendrites of pyramidal cells. Nature neuroscience, 7(6):621–7.
Port, R. and van Gelder, T. (1995). Mind as motion: Explorations in the dynamics
of cognition. MIT Press, Cambridge, MA.
Prabhakaran, V., Smith, J., Desmond, J., Glover, G., and Gabrieli, E. (1997). Neu-
ral substrates of fluid reasoning: an fMRI study of neocortical activation during
performance of the Raven’s Progressive Matrices Test. Cognitive Psychology,
33:43–63.
Prinz, J. (2002). Furnishing the Mind: Concepts and the Perceptual Basis. MIT
Press.
Putnam, H. (1975). Philosophy and our mental life, chapter 2, pages 291–303.
Mind, language and reality: Philosophical papers. Cambridge University Press,
Cambridge.
Quine, W. V. O. and Ullian, J. (1970). The web of belief. Random House, New
York.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., and Fried, I. (2005). In-
variant visual representation by single neurons in the human brain. Nature,
435(7045):1102–7.
Quyen, M. L. V., Foucher, J., Lachaux, J.-P., Rodriguez, E., Lutz, A., Martinerie,
J., and Varela, F. J. (2001). Comparison of Hilbert transform and wavelet meth-
ods for the analysis of neuronal synchrony. Journal of Neuroscience Methods,
111(2):83–98.
Raven, J. (1962). Advanced Progressive Matrices (Sets I and II). Lewis, London.
Redgrave, P., Prescott, T., and Gurney, K. (1999). The basal ganglia: a vertebrate
solution to the selection problem? Neuroscience, 86:353–387.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1997).
Spikes: Exploring the neural code. MIT Press, Cambridge, MA.
Rinella, K., Bringsjord, S., and Yang, Y. (2001). Efficacious logic instruction:
People are not irremediably poor deductive reasoners, pages 851–856. Pro-
ceedings of the 23rd Annual Conference of the Cognitive Science Society.
Lawrence Erlbaum Associates, Mahwah, NJ.
Ringach, D. L. (2002). Spatial Structure and Symmetry of Simple-Cell Receptive
Fields in Macaque Primary Visual Cortex. J Neurophysiol, 88(1):455–463.
Ritter, D. A., Bhatt, D. H., and Fetcho, J. R. (2001). In Vivo Imaging of Ze-
brafish Reveals Differences in the Spinal Networks for Escape and Swimming
Movements. Journal of Neuroscience, 21(22):8956–8965.
Roberts, S. and Pashler, H. (2000). How persuasive is a good fit? A comment on
theory testing. Psychological Review, 107(2):358–367.
Rogers, T. T. and McClelland, J. L. (2004). Semantic cognition: a parallel dis-
tributed processing approach. MIT Press.
Romo, R., Brody, C. D., Hernández, A., and Lemus, L. (1999a). Neuronal corre-
lates of parametric working memory in the prefrontal cortex. Nature, 399:470–
473.
Romo, R., Brody, C. D., Hernandez, A., and Lemus, L. (1999b). Neuronal corre-
lates of parametric working memory in the prefrontal cortex. Nature, 399:470–
473.
Rosenblueth, A., Wiener, N., and Bigelow, J. (1943). Behavior, purpose, and
teleology. Philosophy of Science, 10:18–24.
Rumelhart, D. E. and McClelland, J. L. (1986). Parallel distributed processing:
Explorations in the microstructure of cognition. Number 1. MIT Press/Bradford
Books, Cambridge MA.
Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of
experimental psychology, 89(1):63–77.
Ryan, L. and Clark, K. (1991). The role of the subthalamic nucleus in the re-
sponse of globus pallidus neurons to stimulation of the prelimbic and agranular
frontal cortices in rats. Experimental Brain Research, 86:641–651.
Salinas, E. and Abbott, L. F. (1997). Invariant Visual Responses From Attentional
Gain Fields. J Neurophysiol, 77(6):3267–3272.
Schiller, J., Major, G., Koester, H. J., and Schiller, Y. (2000). NMDA spikes in
basal dendrites of cortical pyramidal neurons. Nature, 404(6775):285–9.
Schöner, G. and Thelen, E. (2006). Using dynamic field theory to rethink infant
habituation. Psychological Review, 113:273–299.
Schultz, W., Dayan, P., and Montague, P. R. (1997). A neural substrate of predic-
tion and reward. Science, 275:1593–1599.
Schutte, A. R., Spencer, J. P., and Schöner, G. (2003). Testing the dynamic field
theory: Working memory for locations becomes more spatially precise over
development. Child Development, 74:1393–1417.
Searle, J. (1984). Minds, Brains and Science: The 1984 Reith Lectures. Harvard
University Press, Cambridge, MA.
Silver, R., Boahen, K., Grillner, S., Kopell, N., and Olsen, K. L. (2007). Neurotech
for neuroscience: Unifying concepts, organizing principles, and emerging tools.
Journal of Neuroscience, 27(44):11807–11819.
Simmons, W. K., Hamann, S. B., Harenski, C. L., Hu, X. P., and Barsalou, L. W.
(2008). fMRI evidence for word association and situated simulation in concep-
tual processing. Journal of physiology, Paris, 102:106–119.
Smolensky, P. and Legendre, G. (2006a). The Harmonic Mind: From Neural Com-
putation to Optimality-Theoretic Grammar Volume 1: Cognitive Architecture.
MIT Press, Cambridge MA.
Song, S., Sjostrom, P. J., Reigl, M., Nelson, S., and Chklovskii, D. B. (2005).
Highly nonrandom features of synaptic connectivity in local cortical circuits.
PLoS biology, 3(3):e68.
Sperber, D., Cara, F., and Girotto, V. (1995). Relevance theory explains the se-
lection task. Cognition, 57:31–95.
Spruston, N., Jonas, P., and Sakmann, B. (1995). Dendritic glutamate receptor
channels in rat hippocampal CA3 and CA1 pyramidal neurons. Journal of Phys-
iology, 482:325–352.
Squire, L. (1992). Memory and the hippocampus: a synthesis from findings with
rats, monkeys, and humans. Psychological review, 99:195–231.
Stepniewska, I. (2004). The pulvinar complex. CRC Press LLC, Cambridge, MA.
Stewart, T., Choo, X., and Eliasmith, C. (2010a). Dynamic Behaviour of a Spiking
Model of Action Selection in the Basal Ganglia. In Salvucci, D. D. and Gun-
zelmann, G., editors, 10th International Conference on Cognitive Modeling.
Stewart, T., Tang, Y., and Eliasmith, C. (2010b). A biologically realistic cleanup
memory: Autoassociation in spiking neurons. Cognitive Systems Research.
Taatgen, N. and Anderson, J. (2010). The past, present, and future of cognitive
architectures. Topics in Cognitive Science, 2:693–704.
Tang, Y. and Eliasmith, C. (2010). Deep networks for robust visual recognition. In
Fürnkranz, J. and Joachims, T., editors, Proceedings of the 27th International
Conference on Machine Learning.
van Gelder, T. and Port, R. (1995). It’s about time: An overview of the dynami-
cal approach to cognition. Mind as motion: Explorations in the dynamics of
cognition. MIT Press, Cambridge, MA.
Varela, J. A., Sen, K., Gibson, J., Fost, J., Abbott, L. F., and Nelson, S. B. (1997).
A Quantitative Description of Short-Term Plasticity at Excitatory Synapses in
Layer 2/3 of Rat Primary Visual Cortex. J. Neurosci., 17(20):7926–7940.
Vigneau, F., Caissie, A., and Bors, D. (2006). Eye-movement analysis demon-
strates strategic influences on intelligence. Intelligence, 34(3):261–272.
West, L. J., Pierce, C. M., and Thomas, W. (1962). Lysergic acid diethylamide:
Its effects on a male Asiatic elephant. Science, 138:1100–1103.
Whitlock, J. R., Heynen, A. J., Shuler, M. G., and Bear, M. F. (2006). Learning
induces long-term potentiation in the hippocampus. Science, 313(5790):1093–7.
Wolfrum, P. and von der Malsburg, C. (2007). What is the optimal architecture
for visual information routing? Neural computation, 19(12):3293–309.
Womelsdorf, T., Anton-Erxleben, K., Pieper, F., and Treue, S. (2006). Dynamic
shifts of visual receptive fields in cortical area MT by spatial attention. Nature
neuroscience, 9(9):1156–60.
Womelsdorf, T., Anton-Erxleben, K., and Treue, S. (2008). Receptive field shift
and shrinkage in macaque middle temporal area through attentional gain mod-
ulation. Journal of Neuroscience, 28(36):8934–44.
Zipser, D., Kehoe, B., Littlewort, G., and Fuster, J. (1993). A spiking network
model of short-term active memory. J. Neurosci., 13(8):3406–3420.