
This excerpt from Microcognition, by Andy Clark (The MIT Press, 1991), is provided in screen-viewable form for personal use only by members of MIT CogNet. Unauthorized use or dissemination of this information is expressly forbidden. If you have any questions about this material, please contact [email protected].
Acknowledgments

Special thanks are due to the following people (in no particular order): Neil Tennant and Donald Campbell for showing me what a biological perspective was all about; Martin Davies, Barry Smith, and Michael Morris for reminding me what a philosophical issue might look like; Aaron Sloman for encouraging an ecumenical approach to conventional symbol-processing AI; Margaret Boden for making the School of Cognitive and Computing Sciences possible in the first place; my father, James Henderson Clark, and my mother, Christine Clark, for making me possible in the first place; my father and Karin Merrick for their saintly patience with the typing; H. Stanton and A. Thwaits for help with the many mysterious details of publishing a book; the Cognetics Society at Sussex (especially Adam Sharp and Paul Booth) for some of the graphics; and Lesley Benjamin for invaluable help with the problem of meaning in all its many forms.
I would like to thank the copyright owners and authors for permission to use the following figures. Figures 4.3 and 4.4 are from S. J. Gould and R. Lewontin, "The Spandrels of San Marco and the Panglossian Paradigm," Proceedings of the Royal Society of London, Series B, 205 (1979), no. 1161: 582-583. Reproduced by permission of the authors and the Royal Society. Table 5.1 and figures 5.1, 5.3, 5.4, 5.5, and 5.7 are from J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2 (Cambridge: MIT Press, 1986). Reproduced by permission of the authors and the MIT Press.
Parts of chapters 3 and 4 are based on the following articles of mine that have appeared in scholarly journals. Thanks to the editors for permission to use this material.

"From Folk-Psychology to Cognitive Science," Cognitive Science 11, no. 2 (1987): 139-154.

"A Biological Metaphor," Mind and Language 1, no. 1 (1986): 45-64.

"The Kludge in the Machine," Mind and Language 2, no. 4 (1987): 277-300.

Adapted versions of parts of chapters 7 and 9 are due to appear as:

"PDP or Not PDP?" in S. Torrance and R. Spencer-Smith, eds., Philosophy and Computation (Norwood, N.J.: Ablex).

"Connectionism and the Multiplicity of Mind," in AI Review, special issue on connectionist models.
Preface

The subtitle is enough to put anyone off: Philosophy, Cognitive Science, and Parallel Distributed Processing. It might have read: the elusive, the ill-defined, and the uncharted. For all that, the project thrust itself upon me with an unusual sense of its own urgency. Parallel distributed processing (PDP) is an exciting and provocative new movement within cognitive science. It offers nothing less than a new kind of computational model of mind. The bad news is that it as yet offers nothing more than a hint of the nature and power of such models. But the hints themselves are remarkable and have the potential, I believe, to reshape both artificial intelligence (AI) and much of the philosophy of mind. In particular, they offer a new picture of the relation between sentences ascribing thoughts and the in-the-head computational structures subserving intelligent action. The final product of such reshaping will not be found in this work. At best, I offer some views on the central points of contrast between the old shape and the new, a personal view on most of the major issues, and along the way, a reasonably detailed taxonomy of features, distinctions, and subprojects. The taxonomy, though necessarily individualistic, may be of some use in future discussions of what is, in effect, a whole new topic for philosophy and AI. The conclusions are often provisional, as befits discussions of an approach that is still in its infancy. By the time this book sees print, there will be many new and relevant developments. I hope the book will provide at least a framework in which to locate them.

The reader should be warned of my peculiar circumstances. I am first of all a philosopher, with only a secondary knowledge of AI and evolutionary biology. I am fortunate to work in the highly interdisciplinary School of Cognitive and Computing Sciences at Sussex University. It is only thanks to that harsh selective environment that I have been able to avoid many glaring misunderstandings. Those that remain are, in the time-honored clause, entirely my own responsibility. Adaptation, even to an environment of AI researchers and cognitive scientists, falls somewhat short of an optimizing process.
Introduction
What the Brain's-Eye View Tells the Mind's-Eye View

1 After the Goldrush


Cognitive science is the goldrush of the mind. Everybody's searching for it. Worse still, everybody (symbolic AI workers, subsymbolic AI workers, neuroscientists, naturalistic philosophers, and so forth) claims to be finding it. Or at least, they claim to know where to look.

For all that, I believe the place of mind in cognitive science is highly problematic. In many ways the very idea of mind is intimately tied up with the apparatus of propositional attitude ascription: the use of such sentences as "Mary believes that Goldbach's Conjecture is true" to describe mental states. The use of this kind of apparatus (sentential ascriptions of belief, desire, and the like) is often denigrated as "folk psychology." Its denigrators believe that a good computational account of mental processing will not involve any neat analogues to the concepts and relations visible in daily talk. Its proponents believe that much cognitive activity is mechanistically impossible without such analogues (internal tokens, syntactically identified). The two camps are often also divided in their choice of computational architecture. Those who favor folk psychology (such as Fodor) see conventional symbol-processing AI as the obvious option. Those against folk psychology (such as Churchland) tend to favor connectionism or parallel distributed processing. (If all this is double Dutch, be patient.)

This kind of dispute, I shall argue, is deeply muddled. It is muddled because (1) folk psychology does not seek to model computational processes, and its dignity does not depend on there being in-the-head analogues to the propositional attitudes, and (2) parallel distributed processing and conventional approaches to mental modeling need not be uniformly regarded as competing paradigms of cognitive architecture: cognitive psychology may require many kinds of computational models for different purposes. This book is largely an attempt to establish these two propositions. Along the way very substantial attention is given to laying out the theoretically interesting differences between conventional AI and connectionist AI (parallel distributed processing). In the remainder of the introduction I shall sketch the main lines that the discussion will follow.

2 Parallel Distributed Processing and Conventional AI
Parallel distributed processing names a broad class of AI models. These models depend on networks of richly interconnected processing units that are individually very simple. The network stores data in the subtly orchestrated morass of connectivity. Some units are connected to others by excitatory links, so that the activation of one will increase the likelihood that the other is activated. Some are inhibitorily linked. Some may be neutral. The overall system turns out to be an impressive pattern completer that is capable of being tuned by powerful learning algorithms. Many useful properties seem to come easily with such a setup. Taken together, these allow such systems to represent data in an economical yet highly flexible way. A certain class of work in conventional AI I shall call semantically transparent. A model will count as semantically transparent if and only if it involves computational operations on syntactically specified internal states that (1) can be interpreted as standing for the concepts and relations spoken of in natural language (such items as "ball," "cat," "loves," "equals," and so on) and (2) these internal tokens recur whenever the system is in a state properly described by content ascriptions employing those words: the token is, as we shall say, projectible to future cases. (Note that such states need not be localizable within the machine. The point is rather that we can make sense of the system as operating according to computational rules on entities of that grain.) In short, a system is semantically transparent if there is a neat mapping between states that are computationally transformed and semantically interpretable bits of sentences. A great deal of work in conventional AI (but not all) is semantically transparent. Work in a highly distributed, connectionist paradigm is not. And therein, I shall argue, lies a philosophically interesting difference.
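As a purely illustrative sketch (in Python, and far simpler than any of the models discussed in part 2), the following Hopfield-style network shows the general flavor of such pattern completion: the stored data live in the excitatory (positive) and inhibitory (negative) weights, and a partial cue relaxes into the nearest stored pattern.

    # A minimal, hypothetical pattern completer. Units take values +1/-1;
    # data is "stored" in the weights, not in any single location.
    import itertools

    def train(patterns):
        """Hebbian learning: units that are active together acquire an
        excitatory link; anti-correlated units acquire an inhibitory one."""
        n = len(patterns[0])
        w = [[0.0] * n for _ in range(n)]
        for p in patterns:
            for i, j in itertools.product(range(n), repeat=2):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
        return w

    def complete(w, cue, sweeps=5):
        """Update each unit from its weighted neighbors until the network
        relaxes into a stable state: pattern completion."""
        state = list(cue)
        for _ in range(sweeps):
            for i in range(len(state)):
                net = sum(w[i][j] * s for j, s in enumerate(state))
                state[i] = 1 if net >= 0 else -1
        return state

    w = train([(1, 1, 1, -1, -1, -1), (-1, -1, 1, 1, -1, 1)])
    print(complete(w, (1, 1, -1, -1, -1, -1)))  # -> [1, 1, 1, -1, -1, -1]

Notice that no single weight encodes either stored pattern; the "knowledge" is distributed across the whole matrix of connections.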

3 The Multiplicity of Mind
Some PDP theorists argue that conventional AI models are at best good approximations to the deep truth revealed by connectionism. Some conventional theorists argue that connectionism displays at best a new way of implementing the insights contained in more traditional models. Both camps are thus endorsing what I shall call the uniformity assumption. This states that a single relation will obtain between connectionist and conventional models for every class and aspect of mentality studied by cognitive psychology.

The uniformity assumption is, I believe, distortive and unhelpful along a number of dimensions. Most straightforwardly, it is distortive if (as seems likely) the mind is best understood in terms of a multiplicity of virtual machines, some of which are adapted to symbol-processing tasks and some of which are adapted for subsymbolic processing. For many tasks, our everyday performance may involve the cooperative activity of a variety of such machines. Many connectionists are now sympathetic to such a vision. Thus, Smolensky (1988) introduces a virtual machine that he calls the conscious rule interpreter. This is, in effect, a PDP system arranged to simulate the computational activity of a computer running a conventional (and semantically transparent) program.

Less straightforward but perhaps equally important is what might be termed the multiplicity of explanation. This will be a little harder to tease out in summary form. The general idea is that even in cases where the underlying computational form is genuinely connectionist, there will remain a need for higher levels of analysis of such systems. We will need to display, for example, what it is that a number of different connectionist networks, all of which have learned to solve a certain class of problems, have in common. Finding and exhibiting the commonalities that underpin important psychological generalizations is, in a sense, the whole point of doing cognitive science. And it may be that in order to exhibit such commonalities we shall need to advert to the kinds of analysis found in symbolic (nonconnectionist) AI.

4 The Mind's-Eye View and the Brain's-Eye View
AI that involves conventional, semantically transparent symbol processing is to be identified with what I am calling the mind's-eye view. The mind's-eye view generates models based on our intuitive ideas about the kind of semantic object over which computational operations need to be defined. They survey the nature of human thought from within normal human experience and set out to model its striking features. The models produced depend on encoding and manipulating translations of the symbol strings of ordinary language. For some explanatory projects, I shall argue, such an approach may indeed be both correct and necessary. But for others it looks severely limited.

The mind's-eye approach was prominent in the late sixties and throughout the seventies. It is characterized by the tasks it selects for study and the forms of the computational approach it favors. The tasks are what I shall term "recent achievements." "Recent" here has both an evolutionary and a developmental sense. In essence, the tasks focused on are those we intuitively consider to be striking and interesting cognitive achievements. These include chess playing (and game playing in general), story understanding, conscious planning and problem solving, cryptarithmetic puzzles, and scientific creativity. Striking achievements indeed. And programs were devised that did quite well at such individual tasks. Chess-playing programs performed at close to the world-class level; a scientific creativity program was able to rediscover one of Kepler's laws and Ohm's law; planning programs learned to mix and match old successful strategies to meet new demands; new data structures enabled a computer to answer questions about the unstated implications of stories; cryptarithmetic programs could far outpace you or me. But something seemed to be missing. The programmed computers lacked the smell of anything like real intelligence. They were rigid and brittle, capable of doing only a few tasks interestingly well.
This approach (which was not universal) may have erred by placing too much faith in the mind's own view of the mind. The entities that found their way into such models would be a fairly direct translation of our standard expressions of belief and desire ultimately into some appropriate machine code. But why suppose that the mind's natural way of understanding its own and others' mental states by using sentential attributions of beliefs, desires, fears, and so on should provide a powerful model on which to base a scientific theory of mind? Why suppose, that is, that the computational substrate of most thought bears some formal resemblance to ordinary talk of the mind? Such talk is an evolutionarily recent development, geared no doubt to smoothing our daily social interactions. There is no obvious pressure here for an accurate account of the computational structure underlying the behavior that such talk so adequately describes and (in a very real sense) explains.
Part 1 of the book examines the mind's-eye approach, associating it with a commitment to semantically transparent programming. It looks at some standard philosophical criticisms of the approach (mistakenly identified by many philosophers with an AI approach in general) and also raises some further worries of an evolutionary and biological nature. In part 2 our attention is focused on the PDP alternative, which I call the "brain's-eye view." The label refers to the brainlike structure of connectionist architectures. Such architectures are neurally inspired. The neural networks found in slugs, hamsters, monkeys, and humans are likewise vast parallel networks of richly interconnected, but relatively slow and simple, processors. The relative slowness of the individual processors is offset by having them work in a cooperative parallelism on the task at hand. A standard analogy is with the way a film of soap settles when stretched across a loop (like the well-known children's toy). Each soap molecule is affected only by its immediate neighbors. At the edges their position is determined by the loop (the system input). The effect of this input is propagated by a spreading series of local interactions until a global order is achieved. The soap film settles into a stable configuration across the loop. In a PDP system at this point we say the network has relaxed into a solution to the global problem. In computing, this kind of cooperative parallelism proves most useful when the task involves the simultaneous satisfaction of a large number of small, or "soft," constraints. In such cases (biological vision and sensorimotor control are prime examples) the use of a parallel cooperative architecture can make tractable tasks that (with such slow processors) we could not otherwise complete in the time available.
Such approaches are most obviously useful for such evolutionarily basic tasks as vision and sensorimotor control. Such computational achievements are by no means as intuitively central to cognition as, say, chess playing. Nonetheless, hamsters, slugs, and tortoises all sense and move, and they depend on parallel neural networks to enable them to do so. The secret of real intelligence may be revealed by the lessons such humble creatures are able to teach us. For it may be that the flexibility and common sense associated with human cognitive achievements is the result of an underlying computational form chosen by natural selection precisely for its ability to help solve such evolutionarily basic problems.

In examining this conjecture, we cannot afford to ignore the obvious differences between human cognition and, say, the cognitive skills of a hamster. Humans do perform feats of complex, logical reasoning. How is it done? In the closing chapters of the book I examine the conjecture that such reasoning depends on our coming to simulate the serial, manipulative capacities of a more conventional computer.

5 The Fate of the Folk
And where does all that leave folk psychology? The position I adopt explicitly rejects what I call the syntactic challenge. The syntactic challenge demands that if beliefs and desires are real and cause behavior, there must be neat, in-the-head syntactic analogues to the semantic expressions in sentences ascribing them. I in general deny this to be the case. Instead, I see belief and desire talk to be a holistic net thrown across a body of the behavior of an embodied being acting in the world. The net makes sense of the behavior by giving beliefs and desires as causes of actions. But this in no way depends on there being computational brain operations targeted on syntactic items having the semantics of the words used in the sentences ascribing the beliefs. Semantically transparent AI posits a neat reductionist mapping between thoughts so ascribed and computational brain states. That is, thoughts (as ascribed using propositional-attitude talk) map onto computational operations on syntactic strings whose parts have the semantics of the parts of the sentences used to express the attitudes. The picture I propose looks like this: thoughts (as ascribed using propositional-attitude talk) are holistically ascribed on the basis of bodies of behavior. Individual items of behavior are caused by computational brain operations on syntactic items, which may not (and typically will not) be semantically transparent. In my model a thought is typically not identical with any computational brain operation on syntactically identified entities, although, of course, there is a systematic relation between brain events and behaviors usefully carved up by ascriptions of beliefs and desires.

6 Threads to Follow
Here are some threads to follow if your interest lies in a particular topic among those just mentioned. Conventional, semantically transparent AI is treated in chapter 1, sections 2 to 5; chapter 2, section 4; chapter 4, section 5; chapter 7, section 6; chapter 8, sections 2 and 4 to 8; chapter 9, sections 3, 5, and 6; chapter 10, section 4; and the epilogue. Parallel distributed processing is covered in chapter 5, sections 1 to 7; chapter 6, sections 1 to 8; chapter 7, sections 1 to 7; chapter 9, sections 1 to 7; and chapter 10, sections 2 to 5. Mixed models (PDP and simulated conventional systems) are taken up in chapter 7, sections 1 to 7; chapter 9, sections 1 to 7 (especially 9.6); and chapter 10, section 4. Folk psychology and thought are discussed in chapter 3, sections 1 to 9; chapter 4, section 5; chapter 7, section 6; chapter 8, sections 1 to 9; and chapter 10, section 4. Biology, evolutionary theory, and computational models are discussed in chapter 3, section 6; chapter 4, sections 1 to 6; and chapter 5, section 6.

The main PDP models used for discussion and criticism are the Jets and the Sharks (chap. 5, sec. 3), emergent schemata (chap. 5, sec. 4), memory (chap. 5, sec. 5), sentence processing (chap. 6, secs. 2 and 3), and past-tense acquisition (chap. 9, secs. 2 and 3).
Chapter 1
Classical Cognitivism

1 Cognitivism, Life, and Pasta

Cognitivism, like life and pasta, comes in a bewildering variety of forms. Philosophers, psychologists, and AI researchers all use the term. For some it is a term of abuse; for others, one of endearment. Like many pseudotechnical terms, its chameleon properties are often used as a convenient antidote to criticism. I therefore propose to use instead the term "classical cognitivism" (and later "conventional AI"). These terms are to signify a conception of mind and computational modeling associated with Newell and Simon's (1976) hypothesis of a physical symbol system and, more generally, with what I call semantically transparent systems, more on both of which below. The point of gracing these particular approaches with the name "classical cognitivism" is simply that in a reasonably precise way it captures the view of the relation between mind and computational modeling that has strongly (indeed, almost exclusively) informed philosophical reactions, both pro and con, to the emerging discipline of cognitive science.1 This is despite its far from universal acceptance in the AI and cognitive science communities.2 One goal of this book is to chart some of the limitations of classical cognitivism but to do so without arguing for its total bankruptcy. The justification of such a judgment is, however, a fairly long and involved project that requires a treatment of alternative styles of computational theorizing, a discussion of the role of so-called virtual machines, and a separation of the requirements of psychological explanation from those of the implementation of mindlike qualities in a computer. The present chapter, then, seeks only to outline the classical cognitivist stance, to sketch an associated methodology, and to indulge in a little innocent name dropping.

2 Turing, Newell, McCarthy, and Simon
The bigger the names, the harder they drop. These would dent the kinds of floors that supported ancient mainframes. It would be fair to say that Turing made AI conceivable, and McCarthy (along with Minsky, Newell, and Simon) made it possible. Despite occasional pronouncements to the contrary, I think we are still waiting to see it made actual, but more on that in due course.
Turing's (1937) achievement was to formalize the notion of computation itself, using the theoretical device we call a Turing machine. He thereby paved the way for mathematical investigations of computability. But significantly, Turing's formalization also (1) encompassed a whole class of mechanisms grouped together not by details of actual physical composition but by their formal properties of symbol manipulation, (2) showed how such mechanisms could tackle any sufficiently well specified problem that would normally require human intelligence to solve, and (3) showed how to define a special kind of Turing machine (the universal Turing machine), which could imitate any other Turing machine and thus perform any cognitive task that any other Turing machine could perform. I shall not review the details of Turing's demonstrations here.3 For present purposes what matters is that Turing's ideas suggested the notion of machines that, by their formal structure, imitate (and even emulate) the mind. The material stuff (valves, silicon, or whatever) did not matter; the formal properties guaranteed in principle a capacity to perform any sufficiently well specified cognitive task. In the words of a major figure in contemporary cognitive science,

Turing's work can be seen as the first study of cognitive activity fully abstracted in principle from both biological and phenomenological foundations. . . . It represents the emergence of a new level of analysis, independent of physics yet mechanistic in spirit. It makes possible a science of structure and function divorced from material substance. . . . Because it speaks the language of mental structures and internal processes, it can answer questions traditionally posed by psychologists. (Pylyshyn, 1986, 68)
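To fix ideas, here is a minimal and purely illustrative Turing machine interpreter in Python, together with a toy machine that adds 1 to a unary numeral. Nothing here is Turing's own construction; the point is only that a table of purely formal (state, symbol) rules suffices to specify a computation, whatever the machine is made of.

    # A hypothetical sketch: a machine is a table mapping
    # (state, read symbol) -> (next state, symbol to write, head move).
    def run(table, tape, state="start", pos=0, blank="_"):
        cells = dict(enumerate(tape))          # sparse, two-way-infinite tape
        while state != "halt":
            state, write, move = table[(state, cells.get(pos, blank))]
            cells[pos] = write
            pos += 1 if move == "R" else -1
        return "".join(cells[i] for i in sorted(cells)).strip(blank)

    # Toy machine: append one more "1" to a string of 1s (unary increment).
    increment = {
        ("start", "1"): ("start", "1", "R"),   # scan right over the 1s
        ("start", "_"): ("halt", "1", "R"),    # write a final 1 and halt
    }
    print(run(increment, "111"))               # -> "1111"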
Classical cognitivism, thanks to the work of Turing, was then on the cards. It was some time, however, before classical cognitivism could develop into a viable, experimental discipline. That development required first the arrival of the general-purpose digital computer and second the availability of a powerful and flexible high-level programming language. John von Neumann provided the practical design, and John McCarthy, around 1960, provided a language. The language was called LISP, which stood for list processing, and it made possible the first sustained run of research and development within the classical-cognitivist paradigm.4 This run of research and development became theoretically self-conscious and articulate with A. Newell and H. Simon's abstraction of the notion of a physical symbol system.

3 The Physical-Symbol-System Hypothesis

A physical symbol system, according to Newell and Simon (1976, 40-42), is any member of a general class of physically realizable systems meeting the following conditions:
(1) It contains a set of symbols, which are physical patterns that can be strung together to yield a structure (or expression).

(2) It contains a multitude of such symbol structures and a set of processes that operate on them (creating, modifying, reproducing, and destroying them according to instructions, themselves coded as symbol structures).

(3) It is located in a wider world of real objects and may be related to that world by designation (in which the behavior of the system affects or is otherwise consistently related to the behavior or state of the object) or interpretation (in which expressions in the system designate a process, and when the expression occurs, the system is able to carry out the process).
In effect, a physical symbol system is any system in which suitably manipulable tokens can be assigned arbitrary meanings and, by means of careful programming, can be relied on to behave in ways consistent (to some specified degree) with this projected semantic content. Any general-purpose computer constitutes such a system. What, though, is the relation between such systems and the phenomena of mind (hoping, fearing, knowing, believing, planning, seeing, recognising, and so on)? Newell and Simon are commendably explicit once again. Such an ability to manipulate symbols, they suggest, is the scientific essence of thought and intelligence, much as H2O is the scientific essence of water. According to the physical-symbol-system hypothesis, "the necessary and sufficient condition for a physical system to exhibit general intelligent action is that it be a physical symbol system." Newell and Simon thus claim that any generally intelligent physical system will be a physical symbol system (the necessity claim) and that any physical symbol system "can be organised further to exhibit general intelligent action" (the sufficiency claim). And "general intelligent action," on Newell and Simon's gloss, implies the same scope of intelligence seen in human action.
It is important to be as clear as possible about the precise nature of Newell and Simon's claim. As they themselves point out (1976, 42), there is a weak (and incorrect) reading of their ideas that asserts simply that a physical symbol system is (or can be) a universal machine capable of any well-specified computation, that the essence of intelligence lies in computation, and that intelligence could therefore be realized by a universal machine (and hence by a physical symbol system). The trouble with this reading is that by leaving the nature of the computations involved so unspecified, it asserts rather too little to be of immediate psychological interest. Newell and Simon rather intend the physical-symbol-system hypothesis as "a specific architectural assertion about the nature of intelligent systems" (1976, 42, my emphasis). It is fair, if a little blunt, to render this specific architectural assertion as follows.

The strong-physical-symbol-system (SPSS) hypothesis: A virtual machine engaging in the von Neumann-style manipulation of standard symbolic atoms has the direct and necessary and sufficient means for general intelligent action.

It will be necessary to say a little about the terms of this hypothesis and then to justify its ascription to Newell and Simon.
About the terms, I note the following. A "virtual machine" is a machine that owes its existence solely to a program that runs (perhaps with other intervening stages) on a real, physical machine and causes it to imitate the usually more complex machine to which we address our instructions (see, for example, Sloman 1984). Such high-level programming languages as LISP, PROLOG, and POP11 thus define virtual machines. And a universal Turing machine, when it simulates a special-purpose Turing machine, may be treated as a virtual version of the special-purpose machine.

"Von Neumann-style manipulation" is meant to suggest the use of certain basic manipulatory operations easily provided in a von Neumann machine running a high-level language like LISP. Such operations would include assigning symbols, binding variables, copying, reading and amending symbol strings, basic syntactic, pattern-matching operations (more on which later), and so on. Connectionist processing, as we shall see, involves a radically different repertoire of primitive operations.
The next phrase to consider is "standard symbolic atoms." This highlights what kinds of entities the SPSS approach defines its computational operations to apply to. They are to apply to symbolic expressions whose parts (atoms) are capable of being given an exact semantic interpretation in terms of the concepts and relations familiar to us in daily, or at any rate public, language. These are words (atoms) such as "table," "ball," "loves," "orbit," "electron," and so forth. Some styles of connectionism and many more conventional models (e.g., those of computational linguistics) involve a radical departure from the use of standard symbolic atoms. Since this contrast looms quite large in what follows, it will be expanded upon in section 5 below.

Finally, the locution "direct and necessary and sufficient means for general intelligent action" is intended to capture a claim of architectural sufficiency. In effect, the claim is that a strong physical symbol system, just as defined, will be capable of genuine intelligent action. That is, such a machine could be truly intelligent quite independently of any particular underlying architectures (any other real or virtual machines on which it is built), and conversely, it could be so without simulating any other architectures or machines. The SPSS hypothesis thus makes a highly specific and laudably Popperian claim.
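The flavor of the von Neumann-style primitives mentioned above can be made vivid with a small, purely hypothetical Python sketch of one such basic operation: testing symbol structures for syntactic identity under variable substitution. Note its all-or-nothing character; a near miss simply fails, where a PDP system (part 2) would instead return its best match.

    # A hypothetical sketch of a basic SPSS-style operation. Symbols
    # beginning with "?" are variables; tuples are symbol structures.
    def match(pattern, expr, bindings=None):
        """Return variable bindings if expr fits pattern exactly, else None."""
        bindings = dict(bindings or {})
        if isinstance(pattern, str) and pattern.startswith("?"):
            if pattern in bindings:            # bound already: must agree
                return bindings if bindings[pattern] == expr else None
            bindings[pattern] = expr
            return bindings
        if isinstance(pattern, tuple) and isinstance(expr, tuple) \
                and len(pattern) == len(expr):
            for p, e in zip(pattern, expr):
                bindings = match(p, e, bindings)
                if bindings is None:
                    return None
            return bindings
        return bindings if pattern == expr else None

    print(match(("loves", "?x", "?y"), ("loves", "john", "mary")))
    # -> {'?x': 'john', '?y': 'mary'}
    print(match(("loves", "?x", "?x"), ("loves", "john", "mary")))
    # -> None: no partial credit for a near miss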
What evidence is there to associate such a claim with Newell and Simon? Quite a lot. Some of the evidence comes in the form of reasonably explicit assertions. Some can be inferred from the details of their actual work in AI. And some (for what it is worth) can be found in the opinions of other commentators and critics. A brief review of some of this evidence follows. It is perhaps worth noting that even if Newell and Simon were to deny any commitment to the SPSS hypothesis, the formulation would still serve our purposes, since something like that hypothesis informs the philosophers' view of artificial intelligence (see chapter 2) and, without doubt, still informs (perhaps unconsciously) a great deal of work within AI itself.

4 Bringing home the BACON
Still, a little evidence never goes amiss. For a start, we find the following comment sandwiched between Newell and Simon's outline of the nature of a physical symbol system and their explicit statement of the hypothesis: "The type of system we have just defined . . . bears a strong family resemblance to all general purpose computers. If a symbol-manipulation language, such as LISP, is taken as defining a machine, then the kinship becomes truly brotherly" (Newell and Simon 1976, 41). Douglas Hofstadter (1985, 646, 664), who takes issue with the idea that baroque manipulations of standard LISP atoms could constitute the essence of intelligence and thought, is happy to ascribe just that view to Newell and Simon.

Moreover, Newell and Simon's own practice does seem to bear such an ascription out. Thus, all their work, from the early General Problem Solver (1963) to their more recent work on production systems and on automating scientific creativity, has been guided by the notion of serial heuristic search based on protocols, notebook records, and observation of human subjects. (Heuristic search is a means of avoiding the expensive and often practically impossible systematic search of an entire problem space by using rules of thumb to lead you quickly to the area in which with a little luck the solution is to be found.) For our purposes, the things that most significantly characterize this work (and much other work in contemporary AI besides; see, for example, the AM program mentioned below) are its reliance on a serial application of rules or heuristics, the rather high-level, consciously introspectible grain of most of the heuristics involved, and the nature of the chosen task domains. I shall try to make these points clearer by looking at the example of BACON and some of its successors, a series of programs that aim to simulate and explain the process of scientific discovery (Langley 1979; Simon 1979; Simon 1987; Langley et al. 1987).
BACON sets out to induce scientific laws from bodies of data. It takes observations of the values of variables and searches for functions relating the values of the different variables. Along the way it may introduce new variables standing for the ratio of the value of the original variables. When it finds an invariant, a constant relation of the values of different variables, it has (in some sense) discovered a scientific law. Thus, by following simple heuristics of the kind a person might use to seek relations among the data ("try the simple relations first," "treat nonconstant products of ratios between variables as new variables," etc.), BACON was able to generate from Kepler's data "[ratios of successive] powers of the radii of the planets' orbits to [successive] powers of their periods of revolution, arriving at the invariant D³/P² (Kepler's third law), after a search of a small number of possibilities" (Simon 1979, 1088). Similarly, BACON arrived at Ohm's law by noticing that the product of electrical current and resistance is a constant.
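A purely illustrative miniature of this heuristic strategy (a sketch, not Langley and Simon's code, and run on toy data) can be written in a few lines of Python: treat products of small integer powers of the variables as candidate new terms, and report any term that stays constant across the observations.

    # Hypothetical toy data: orbital radius D (astronomical units) and
    # period P (years) for Mercury, Venus, Earth, and Mars.
    data = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000), (1.524, 1.881)]

    def find_invariant(data, powers=range(-3, 4), tol=0.01):
        """Search the small space of terms D**a * P**b for one whose
        value is (nearly) the same for every observation."""
        for a in powers:
            for b in powers:
                if a == 0 and b == 0:
                    continue
                vals = [d ** a * p ** b for d, p in data]
                mean = sum(vals) / len(vals)
                if all(abs(v - mean) < tol * abs(mean) for v in vals):
                    return a, b, mean
        return None

    a, b, k = find_invariant(data)
    print(f"D^{a} * P^{b} is constant (= {k:.3f})")
    # -> finds D^-3 * P^2 constant, i.e., D^3/P^2 is invariant:
    #    Kepler's third law

The search space here is tiny and the data arrive prepackaged as clean numerical pairs; as the comments below observe, that packaging is itself the fruit of centuries of human labor.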
Now for a few comments on BACON (the point of some of these won't be clear until subsequent chapters, but patience is a virtue). First, BACON makes its "discoveries" by working on data presented in notational formats (e.g., measures of resistance, periods of planetary revolution) that represent the fruits of centuries of human labor. Manipulating these representations could be the tip of the iceberg; creating them and understanding them may constitute the unseen bulk. I say a little more about this in chapters 6 and 7. For now, simply note that BACON and other programs like AM and EURISKO6 and MYCIN (below) help themselves to our high-level representation formalism. In a recent work Langley et al. (1987, 326) are sensitive to the problem of creating new representational formalisms. But they insist that such problems can be tackled within the architectural paradigm associated with the SPSS hypothesis.
Second and relatedly, the knowledge and heuristics that BACON deploys are coded rather directly from the level of thought at which we consciously introspect about our own thinking. This is evident from Simon's (1987) statement that he relies heavily on human protocols, laboratory notebooks, etc. BACON thus simulates, in effect, the way we reason when we are conscious of trying to solve a problem, and it uses kinds of heuristics that with some effort we might explicitly formulate and use as actual, practical rules of thumb. In chapters 5 to 9 I shall conjecture that this kind of thought is a recent overlay on more primitive, instantaneous processes and that though modeling such thought may constitute a psychological theory of such conscious reasoning, it could not serve on its own to instantiate any understanding whatsoever. This level of modeling is common to much but not all contemporary work in AI, including work in expert systems and qualitative reasoning (see, e.g., the section "Reasoning about the Physical World" in Hallam and Mellish 1987). Thus the MYCIN rule (Shortliffe 1976) for blood infections reads: If (1) the site of the culture is blood, (2) the gram stain of the organism is gramneg, (3) the morphology of the organism is rod, and (4) the patient is a compromised host, then there is suggestive evidence that the identity of the organism is pseudomonas aeruginosa (from Feigenbaum 1977, 1014).
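The conceptual-level character of such rules is easy to display in a toy, entirely hypothetical production-rule sketch; the attribute names and the certainty value below are illustrative only, not MYCIN's actual encoding.

    # A hypothetical sketch of a MYCIN-style production rule: an explicit
    # conjunction of conditions over attribute-value facts.
    facts = {"culture-site": "blood", "gram-stain": "gramneg",
             "morphology": "rod", "compromised-host": True}

    rule = {"if": {"culture-site": "blood", "gram-stain": "gramneg",
                   "morphology": "rod", "compromised-host": True},
            "then": ("organism-identity", "pseudomonas-aeruginosa"),
            "certainty": 0.6}                  # "suggestive evidence"

    def fire(rule, facts):
        """Fire the rule iff every condition matches a stored fact."""
        if all(facts.get(a) == v for a, v in rule["if"].items()):
            return (*rule["then"], rule["certainty"])
        return None

    print(fire(rule, facts))
    # -> ('organism-identity', 'pseudomonas-aeruginosa', 0.6)

Every term in the rule ("blood," "gramneg," "rod") is an atom drawn straight from the public language of the task domain: semantic transparency in miniature.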
Likewise, BACON's representation of data was at the level of attribute-value pairs, with numerical values for the attributes. The general character of the modeling is even more apparent in programs for qualitative-law discovery: GLAUBER and STAHL. GLAUBER applies heuristic rules to data expressed at the level of predicate-argument notation, e.g., "reacts [inputs (HCl, NH3), outputs (NH4Cl)]," and STAHL deploys such heuristics as identify components: "If a is composed of b and c, and a is composed of b and d, and neither c contains d nor d contains c, then identify c with d."
Third, BACON uses fairly slow serial search, applying its heuristics one at a time and assessing the results. Insofar as BACON relies on productions, there is an element of parallelism in the search for the currently applicable rule. But only one production fires at a time, and this is the seriality I have in mind. Serial behavior of this kind is characteristic of slow, conscious thought. And Hofstadter (1985, 632) reports Simon as asserting, "Everything of interest in cognition happens above the 100 millisecond level, the time it takes you to recognise your mother." Hofstadter disagrees vehemently, asserting that everything of interest in cognition takes place below the 100-millisecond level. My position, outlined in detail in chapters 5 to 9 below, is sympathetic to Hofstadter's (and indeed owes a great deal to it). But I will show that the notion of a dispute over the correct level of interest here is misplaced. There are various explanatory projects here, all legitimate. Some require us to go below the 100-millisecond level (or whatever) while others do not. This relates, in a way I expand on later, to a problem area cited by Simon in a recent lecture (1987; see also Langley et al. 1987, 14-16). Simon notes that programs like BACON are not good at very ill structured tasks, tasks demanding a great deal of general knowledge and expectations. Thus, though BACON neatly arrives at Kepler's third law when given the well-structured task of finding the invariants in the data, it could not come up with the flash of insight by which Fleming could both see that the mould on his petri dish was killing surrounding bacteria and recognize this as an unusual and potentially interesting event.
Listening to Simon, one gets the impression that he believes the way to solve these ill-structured problems is to expand the set of high-level data and heuristics that a system manipulates in the normal, slow, serial way (i.e., by creating, modifying, and comparing high-level symbol strings according to stored rules). Thus, in a recent coauthored book he dismisses the idea that the processes involved in the flash-of-insight type of discovery might be radically different in computational kind, saying that "the speed and subconscious nature of such events does not in any way imply that the process is fundamentally different from other processes of discovery; only that we must seek for other sources of evidence about its nature" (i.e., subjects' introspections can no longer help) (Langley et al. 1987, 329).
The position I develop holds rather that the folk-psychological term "scientific discovery" encompasses at least two quite different kinds of processes. One is a steady, von Neumann-style manipulation of standard symbolic atoms in a search for patterns of regularity. And this is well modeled in Simon and Langley's work. The other is the flash-of-insight type of recognition of something unusual and interesting. And this, I shall suggest, may require modeling by a method quite different (though still computational).
In effect, theorists such as Langley, Simon, Bradshaw, and Zytkow are betting that all the aspects of human thought will turn out to be dependent on a single kind of computational architecture. That is an architecture in which data is manipulated by the copying, reorganizing, and pattern-matching capabilities deployed on list structures by a von Neumann (serial) processor. The basic operations made available in such a setup define the computational architecture it is. Thus, the pattern-matching operations which such theorists are betting on are the relatively basic ones available in such cases (i.e., test for complete syntactic identity, test for syntactic identity following variable substitution, and so on). Other architectures (for example, the PDP architecture discussed in part 2 of this book) provide different basic operations. In the case of parallel distributed processing these include a much more liberal and flexible pattern-matching capacity able to find a best match in cases where the standard SPSS approach would find no match at all (see especially chapters 6 and 7 below).
Langley, Simon, et al. are explicit about their belief that the symbol-processing architecture they investigate has the resources to model and explain all the aspects of human thought. Faced with the worry that the approach taken by BACON, DALTON, GLAUBER, and STAHL won't suffice to explain all the psychological processes that make up scientific discovery, they write, "Our hypothesis is that the other processes of scientific discovery, taken one by one, have [the] same character, so that programs for discovering research problems, for designing experiments, for designing instruments and for representing problems will be describable by means of the same kinds of elementary information processes that are used in BACON" (1987, 114). They make similar comments concerning the question of mental imagery (p. 336). This insistence on a single architecture of thought may turn out to be misplaced. The alternative is to view mind as a complex system comprising many virtual architectures. If this is true, psychological explanation will likewise need to deal in a variety of types of models, availing itself in each case of different sets of basic operations (relative to the virtual architecture).
Finally, a word about the methodology of BACON and its classical-cognitivist cousins. These programs characteristically attempt to model fragments of what we might term recent human achievements. By this I mean they focus on tasks that we intelligent, language-using human beings perform (or at least think we perform) largely by conscious and deliberate efforts. Such tasks tend to be well structured in the sense of having definite and recognizable goals to be achieved by deploying a limited set of tools (e.g., games and puzzles with prescribed legal moves, theorem proving, medical diagnosis, cryptarithmetic, and so on). They also tend to be the tasks we do slowly and badly in comparison with perceptual and sensorimotor tasks, which we generally do quickly and fluently. Some AI workers are dubious about this choice of task domain and believe it essential to tackle the fluent, unconscious stuff first before going on to model more evolutionarily recent achievements. Marr (1977, 140) gives the classic statement of this: "Problem-solving research has tended to concentrate on problems that we understand well intellectually but perform poorly on. . . . I argue that [there are] exceptionally good grounds for not studying how we carry out such tasks yet. I have no doubt that when we do (e.g.) mental arithmetic we are doing something well, but it is not arithmetic, and we seem far from understanding even one component of what that something is. Let us therefore concentrate on the simpler problems first." I have expressed similar views based on direct evolutionary arguments (Clark 1986). There still seems to me to be much truth in such strictures. But the overall picture to be developed here is rather more liberal, as we shall see.

5 Semantically Transparent Systems
It is time to expand on the notion of a standard symbolic atom, introduced in section 3 above. One of the most theoretically interesting points of contrast between classical systems (as understood by philosophers like Fodor and Pylyshyn) and connectionist systems (as understood by theorists like Smolensky) concerns the precise sense in which the former rely on, and the latter eschew, the use of such symbolic atoms. To bring out what is at issue here, I shall speak of the classicist as (by definition) making a methodological commitment to the construction of semantically transparent systems. Credit for the general idea of semantic transparency belongs elsewhere. The analysis I offer is heavily influenced by ideas in Smolensky 1988 and Davies, forthcoming.7
A system will be said to be semantically transparent just in case it is possible to describe a neat mapping between a symbolic (conceptual-level) semantic description of the system's behavior and some projectible semantic interpretation of the internally represented objects of its formal computational activity.

The definition is somewhat complex and is not expected to make immediate sense. In particular, the notion of a projectible semantic interpretation must remain as a kind of dummy until much later (chapter 6, section 3). It should be possible, however, to make some sense of the overall drift of the definition immediately.
The general notion of a semantically transparent system (STS) may be best appreciated from the perspective offered by Marr's now-standard account of the levels of understanding of an information-processing task. Marr (1982) distinguishes three levels at which a machine carrying out an information-processing task needs to be understood.

Level 1, computational theory. This level describes the goal of the computation, the general strategies for achieving it, and the constraints on such strategies.

Level 2, representation and algorithm. This describes an algorithm, i.e., a series of computational steps that does the job. It also includes details of the way the inputs and outputs are to be represented to enable the algorithm to perform the transformation.

Level 3, implementation. This shows how the computation may be given flesh (or silicon) in a real machine.
In short, level 1 considers what function is being computed (at a high level of abstraction), level 2 finds a way to compute it, and level 3 shows how that way can be realized in the physical universe.

Suppose that at level 1 you describe a task by using the conceptual apparatus of public language. (This is not compulsory at level 1 but is often the case.) You might use such words as "liquid," "flow," "edge," and so on. You thus describe the function to be computed in terms proper to what Paul Smolensky calls the conceptual level, the level of public language. Very roughly, a system will count as an STS if the computational objects of its algorithmic description (level 2) are isomorphic to its task-analytic description couched in conceptual-level terms (level 1). What this means is that the computational operations specified by the algorithm are applied to internal representations that are projectibly interpretable as standing for conceptual-level entities. (Again, clarification of the notion of projectibility will have to wait until chapter 6.)
Some examples may help to sharpen these levels. Consider the following specifications of functions to be computed.

(1) If (cup and saucer) then (cup)
    If (cup and saucer) then (saucer)

(2) If (verb stem + ending) then (verb stem + -ed)

The functions in (1) are clear examples of a conceptual-level specification. Though (2) does not draw on daily language, it is nonetheless a related case, as we shall see in subsequent chapters. In each case the items in parentheses are structural descriptions whose structure is semantically significant.
A semantically transparent system, we may now say, is one in which the objects (e.g., "cup and saucer") of state-transition rules in the task analysis (e.g., the rule "if (cup and saucer) then (cup)") have structural analogues in the actual processing story told at level 2. That is to say, in the case of (1), the level 2 story will involve computational operations defined to apply to representations sharing the complex structure of the expression "cup and saucer." In the case of (2), the level 2 story will involve computational operations defined to apply to descriptions of input verbs in a way that reveals them to have the structure "verb stem + ending." It is in this sense that classical, semantically transparent systems may be said to have a certain kind of syntax. For they posit mental representations that have actual structures echoing the semantic structures of our level-1 description. As Fodor and Pylyshyn (1988) forcibly point out, this is very handy if we want our system to perform systematically with respect to a certain semantic description. For it has the effect of making the semantic description a real object for the system. Hence, any inferences, etc., that are systematic in the semantic description can easily be mimicked by relying on the syntactic properties of its internal representations. If we want our system to treat "(cup and saucer)" as an instance of a general logical schema "(a and b)" and hence to perform all kinds of deductive inferences on arguments involving cups and saucers, this will be a simple matter just so long as the system's representation of cups and saucers is semantically transparent and preserves structure.
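A short, purely illustrative Python sketch shows how cheap such systematicity is once structure-preserving representations are in place: the operations below see only the form of the expression, never the particular contents.

    # Hypothetical structure-sensitive operations on semantically
    # transparent representations.
    def simplify_and(expr):
        """If (a and b) then a; if (a and b) then b. The rule applies to
        the structure ('and', a, b), whatever a and b stand for."""
        if isinstance(expr, tuple) and expr[0] == "and":
            _, a, b = expr
            return [a, b]
        return [expr]

    def past_tense(verb):
        """If (verb stem + ending) then (verb stem + -ed)."""
        stem, _ending = verb
        return (stem, "-ed")

    print(simplify_and(("and", "cup", "saucer")))   # -> ['cup', 'saucer']
    print(simplify_and(("and", "john", "mary")))    # systematicity for free
    print(past_tense(("walk", "-ing")))             # -> ('walk', '-ed')

Because "cup and saucer" is internally represented as a structure with "cup" and "saucer" as recurring, projectible parts, any rule defined over the schema (a and b) automatically generalizes to every content the system can represent.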
Clearly, the notion of a semantically transparent system is intended to capture the substance of Fodor and Pylyshyn's definition of a classical approach to cognitive science. Classical and connectionist approaches differ, according to Fodor and Pylyshyn, in two vital respects.

(1) "Classical theories (but not connectionist theories) posit a 'language of thought'." This means that they posit mental representations (data structures) with a certain form. Such representations are syntactically structured, i.e., they are systematically built by combining atomic constituents into molecular assemblies, which (in complex cases) make up whole data structures in turn. In short, they posit symbol systems with a combinatorial syntax and semantics.
(2) "In classical models, the principles by which mental states are transformed, or by which an input selects the corresponding output, are defined over structural properties of mental representations. Because classical mental representations have combinatorial structure, it is possible for classical mental operations to apply to them by reference to their form." This means that if you have a certain kind of structured representations available (as demanded by point 1), it is possible to define computational operations on those representations so that the operations are sensitive to that structure. If the structure isn't there (i.e., if there is no symbolic representation), you couldn't do it, though you might make it look as if you had by fixing on a suitable function in extension. (Quotes are from Fodor and Pylyshyn 1988, 12-13.)

In short, a classical system is one that posits syntactically structured, symbolic representations and that defines its computational operations to apply to such representations in virtue of their structure.
The notion of a semantically transparent system is also meant to capture the spirit of Smolensky's views on the classical/connectionist divide, as evidenced in comments like the following.

A symbolic model is a system of interacting processes, all with the same conceptual-level semantics as the task behavior being explained. Adopting the terminology of Haugeland (1978), this systematic explanation relies on a systematic reduction of the behavior that involves no shift of semantic domain or dimension. Thus a game-playing program is composed of subprograms that generate possible moves, evaluate them, and so on. In the symbolic paradigm these systematic reductions play the major role in explanation. The lowest-level processes in the systematic reduction, still with the original semantics of the task domain, are then themselves reduced by intentional instantiation: they are implemented exactly by other processes with different semantics but the same form. Thus a move-generation subprogram with game semantics is instantiated in a system of programs with list-manipulating semantics. (Smolensky 1988, 11)
Before leaving the subject of STSs, it is worth pausing to be quite explicit about one factor that is not intended as part of the definition of an STS. Under the terms of the definition an STS theorist is not committed to any view of how the system explicitly represents the rules adduced in task analysis (level 1). Thus, in my example (1), there is no suggestion that the rule "If (cup and saucer) then (cup)" must itself be explicitly represented by the machine. A system could be an STS and be hard-wired so as to take input "cup and saucer" and transform it into output "cup." According to STS theory, all that must be explicit is the structured description of the objects to which the rule is defined to apply. The derivation rules may be tacit, so long as the data structures they apply to are explicit. On this Fodor and Pylyshyn rightly insist: "Classical machines can be rule implicit with respect to their programs. . . . What does need to be explicit in a classical machine is not its program but the symbols that it writes on its tapes (or stores in its registers). These, however, correspond not to the machine's rules of state transition but to its data structures" (1988, 61). As an example they point out that the grammar posited by a linguistic theory need not be explicitly represented in a classical machine. But the structural descriptions of sentences over which the grammar is defined (e.g., in terms of verb stems, subordinate clauses, etc.) must be. Attempts to characterize the classical/connectionist divide by reference to explicit or nonexplicit rules are thus shown to be in error (see also Davies, forthcoming).

6 Functionalism
While I am setting the stage, let me bring on a major and more straightforwardly philosophical protagonist, the functionalist. The functionalist is in many ways the natural bedfellow of the proponent of the physical-symbol-system hypothesis. For either version of the physical-symbol-system hypothesis claims that what is essential to intelligence and thought is a certain capacity to manipulate symbols. This puts the essence of thought at a level independent of the physical stuff out of which the thinking system is constructed. Get the symbol-manipulating capacities right and the stuff does not matter. As the well-known blues number has it, "It ain't the meat, it's the motion." The philosophical doctrine of functionalism echoes this sentiment, asserting (in a variety of forms) that mental states are to be identified not with, say, physicochemical states of a being but with more abstract organizational, structural, or informational properties. In Putnam's rousing words, "We could be made of Swiss cheese and it wouldn't matter" (1975, 291). Aristotle, some would have it, may have been the first philosophical functionalist. Though there seems to be a backlash now underway (see, e.g., Churchland 1981 and Churchland 1986), the recent popularity of the doctrine can be traced to the efforts of Hilary Putnam (1960, 1967), Jerry Fodor (1968), David Armstrong (1970) and, in a slightly different vein, Daniel Dennett (1981) and William Lycan (1981). I shall not attempt to do justice to the nuances of these positions here. Instead, I shall simply characterize the most basic and still influential form of the doctrine, leaving the search for refinements to the next chapter. First, though, a comment on the question to which functionalism is the putative answer.
In dealing with the issues raised in this book, it seemed to me to be essential to distinguish the various explanatory projects for which ideas about the mind are put forward. This should become especially clear in the
closing chapters. For now I note that one classical philosophical project has been to formulate and assess schemas for a substantial theory of the essence of the mental. The notion of essence here may be unpacked as the search for the necessary and sufficient conditions for being in some mental state. In this restricted sense a theory of mind should tell us what it is about a being that makes it true to assert of that being that it is in a given mental state (e.g., believing it is about to rain, feeling sad, feeling anxious, suffering a stabbing pain in the left toe, and so forth).
For the moment let me simply assert that Newell and Simon's intended project (in common with a lot of workers in AI) is psychological explanation. Pending a fuller account of psychological explanation, it is not obvious that the project of psychological explanation is identical with the project of seeking the essence of the mental in the sense just sketched. Newell and Simon's talk of the physical-symbol-system hypothesis as an account of the necessary and sufficient conditions of intelligent action effectively identifies the tasks. It follows that having a full psychological explanation in their sense would put you in a position to re-create or instantiate the analyzed mental state in a machine (barring practical difficulties). I shall later argue for a firm distinction between these projects of psychological explanation and psychological instantiation.
Functionalism, then, is a sketch or schema of the kind of theory that, when filled in, will tell us in a very deep sense what it is to be in some mental state. The most basic form of such a theory is known as Turing-machine functionalism. Not surprisingly, the doctrine takes its cue from Turing's conception of the formal properties sufficient to guarantee that a task is computable by a mechanism, regardless of the physical stuff out of which the mechanism was made (see section 2 above).
In Putnam's hands (1960, 1967) functionalism came to suggest a theory of mind (in the sense of a schema for a substantial theory of the essence of the mental) that was apparently capable of avoiding many of the difficulties that beset other such proposals. Very sketchily, the situation was something like this. Dualism (the idea that mind is a ghostly kind of nonmaterial substance) had been discredited as nonexplanatory mysticism and was briefly displaced by behaviorism. Behaviorism (Ryle 1949) held that mental states were identical with sets of actual and counterfactual overt behaviors and that inner states of the subject, though no doubt causally implicated in such behaviors, were not theoretically important to understanding what it is to be in certain mental states.
This dismissal of the importance of internal states (for a philosophical theory of mind) was resisted by the first wave of identity theories, which claimed that mental states were identical with brain processes (Smart 1959). But the identity theory, if one took the claims of its proponents rather literally (more literally, I am inclined to think, than they ever intended), lay
open to a variety of criticisms. Especially relevant here is Putnam's (1960, 1967) criticism that identity theory makes far too tight the tie between being in a certain mental state (e.g., feeling pain) and being in a certain physicochemical or neural state. For on an extreme, type-type identity reading, the identity of some mental state with, say, some neural state would seem to imply that a being incapable of being in that neural state could not, in principle, be in the mental state in question. But for rather obvious reasons this was deemed unacceptable. A creature lacking neurons would be unable to occupy any neural state. But couldn't there be exotic beings made of other stuff who were nonetheless capable of sharing our beliefs, desires, and feelings? If we allow this seemingly sensible possibility, then we, as philosophers, need some account of what physically variously constituted feelers and believers have in common that makes them feelers and believers. Behaviorism would have done the trick, but its denial of the importance of inner states had been perceived as a fault. Identity theory, it seemed, had gone too far in the other direction.
Between the Scylla and the Charybdis sailed the good ship functionalism. What is essential to being in a certain mental state, according to the functionalist schema, is being in a certain abstract functional state. And that functional state is defined over two components: (1) the role of some internal states in mediating system input and system output (the behavior element) and (2) the role of the states or processes in activating or otherwise affecting other internal states of the system (the inner element). If we also presume that cognition is a computational phenomenon, then we can link this characterization (as Putnam [1960] did) to the notion of a Turing machine, which is defined by its input and output and its internal state-transition profile. What Turing machine you are instantiating, not what substance you are made of, characterizes your mental states. As I said, it ain't the meat, it's the motions.
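The point can be made concrete with a toy sketch of my own (strictly a finite-state transducer rather than a full Turing machine, since it has no tape, but the moral carries over; all state and symbol names are invented for illustration). The table below exhausts the specification: any physical system realizing it, whatever its stuff, instantiates the same profile.

    # (state, input) -> (next state, output): the whole specification.
    TRANSITIONS = {
        ("content", "pinprick"): ("distressed", "wince"),
        ("distressed", "relief"): ("content", "smile"),
    }

    def step(state, stimulus):
        """One transition of the profile; the substrate plays no role."""
        return TRANSITIONS[(state, stimulus)]

    print(step("content", "pinprick"))  # -> ('distressed', 'wince')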
Now the bad news. Functionalism has had its problems. Of special interest to us will be the problem of excessive liberalism (see Block 1980). The charge is that Turing-machine functionalism allows too many kinds of things to be possible believers and thinkers. For example, it might in principle be possible to get the population of China to pass messages (letters, values, whatever) between themselves so as briefly to realize the functional specification of some mental state (Block 1980, 276-278). (Recall, it is only a matter of correctly organizing inputs, outputs, and internal state transitions, and these, however they are specified, won't be tied to any particular kind of organism.) As Block (1980, 277) puts it, "In describing the Chinese system as a Turing machine I have drawn the line [i.e., specified what counts as inputs and outputs] in such a way that it satisfies a certain type of functional description - one that you also satisfy, and one that, according to functionalism, justifies attributions of mentality." But, says Block, there is at
least prima facie reason to doubt that such a system could have any mental states at all. Could that overall system really constitute, say, an agent in pain? Surely not. Surely there is nothing which it is like, either nice or nasty, to be such a system. It has no phenomenal or subjective experience. Or, as philosophers put it, the system has no qualia (raw feels, real subjectivity). Hence, Block dubs this argument the absent-qualia argument. I suspect that this argument loses much of its force once the functionalist hypothesis is firmly disassociated from both STS- and SPSS-type approaches (see chapters 2 and 10). But for the moment, I leave the discussion of functionalism on that quizzical note.
The players are on stage. We have met the physical-symbol-system hypothesis and its methodological cousin, semantically transparent cognitive modeling. We have met a computationally inspired philosophical model of mind (functionalism) and hinted at its difficulties. The disembodied presence of connectionism awaits future flesh. But we should first pause to see what philosophers have made of the story so far.
Chapter 2
Situation and Substance

1 Stuff and Common Sense

Classical cognitivism was not universally celebrated by philosophers. There was a slightly inchoate sense of mind being finessed by sleight of hand. With the work of Dreyfus (1972, 1981) and later Searle (1980), the inchoate became flesh. The flesh, however, was quite different in each case. For Dreyfus the problem with classical cognitivism was the adequacy of current (and perhaps any) representational formalisms to re-create and express human common sense knowledge. For Searle the worry was a perceived gap between the formal and the intentional, or between syntax and meaning. The biological stuff of which we are made, Searle felt, was at least as essential to our abilities to know and understand as any formal or computational properties it may happen to possess. In this chapter I shall examine these two kinds of worry, exposing a few of the threads I later weave into a defence of cognitive science.

2 The Dreyfus Case
Hubert Dreyfus has been one of the most persistent yet sensitive critics of the cognitivist tradition. At the core of his disquiet lies the thought that there is a richness or "thickness" to human understanding that cannot be captured in any set of declarative propositions or rules. Instead, the richness depends on a mixture of culture, context, upbringing, and bodily self-awareness. Accordingly, Dreyfus turns a sceptical gaze on the early microworlds work associated with Winograd (1972) and the frame- and script-based approaches associated with Minsky (1974) and Schank and Abelson (1977), among others.
Winograd's (1972) SHRDLU was a program that simulated a robot acting in a small microworld composed entirely of geometric solids (blocks, pyramids, etc.). By restricting the domain of SHRDLU's alleged competence and providing SHRDLU with a model of the items in the domain, Winograd was able to produce a program that could engage in a quite sensible dialogue with a human interlocutor. SHRDLU could, for example,
resolve problems concerning the correct referents of words like "it" and "the pyramid" (there were many pyramids) by deploying its knowledge of the domain. The output of the program was relatively impressive, but its theoretical significance as a step on the road to modeling human understanding was open to doubt. Could a suitable extension of the microworld strategy really capture the depth and richness of human understanding? Dreyfus thought not, since he saw no end to the number of such microcompetences required to model even a child's understanding of a real-world activity like bargaining.
Consider the following example (due to Papert and Minsky and cited in Dreyfus 1981, 166).
Janet: "That isn't a very good ball you have. Give it to me and I'll give you my lollipop."
For the set of microtheories needed to understand Janet's words, Minsky and Papert suggest a lengthy list of concepts that includes: time, space, things, people, words, thoughts, talking, social relations, playing, owning, eating, liking, living, intention, emotions, states, properties, stories, and places. Each of these requires filling out. For example, anger is a state caused by an insult (and other causes) and results in noncooperation (and other effects). And this is just one emotion. This is a daunting list and one that, as Dreyfus points out, shows no obvious signs of completion or even completability. Dreyfus thus challenges the microtheorist's faith that some finite and statable set of microcompetences can turn the trick and result in a computer that really knows about bargaining or anything else. Such faith, he suggests, is groundless, as AI has failed "to produce even the hint of a system with the flexibility of a six-month old child" (Dreyfus 1981, 173). He thinks that "the special purpose techniques which work in context-free, gamelike micro-worlds may in no way resemble general purpose human and animal intelligence."
Similar criticisms have been applied to Winston's (1975) work in computer vision and Minsky's (1974) frame-based approach to representing everyday knowledge. A frame is a data structure dealing with a stereotypical course of events in a given situation. It consists of a set of nodes and relations with slots for specific details. Thus a birthday-party frame would consist of a rundown of a typical birthday-party-going sequence and might refer to cakes, candles, and presents along the way. Armed with such a frame, a system would have the common sense required to assume that the cake had candles, unless told otherwise.
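By way of illustration only (the slot names are mine, not Minsky's), a frame of this kind can be sketched as a structure of default-bearing slots:

    # A stereotyped situation whose slots carry default values that stand
    # unless the system is told otherwise.
    birthday_party_frame = {
        "cake":     {"default": "cake with candles"},
        "presents": {"default": "wrapped presents"},
    }

    def fill_slot(frame, slot, observed=None):
        """Use the observed detail if given; otherwise assume the default."""
        return observed if observed is not None else frame[slot]["default"]

    print(fill_slot(birthday_party_frame, "cake"))  # -> 'cake with candles'
    print(fill_slot(birthday_party_frame, "cake", "cake with black candles"))

The default mechanism is what supplies the "common sense" assumption that the cake has candles; telling the system otherwise simply overrides the slot.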
But as Dreyfus is not slow to point out, the difficulties here are still enormous. First, the frames seem unlikely ever to cover all the contingencies that common sense copes with so well. (Are black candles on a birthday cake significant?) Second, there is the problem of accessing the
right frame at the right time. Humans can easily make the transition from, say, a birthday-party frame to an accident frame or to a marital-scene frame. Is this done by yet more, explicit rules? If so, how does the system know when to apply these? How does it tell what rule is relevant and when? Do we have a regress here? Dreyfus thinks so. To take one final example, consider the attempt to imbue an AI system with knowledge of what it is for something to be a chair. To do so, Minsky suggests, we must choose a set of chair-description frames. But what does this involve? Presumably not a search for common physical features, since chairs, as Dreyfus says, come in all shapes and sizes (swivel chairs, dentist's chairs, wheelchairs, beanbag chairs, etc.). Minsky contemplates functional pointers, e.g., "something one can sit on." But that is too broad and includes mountains and toilet seats. And what of toy chairs, and glass chairs that are works of art? Dreyfus's suspicion is that the task of formally capturing all these interlinked criteria and possibilities in a set of context-free facts and rules is endless. Yet, as Dreyfus can hardly deny, human beings accomplish the task somehow. If Dreyfus doubts the cognitivist explanation, he surely owes us at least a hint of a positive account.

3 It Ain't What You Know; It's the Way You Know It
And we get such an account. In fact, we get hints of what I believe to be two distinct kinds of positive account. Critics of Dreyfus (e.g., Torrance [1984, 22-23]) have tended to run these together, thereby giving the impression that reasonable doubts concerning one line of Dreyfus's thought constitute a global undermining of his position. In this section I shall try to be a little more sympathetic. This is not to say, however, that such critics are wrong or lack textual evidence for their position. In early articles Dreyfus does indeed seem to run together the two accounts I distinguish. And even in the most recent book (Dreyfus and Dreyfus 1986), he treats the two as being intimately bound up. My claim is that they both can and should be kept separate.
The first line of thought concerns what I shall call "the body problem." The body problem is neatly summed up in the following observation: "The computer comes into our world even more alien than would a Martian. It does not have a body, needs or emotions, and it is not formed by a shared language and other social practices" (Dreyfus and Dreyfus 1986, 79). We can see this worry at work in the following passage, in which Dreyfus tries for a positive account of what makes something a chair: "What makes an object a chair is its function [and] its place in a total practical context. This presupposes certain facts about human beings (fatigue, the way the body bends) and a network of other culturally determined equipment (tables, floors, lamps) and skills (eating, writing, going to conferences, giving
lectures, etc.). . . . Moreover, understanding chairs also includes social skills such as being able to sit appropriately . . . at dinner, interviews, desk jobs" (Dreyfus 1981, 184). Such comments stress the role of the human body and our slow social and cultural upbringing, our slowly acquired "situated understanding." Now certainly our bodies and social and cultural contexts constitute a rich source of knowledge about the human world. Thus, suppose we want a computer to parse (or assign grammatical categories and values) and understand the instructions on a bottle of shampoo. On one brand in my household these include: "As with all shampoos, avoid getting this shampoo into the eyes. If it does, rinse well with warm water."1 Imagine trying to program a bodyless machine successfully to parse such a message! Out of context it is not at all obvious what the second sentence would have us rinse. We understand the message in part because we have eyes and we know they are easily irritated and that rinsing the eyes with water helps. This is just one example of a host of knowledge that we are privy to simply in virtue of having the bodies and the bodily reactions we do. The attempt to force-feed such knowledge (along with whatever we pick up socially and so on) into a nonsocial, bodyless machine is almost certain to fail by falling far short of the richness of adult human understanding.
The difficulty is knowing exactly what theoretical force attaches to such observations. For they suggest a reading of Dreyfus that seems to saddle him with a rather implausible critical-mass argument, which might go something like this.
Until you know a certain amount about something, you don't really know anything about it at all.
This critical mass is attained only through our bodily, social, and cultural interactions with the world.
So a computer, lacking our socially situated and embodied advantages, can never know anything.
Alas, Dreyfus sometimes gives us good cause to ascribe some such argument to him. Speaking of a program (see section 4 below) that seeks to model knowledge about restaurants, he writes, "When the waitress came to the table, did she wear clothes? Did she walk forwards or backwards? Did the customer eat his food with his mouth or his ear? If the program answers, 'I don't know,' we feel that all of its right answers were tricks or lucky guesses and that it has not understood anything of our everyday restaurant behaviour" (Dreyfus 1981, 189). This "if you don't know that, you don't know nothin'" approach has struck many critics as totally implausible.2 We often (correctly, I believe) ascribe some understanding of things to human children in the face of the most bizarre misconceptions and
gaps in their knowledge. And couldn't a Martian be said to have learned something of eating by learning what we eat and why, even before knowing enough about human anatomy to decide whether we eat with our mouths or our ears? Critical-mass arguments thus strike me as very unconvincing. Let us therefore pass quickly on to another line of thought.
The second line of Dreyfus's thought takes its cue not from the body problem but from more general observations about ways of "encoding" knowledge (if "encoding" is the right word). It is here that his ideas are most suggestive. He notes that human beings seem naturally inclined to spot the significant aspects of a situation, and he relates this capacity not to a stored set of rules and propositions but to their vast experience of previous concrete situations and some kind of holistic associative memory. He asks, "Is the know-how that enables humans constantly to sense what specific situations they are in the sort of know-how that can be represented as a kind of knowledge in any knowledge representation language no matter how ingenious and complex?" (Dreyfus 1981, 198.) Viewed from this angle, Dreyfus's worry is not that machines don't know enough (because of their lack of bodies and so on) but rather that the way in which current AI programs represent knowledge is somehow fundamentally inadequate to the real task. Such programs assume what Dreyfus doubts, namely, "that all that is relevant to intelligent behaviour can be formalised in a structured description" (p. 200). This is, for Dreyfus, the most basic tenet of what he calls the "information processing approach."
At this point I must pause to raise a few questions of my own. What exactly counts as a structured description? What counts as a knowledge-representation language? It may seem as if Dreyfus here intends to rule out all forms of computational accounts of cognition and lay total (and mysterious) stress on human bodies and culture. This, however, is not the case. In a recent book Dreyfus stresses the need for flexible systems capable of what he calls "holistic similarity recognition" if any progress is to be made in modeling human expertise (Dreyfus and Dreyfus 1986, 28). He cites, with some approval, recent work on connectionist or PDP approaches to mind (p. 91). Perhaps most revealingly of all, he does so in a section entitled "AI without Information Processing." As we shall see, I disagree with the claim that such approaches do away with structured descriptions or information processing. But that, for now, is another matter. What we need to notice here is just that since Dreyfus's doubts exclude such approaches but include the approaches taken by Newell, Simon, Winograd, and others, they may best be seen as doubts about what I earlier dubbed the SPSS hypothesis. They are doubts about whether a certain computational approach can in principle yield systems with the kind of flexibility and common sense we tend to associate with the warranted ascription of understanding. The underlying thought, in effect, is that
where real intelligence is concerned, it ain't what you know, it's the way you know it. Of course, there could be a link with the points about bodies and so forth even here. There may be some things we know about largely by our own awareness of our bodily and muscular responses (Dreyfus cites swimming as an example). Perhaps a machine lacking our kind of body but equipped with some kind of mechanism for holistic similarity processing could even so know less about such things than we can. Nonetheless, on the reading of Dreyfus I am proposing (I have no idea whether he would endorse it), such a machine would be flexible and commonsensical within its own domains, and as such it would be at least a candidate for a genuine knower, albeit not quite a human one. This is in contradistinction to any system running a standard cognitivist program. I shall expand on this point in subsequent chapters.
Of Dreyfus's two points (the one about the social and embodied roots of human knowing, the other about the need for flexible, commonsense knowledge) it is only the second which I think we can expect to bear any deep theoretical weight. But at least that point is suggestive. So let us keep it in our back pockets as we turn to a rather different criticism of AI and cognitive science.

4 Manipulating the Formal Shadows of Mind


In a series of recent publications (1980, 1983, 1984) John Searle has established himself as a leading opponent of the information-theoretic approach to mind. That approach, he thinks, is just tilting at the formal "shadows" of mind. But in contrast, real mentality, he says, depends on far wetter things, namely, on the physicochemical properties of human brains. Searle's criticism is targeted on what he calls the hypothesis of "strong AI." This is defined as the claim that "the appropriately programmed computer literally has cognitive states and that the programs thereby explain human cognition" (Searle 1980, 283). The attack begins with a now infamous thought experiment, the puzzling case of the Chinese room. This thought experiment aims to provide a general critique of the computational approach to mind. Its starting point is a specific program that might seem to simulate the intentional activity of understanding a story (Schank and Abelson 1977). Very briefly, the program provides the computer with some background data concerning the topic of a story to be presented. The computer can then be given a story on this topic and afterward it will answer questions about the story that are not explicitly resolved in the story itself. Thus, to use Searle's example (which I touched on earlier), we might program in background data on human behavior in restaurants. We may then tell the story of a man who enters a restaurant, orders a hamburger, and upon leaving, presents the waitress with a big tip. If the computer is
then asked, "And did the man eat the hamburger?" it can answer "yes," because it apparently knows about restaurants. Searle believes, I think rightly, that the computer running this program does not really know about restaurants at all, at least if by "know" we mean anything like "understand." The Chinese-room example is constructed in part to demonstrate this. But Searle believes his arguments against that sort of computational model of understanding are also arguments against any computational model of understanding.
We are asked to imagine a human agent, an English monolinguist, placed in a large room and given a batch of papers with various symbols on it. These symbols, which to him are just meaningless squiggles identifiable only by shape, are in fact the ideograms of the Chinese script. A second batch of papers arrives, again full of ideograms. Along with it there arrives a set of instructions in English for correlating the two batches. Finally, a third batch of papers arrives bearing still further arrangements of the same uninterpreted formal symbols and again accompanied by some instructions in English concerning the correlation of this batch with its predecessors. The human agent performs the required matchings and issues the result, which I shall call "the response." This painstaking activity, Searle argues, corresponds to the activity of a computer running Schank's program. For we may think of batch 3 as the questions, batch 2 as the story, and batch 1 as the script or background data. The response, Searle says, may be so convincing as to be indistinguishable from that of a true Chinese speaker. And yet, and this is the essential point, the human agent performing the correlations understands no Chinese, just as, it would now appear, a computer running Schank's program understands no stories. In each case what is going on is the mere processing of information. If the intuitions prompted by the Chinese-room example are correct, understanding must involve something extra. From this Searle concludes that no computer can ever understand merely by "performing computational operations on formally specified elements." Nor, consequently, can the programs that determine such computational operations tell us anything about the special nature of mind (Searle 1980, 286).
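The formal character of the room's procedure can be suggested by a toy sketch of my own (not Searle's; the shape names are invented). Everything the occupant does is of this kind: matching uninterpreted shapes against an instruction book, with no access to what any shape means.

    # Correlation by shape alone: batch 1 (script), batch 2 (story), and
    # batch 3 (question) are matched to an output shape by rote rules.
    RULE_BOOK = {
        ("squiggle-1", "squiggle-7", "squoggle-3"): "squiggle-9",
    }

    def respond(script, story, question):
        """Issue 'the response' by looking up uninterpreted shapes."""
        return RULE_BOOK.get((script, story, question), "squiggle-0")

    print(respond("squiggle-1", "squiggle-7", "squoggle-3"))  # -> 'squiggle-9'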
Ramming the point home, Searle asks us to compare the understanding we (as ordinary English speakers) have of a story in English against the "understanding" the person manipulating the formal symbols in the Chinese room has of Chinese. There is, Searle argues, no contest: "In the Chinese case I have everything that Artificial Intelligence can put into me by way of a program and I understand nothing; in the English case I understand everything and there is so far no reason at all to suppose that my understanding has anything to do with computer programs - i.e., with computational operations on purely formally specified elements" (Searle 1980, 286). In short, no formal account can be sufficient for understanding,
"
since a human will be able to follow the fonnal principleswithout understanding
"
anything (p. 287). And there is no obvious reason to think
that satisfying somelonnal condition is necessary either, though as Searle
admits, this could (just conceivably) yet prove to be the case. The fonnal
descriptions, Searlethinks (p. 299), seemto be capturing just the shadows
of mind, shadowsthrown not by abstractcomputationalsequencesbut by
the actual operation of the physical stuff of the brain.
I shall argue that Searleis simply wrong thus completelyto shift the
emphasisaway from fonnal principleson the basisof a demonstrationthat
the operation of a certain kind of fonnal program is insufficient forinten -
tionality . The position to be developed below and in chapters3 and 5 to
11 views as a necessarythough" perhaps insufficient condition of real
understandingthe instantiation of a certain kind of fonnal description that
is far more microstructural than the descriptionsof the SPSShypothesis.
'
Undennining Searles strongest claims, however, is no simple matter, and
we must proceed cautiously. The best strategy is to look a little more
closely at the positive claims about the importance of the nonfonnal,
biological stuff.

5 Showing What We're Made Of


Searle considers several possible replies to his paper, only one of which need interest us here.3 It is what he calls the brain-simulator reply, and it goes like this. Suppose a program modeled the formal structure of actual Chinese brains engaged in understanding Chinese. Surely then it would constitute a genuine Chinese understanding. At this point, to his credit, Searle grasps the nettle. "No," he says; we could imagine an elaborate set of water pipes and valves, and a human switcher, realising that formal description too. But wherein would the understanding of Chinese reside? Surely the answer is "nowhere" (adapted from Searle 1980, 295). (This argument should recall the worries about excessive liberalism and absent qualia, which, at the last showing (chapter 1), had functionalism in a vicelike grip.) Regarding the brain simulator, then, Searle is in no doubt: "As long as it simulates only the formal structure of the sequence of neuron firings at the synapses, it won't have simulated what matters about the brain, namely its causal properties, its ability to produce intentional states" (Searle 1980, 295). Or again: "What matters about brain operation is not the formal shadow cast by the sequence of synapses but rather the actual properties of the sequences" (Searle 1980, 300).
The allusions to causal powers have struck many critics as unforgivably obscure. It is hard to see why. Searle's claim has two components: (1) the formal properties of the brain do not constitute intentionality; (2) the reason they do not constitute it is that only certain kinds of stuff can
support thought. Well, (1) may be right (see chapter 3), though not for the reasons cited in (2). But even so, (2) is surely not that obscure a claim. Searle cites the less puzzling case of photosynthesis. By focusing on this, we may begin to unscramble the chaos.
Photosynthesis, Searle suggests, is a phenomenon dependent on the actual causal properties of certain substances. Chlorophyll is an earthly example. But perhaps other substances found elsewhere in the universe can photosynthesize too. Similarly, Martians might have intentionality, even though (poor souls) their brains are made of different stuff from our own. Suppose we now take a formal chemical theory of how photosynthesis occurs. A computer could then work through the formal description. But would actual photosynthesis thereby take place? No, it's the wrong stuff, you see. The formal description is doubtless a handy thing to have. But if it's energy (or thought) you need, you had better go for the real stuff. In its way, this is fair enough. A gross formal theory of photosynthesis might consist of a single production, "if subjected to sunlight, then produce energy." A fine-grained formal theory might take us through a series of microchemical descriptions in which various substances combine and cause various effects. Gross or fine-grained, neither formalism seems to herald the arrival of the silicon tulip. Market gardening has nothing to fear from simulated gardening as yet.
Now, there are properties of plants that are irrelevant to their photosynthetic capacities, e.g., the color of blooms, the shape of leaves (within limits), the height off the ground, and so on. The questions to ask are: What do the chemical properties buy for the plant, and what are the properties of the chemicals by which they buy it? The human brain is made out of a certain physical, chemical stuff. And perhaps in conjunction with other factors, that stuff buys us thought, just as the plant's stuff buys it energy. So, what are the properties of the physical, chemical stuff of the brain that buy us thought? Here is one answer (not Searle's or that of supporters of Searle's emphasis on stuff, e.g., Maloney [1987]): the vast structural variability in response to incoming exogenous and endogenous stimuli that the stuff in that arrangement provides.4
Suppose this were so. Might it not also be true that satisfying some kinds of formal description guaranteed the requisite structural variability and that satisfying other kinds of formal description did not? Such a state of affairs seems not only possible but pretty well inevitable. But if so, Searle's argument against the formal approach is, to say the least, inconclusive. For the only evidence against the claim that the formal properties of the brain buy it structural variability, which in turn buys it the capacity to sustain thought, is the Chinese-room thought experiment. But in that example the formal description was at a very gross level, in line with the SPSS hypothesis of chapter 1, which in this case amounts to rules for correlating
inputs, corresponding to sentences of Chinese, with similar outputs. It could well be that a system capable of satisfying that level of formal description need not possess the vast structural variability by which (on my hypothesis) the brain supports thought. This could be neatly tied in with Dreyfus's observations that implementations of conventional cognitivist programs are inflexible and lack common sense. Such programs do not depend on a suitably variable and flexible substructure and hence fail to instantiate any understanding whatsoever. (If this talk about suitably variable and flexible substructures seems mysterious, it should become less so once we look at new kinds of computational models of mind: connectionist or PDP models.)
But it might yet prove to be the case that formal descriptions at a lower, more microstructural level will have only instantiations that must constitute a system with the requisite structural variability. And as long as this possibility remains, the case for the importance of stuff is far from watertight. Moreover, as we shall see in later chapters, cognitive science is just beginning to develop formal, microstructural theories that fit this general bill (see chapters 5 to 10). The price of this maneuver is, of course, grasping Searle's nettle at the other end. If a set of pipes really did constitute a system with the requisite structural variability, then (subject, perhaps, to a few further stipulations - see chapter 3) we should welcome it as a fellow thinker. I am ecumenical enough to do this. The more so since, I am reasonably convinced, it is at least physically impossible to secure the relevant variability out of such parts in the actual universe. If there are possible worlds subject to different physical laws than our own and if in those worlds collections of pipes, beer cans, or whatever exhibit the relevant fine-grained formal properties (if, for example, they are organized into a value-passing network with properties of relaxation, graceful degradation, generalization, and so on [see chapter 5]), we should bear them no ill will. Some beer cans, it seems, satisfy formal descriptions that our beer cans cannot reach.

6 Microfunctionalism
The defence of a formal approach to mind mooted above can easily be extended to a defence of a form of functionalism against the attacks mounted by Block (see chapter 1, section 5). An unsurprising result, since Searle's attack on strong AI is intended to cast doubt on any purely formal account of mind, and that attack, as we saw, bears a striking resemblance to the charges of excessive liberalism and absent qualia raised by Block. Functionalism, recall, identified the real essence of a mental state with an input, internal-state-transition, and output profile. Any system with the right profile, regardless of its size, nature, and components, would
occupy the mental state in question. But unpromising systems (like the population of China) could, it seemed, be so organized. Such excessive liberalism seemed to undermine functionalism: surely the system comprising the population of China would not itself be a proper subject of experience. The qualia (subjective experience or feels) seem to be nowhere present.
It is now open to us to respond to this charge in the same way we just responded to Searle. It all depends, we may say, on where you locate the grain of the input, internal state transitions, and output. If you locate it at the gross level of a semantically transparent system, then we may indeed doubt that satisfying that formal description is a step on the road to being a proper subject of experience. At that level we may expect absent qualia, excessive liberalism, and all the rest, although this needn't preclude formal accounts at that level being good psychological explanations in a sense to be developed later (chapters 7 and 10). But suppose our profile is much finer-grained and is far removed from descriptions of events in everyday language, perhaps with internal-state transitions specified in a mathematical formalism rather than in a directly semantically interpretable formalism. Then it is by no means so obvious (if it ever was - see Churchland and Churchland 1981) either that a system made up of the population of China could instantiate such a description or that if it did, it would not be a proper subject of the mental ascriptions at issue (other circumstances permitting - see chapter 3). My suggestion is that we might reasonably bet on a kind of microfunctionalism, relative to which our intuitions about excessive liberalism and absent qualia would show up as more clearly unreliable.
Such a position owes something to Lycan's (1981) defence of functionalism against Block. In that defence he accuses Block of relying on a kind of "gestalt blindness" (Lycan's term) in which the functional components are made so large (e.g., whole Chinese speakers) or unlikely (e.g., Searle's beer cans) that we rebel at the thought of ascribing intentionality to the giant systems they comprise. Supersmall beings might, of course, have the same trouble with neurons. Lycan, however, then opts for what he calls a homuncular functionalism, in which the functional subsystems are identified by whatever they may be said to do for the agent.
Microfunctionalism, by contrast, would describe at least the internal functional profile of the system (the internal state transitions) in terms far removed from such contentful, purposive characterizations. It would delineate formal (probably mathematical) relations between processing units in a way that when those mathematical relations obtain, the system will be capable of vast, flexible structural variability and will have the attendant emergent properties. By keeping the formal characterization (and thereby any good semantic interpretation of the formal characterization) at this fine-grained level, we may hope to guarantee that any instantiation of such a description provides at least potentially the right kind of substructure to
support the kind of flexible, rich behavior patterns required for true understanding. These ideas about the right kind of fine-grained substructures will be fleshed out in later chapters.
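As a foretaste, here is a minimal sketch of my own of the kind of specification intended (the weights, bias, and squashing function are arbitrary illustrations; the real models are the subject of chapter 5). Note that the state transition is couched entirely in mathematical terms, with no semantically transparent vocabulary in sight.

    import math

    def update_unit(inputs, weights, bias=0.0):
        """One fine-grained state transition: weighted sum plus squashing."""
        net = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1.0 / (1.0 + math.exp(-net))   # activation in (0, 1)

    print(update_unit([0.2, 0.9], [1.5, -0.7]))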
Whether such an account is properly termed a species of functionalism, as I've suggested, is open to some debate. I have opted for a broad notion of functionalism that relates the real essence of thought and intentionality to patterns of nonphysically specified internal state transitions suitable for mediating an input-output profile in a certain general kind of way. This in effect identifies functionalism with the claim that structure, not the stuff, counts and hence identifies it with any formal approach to mind. On that picture, microfunctionalism is, as its name suggests, just a form of functionalism, one that specifies internal state transitions at a very fine-grained level.
Some philosophers, however, might prefer to restrict the "functionalism" label to just those accounts in which (1) we begin by formulating, for each individual mental state, a profile of input, internal state transitions, and output in which internal state transitions are described at the level of beliefs, desires, and other mental states of folk psychology (see the next chapter), and (2) we then replace the folk-psychological specifications by some formal, nonsemantic specification that preserves the boundaries of the folk-psychological specifications.5 Now there is absolutely no guarantee that such boundaries will be preserved in a microfunctionalist account (see the next chapter). Moreover, though it may, microfunctionalism need not aspire to give a functional specification of each type of mental state. (How many are there anyway?) Instead, it might give an account of the kind of substructure needed to support general, flexible behavior of a kind that makes appropriate the ascription to the agent of a whole host of folk-psychological states. For these reasons, it may be wise to treat "microfunctionalism" as a term of art and the defence of functionalism as a defence of the possible value of a fine-grained formal approach to mind. I use the terminology I do because I believe the essential motivation of functionalism lies in the claim that what counts is the structure, not the stuff (this is consistent with its roots - see Putnam 1960, 1967, 1975b). But who wants to fight over a word?
Philosophical disquiet over classical cognitivism, I conclude, has largely been well motivated but at times overambitious. Dreyfus and Searle, for example, both raise genuine worries about the kind of theories that seek to explain mind by detailing computational manipulations of standard symbolic atoms. But it is by no means obvious that criticisms that make sense relative to those kinds of computational models are legitimately generalized to all computational models. The claim that structure, not stuff, is what counts has life beyond its classical cognitivist incarnation, as we shall see in part 2.
Chapter 3
Folk Psychology, Thought, and Context

1 A Can of Worms

Imagine a can of worms liberally spiced with red herrings. Such, I believe, is the continuing debate over the role of folk psychology in cognitive science. What faces us is a convulsing mass of intertwined but ill-defined issues like:
. Is commonsense talk of our mental lives (folk psychology) a protoscientific theory of the inner wellsprings of human action?
. Should we expect a neat, boundary-preserving reduction of folk-psychological talk to the categories of a scientific psychology?
. Failing such reduction, can cognitive science properly claim to be studying the mind?
. Conversely, could progress in cognitive science force the abandonment or revision of ordinary folk-psychological talk of beliefs and desires? Is it likely to?
There is a sprawling literature here: Churchland 1981, Stich 1983, Fodor 1980a, Searle 1983, McGinn 1982, Millikan 1986, Pettit and McDowell 1986, and Clark 1987a. And that barely scratches the surface. My strategy will be to divide and selectively ignore.
It may be helpful briefly to gesture at the relation of these issues to the overall themes of my discussion. One major goal of this book is to develop a framework in which some formal approaches to mind can be seen as plausible, despite the kinds of objections raised in the last chapter, and philosophically respectable. This demand of philosophical respectability requires me to be quite canny about the precise way in which formal or computational considerations are meant to illuminate the mind. One stumbling block here is the thought that the very idea of mind is intimately tied up with our ordinary talk of mental states like belief, desire, hope, and fear. These mental states, according to some of the arguments treated below, will necessarily (on some accounts) or probably (on others) elude analysis in terms of the internal states that cognitive science eventually endorses. So whatever else cognitive science does, it won't succeed (so the
argument goes) in illuminating the nature of mind. If you accept this (which you need not do), you might either conclude: so much the worse for the commonsense idea of mind (Churchland 1981; Stich 1983), or so much the worse for the claims of cognitive science to be investigating the nature of mind (see various articles in Pettit and McDowell 1986). It thus falls to any would-be apologist for cognitive science to delve at least some way into the vermian heap. So here goes.

2 A Beginner's Guide to Folk Psychology
The good news about folk psychology is that a beginner's guide won't be necessary after all. For it is a fair bet that nobody reading this book is a beginner. The term "folk psychology" refers simply to our mundane, daily understanding of ourselves and others as believing, hoping, fearing, desiring, and so forth. Some such understanding of mental states is the common property of most adult human speakers in contemporary societies. At its core it uses belief and desire ascriptions to shed light on behavior or (to avoid begging questions) bodily movements.
A colleague suddenly gets up and rushes over to the bar. You explain her movements by saying, "She is desperate for a Guinness and believes she can get one at the bar." Very likely you won't express yourself in such a stilted and artificial way in such a simple case. But the explanatory mode is familiar enough, and it is one we explicitly use in more complex cases, e.g., solving detective mysteries that involve the search for motives for the evil deed.
At a minimum folk psychology is thus the use of belief and desire talk to explain action or, better, movement (movement becomes action when it is subsumable under the intentional umbrella of a folk-psychological understanding). Folk psychology is not the gossip's understanding of, e.g., Freudian theories of psychoanalysis. In that respect the term "folk psychology" is somewhat misleading. In the recent literature (Churchland 1979, 1981; Stich 1983) folk psychology is treated (and criticized) as a primitive, protoscientific theory of the internal causal antecedents of human behavior. At first blush this may seem a somewhat startling thought. What could be meant by the claim that our ordinary ideas about the mental involve some kind of theory? And even if they do, why should it be a theory of internal causes of behavior? Let us address the first of these questions and leave the other to ferment until later. Return to the case of the lover of Guinness. Our belief-desire description of her movements toward the bar is explanatory, it is argued, only if we tacitly accept a general psychological law. In this case:
(x)(p)(q) {[(x desires that p) & (x believes that (q → p))] → (x will try, all else being equal, to bring it about that q)}

Substituting for x, p, and q we get (roughly):
In all cases, if our colleague desires a Guinness and believes she can get one at the bar, then (all else being equal) she will go to the bar.
So who's arguing? The story might be tightened up, but the moral shines through. The explanatory force of our ordinary account depends on tacitly treating the behavior or movement as falling under general psychological laws. Otherwise, we would not have an explanation at all. What constitutes the theoretical content of folk psychology is just this framework of general laws that underpins our daily understanding of one another (see Churchland 1981, 68-69).
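For what it is worth, the schema is mechanical enough to be instantiated in a few lines of code (a toy rendering of my own; nothing hangs on the details):

    # If x desires that p and believes that q would bring about p, then
    # (all else being equal) x will try to bring it about that q.
    def predicted_attempts(desires, believed_means):
        """Apply the belief-desire schema to predict attempted actions."""
        return [q for p, q in believed_means.items() if p in desires]

    desires = {"have a Guinness"}
    believed_means = {"have a Guinness": "go to the bar"}
    print(predicted_attempts(desires, believed_means))  # -> ['go to the bar']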

3 The Trouble with Folk
Now for the bad news. Folk psychology, it seems to some, is in various ways flawed and unsatisfactory (Churchland 1981; Stich 1983). Specific complaints about folk psychology include the following:
(1) Folk psychology affords only a local and somewhat species-specific understanding. It flounders in the face of the young, the mad, and the alien.
(2) It is stagnant and infertile, exhibiting little change, improvement, or expansion over long periods of time.
(3) It shows no signs as yet of being neatly integrated with the body of science. It seems sadly disinterested in carving up nature at neurophysiologically respectable roots.
The folk, in short, just don't know their own minds. Let us look at each complaint in a little more detail.
Complaint (1) surfaces in Churchland's insistence that the substantial explanatory and predictive success of folk psychology must be set against its failure to cope with "the nature and dynamics of mental illness, the faculty of creative imagination, . . . the ground of intelligence differences between individuals, . . . the nature and psychological functions of sleep, . . . the miracle of memory, [and] . . . the nature of the learning process itself" (Churchland 1981, 73).
Stephen Stich, in a similar vein, is worried by the failure of folk psychology to successfully explain the behavior of exotic folk and animals. Thus, he claims that we are unable adequately to characterize the content of alien or outlandish beliefs. Giving the example of someone who seems to believe he is a heap of dung, Stich notes that we are tempted to say that if it seems that someone believes that, then we really can't be sure what they believe. That is, it seems unlikely that any such alien belief can be
adequately captured by any ordinary sentence of our language. Another example, again developed by Stich (1983, 104), concerns the description of a dog as believing there is a squirrel up an oak tree. In one way, the ascription of such a belief seems fair enough: the dog saw the squirrel go up the oak tree and now sits at the bottom waiting for it to come back down. But in another sense it seems quite unwarranted to credit the dog with the belief that what it sees is a squirrel. (Does it also believe it sees an animal, or something that stores nuts?) Stich's point, then, is that in cases of exotic or animal beliefs, folk psychology shows signs of breaking down. It seems to urge both that someone does, and that they cannot, believe they are a heap of dung. It seems to urge that the dog does, and does not, believe there is a squirrel up a tree. Folk psychology, it seems, is just not up to taking on the really hard cases (Stich 1983, 101). And if we believe Stich and Churchland, so much the worse for folk psychology as a theory of mental life.
Moving on to complaint (2), we face the accusation that folk psychology has a history of retreat, infertility, and decadence, whereas a good theory should exhibit progress, refinement, and expansion. The thought here is that considered as a standard, speculative scientific theory, folk psychology would seem to be degenerating in the strict sense of Lakatos (1974, 91-196). A scientific theory or theory sequence is said to be degenerating if it fails over a long period to extend its early successes and to predict and explain novel phenomena. Because the theoretical substratum of folk psychology is implicit, it is somewhat difficult to see exactly what the point about degeneration can amount to. Churchland makes his complaint by saying, "The [folk psychology] of the Greeks is essentially the [folk psychology] we use today and we are negligibly better at explaining human behavior in its terms than was Sophocles. This is a very long period of stagnation and infertility for any theory to display. [Its] failure to develop its resources and extend its range of successes is therefore darkly curious and one must query the integrity of its basic categories" (Churchland 1981, 74; my emphasis). Presumably, then, the thought is that our daily explanations of each other's behavior ought ideally to be increasing in variety (new terms and phrases) and hence in detail, predictive power, and success. This would be evidence of a progressive underlying theory. One immediate comment is that something like this actually does take place. New terms and phrases are coined, and they do seem to bring increased understanding. Terms like "mauvaise foi," "Schadenfreude," and perhaps even Freudian notions of the ego and id are cases in point. (I owe these examples to Robert Griffiths.) So if there really is stagnation and infertility, it must be located at a much deeper level. And indeed, as remarked above, it is true to say that the basic framework of ascribing beliefs and desires as explanations of actions is pretty much unchanged across vast stretches

of historical time and geographically distant cultures. But this level of unchanging commonality may be evidence, as we shall later see, that what we are dealing with is something rather different to a mere stagnant folk theory.
Lastly, complaint (3): folk psychology seems to show no signs of carving up nature at neurophysiologically respectable joints. Thus, Churchland's celebration of the growing synthesis of particle physics, atomic and molecular theory, organic chemistry, evolutionary theory, biology, physiology, and materialistic neuroscience is cut prematurely short by the sad observation that folk psychology "is no part of this. . . . Its intentional categories stand magnificently alone, without visible prospect of reduction to that larger corpus. A successful reduction cannot be ruled out . . . , but [its] explanatory impotence and long stagnation inspire little faith that its categories will find themselves neatly reflected in the framework of neuroscience" (Churchland 1981, 75). Neat reduction, it seems, is the name of the game. Commonsense ascriptions of mental states must map onto theoretical divisions within a successful scientific account of states of the head or perish. Already they are heading for the hills. But the panic, as we shall see, is somewhat premature.
" "
Justto round off the bad news, we might mention a generalsentiment
saidto be sharedby Stich, Churchland, and Daniel Dennett (seeStich 1983,
chapter.11, note 10). This is that folk psychology is almost boundto prove
deeply misguided.
The very fact that [folk psychology) is a folk theory should make.us
suspicious.For in just about every other domain one can think of the
ancient shepherdsand cameldrivers whose speculationswere woven
into folk theory have a notoriously bad track record. Folk astronomy
was falseastronomy and not just in detail. . . . However wonderful and
imaginative folk theorising and speculation has been it has turned
out to be screaminglyfalse in every domain in which we now have
a reasonably sophisticatedscience.(Stich 1983, 229.)
There is surely something very wrong with this picture. Can we really imagine that our ancestors sat around a campfire and just speculated that human behavior would be usefully explained with ideas of belief and desire? Surely not. Some such understanding, though not verbally expressed, seems more likely to be a prerequisite of a highly organized society of language users than a function of their speculations. Moreover, what makes Churchland's comments criticisms of folk psychology, as opposed to observations about its nature? There seem to be all sorts of assumptions here about the role of ordinary ascriptions of mental states in our lives. Are such ascriptions really just a tool for explaining and predicting others' bodily movements? And even if in some sense it is such a tool, is it really trying
to fulfil its purpose by tracking states of the head? Would it even be wise to try to explain behavior in such a way? If any of these pointed queries draws blood, the honor of the folk may be preserved. Instead of losing at protoscience, the folk turn out to be winning at a different game. The suspicion of a deep mismatch between the game of folk psychology and the game of scientific theorizing about states of the head has been gaining ground in recent analytic philosophy. It is worth spending a moment to reconnoiter the new terrain.

4 Content and World
The content of a mental state is what gets picked out by the "that" clause in constructions like: "Daredevil believes that Elektra is dead," "Mary hopes that Fermat's last theorem is true," and so on. Since the discussion of content covers questions about meaning and about mind, in such discussions philosophy of mind, philosophy of psychology, and philosophy of language all meet up, with spectacular pyrotechnic results (see, e.g., Evans 1982 and essays in Woodfield 1982 and in Pettit and McDowell 1986). The part of the display that interests us here concerns the debate over what has become known as broad or world-involving content.
There is a tendency to think of psychological states as, in essence, self-contained states of the individual subject. That is to say, of course, not that we are not located in and affected by the world, but only that our psychological states are not essentially determined by how the world about us really is so much as by how it strikes us as being. In other words, the intuition is that whatever doesn't in any way impinge on your conscious or unconscious awareness can't be essentially implicated in any correct specification of your mental state. On this view your mental states have the contents they do because of the way you are, irrespective of the possibly unknown facts about your surroundings.
Much recent philosophy is characterized by a snowballing crisis of faith in this seemingly impregnable doctrine. Content, according to the heretics, essentially involves the world (Pettit and McDowell 1986, 4). The crisis began on twin earth (see Putnam 1975a). The twin earth thought experiments work by varying the facts about the environment while keeping all the narrowly specifiable facts about the subject constant. Narrowly specifiable facts about a subject include the subject's neurophysiological profile and any other relevant facts specifiable without reference to the subject's actual surroundings, either present or past. The upshot of such thought experiments is to suggest that some content, at least, essentially involves the world. Thus, to use the standard, well-worn example, imagine a speaker on earth who says "There is water in the lake." And imagine on twin earth a narrow doppelganger (someone whose narrowly specified
states are identical with the first speaker) who likewise says "There is water in the lake." Earth and twin earth are qualitatively identical except that water on earth is H2O while water on twin earth is XYZ, a chemical difference irrelevant to all daily macroscopic water phenomena. Do the two speakers mean the same thing by their words? It has begun to seem that they cannot. For many philosophers hold that the meaning of an utterance must determine the conditions under which the utterance is true. But the utterances on earth and twin earth are made true or false by the presence or absence of H2O and XYZ respectively. So if meaning determines truth conditions, the meaning of statements involving natural kind terms (water, gold, air, and so on) can't be fully explained simply by reference to narrowly specifiable states of the subject. And what goes thus for natural kind terms also goes (for similar reasons) for demonstratives ("that table," "the pen on the sofa," etc.) and proper names. The lesson, as Putnam would have it, is that meanings "just ain't in the head."
At this point, according to Pettit and McDowell (1986, 3), we have two options. (1) We could adopt a composite account of meaning and belief in which content depends on both an internal psychological component (common to the speakers on earth and twin earth) and an external world-involving component (by hypothesis, not constant across the two earths). Or (2) we could take such cases as calling into question the very idea that the psychological is essentially inner and hence as calling into question even the idea of a purely inner and properly psychological component of mental states, as advocated in (1). As Pettit and McDowell (1986, 3) put it, "No doubt what is 'in the head' is causally relevant to states of mind. But must we suppose that it has any constitutive relevance to them?" Of course, we do not have to take the twin earth cases in either of the ways mentioned above. For one thing, they constitute an argument only if we antecedently accept that meaning should determine truth conditions. And even then there might be considerable room for maneuver (see, e.g., Searle 1983; Fodor 1986). In fact, I suspect that as arguments for content that involves the world, the twin-earth cases are red herrings. As Michael Morris has suggested in conversation, they serve more to clarify the issues than to argue for a particular view. Nonetheless, the idea that contentful states may essentially involve the world has much to recommend it (see especially the discussion of demonstratives in Evans 1982).
This, however, is not the place to attempt a very elusive argument. Instead, I propose to conduct a conditional defence of cognitive science. Even if all content turned out to radically involve the world (option (2) above), that in itself need not undermine the claim of cognitive science to be an investigation that is deeply (though perhaps not constitutively) relevant to the understanding of mind. In short, accepting option (2) (i.e.,
rejecting the idea that the psychological is essentially inner) does not commit us to the denial of conceptual relevance, implied in the quoted passage from Pettit and McDowell.

The notion of constitutive relevance will be amplified shortly. First, though, a word about an argument that (if it worked, which it doesn't) would make any defence of cognitive science against the bogey of broad, world-involving content look strictly unnecessary. The argument (adapted from Hornsby 1986, 110) goes like this:
Two agents can differ in mental state only if they differ somehow in their behavioral dispositions.

A behavioral difference (i.e., a difference in behavioral dispositions) requires some internal physical difference.

So there can't be a difference in mental states without some corresponding difference of internal physical states (contrary to some readings of the twin-earth cases).
In other words, the content of mental states has to be narrowly determined if we are to preserve the idea that a difference in behavioral disposition (upon which mentality is said to supervene) requires a difference of inner constitution.

This argument, as Hornsby (1986, 110) points out, trades on a fluctuating understanding of "behavior." In the first premise "behavior" means bodily movements. This is clear, since no state of the head can cause you to, e.g., throw a red ball or speak to Dr. Frankenstein or sit in that (demonstratively identified) chair in the real absence (despite appearances, let's presume) of a red ball, Dr. Frankenstein, or the chair, respectively. At most, a state of the head causes the bodily movements that might count, in the right externally specified circumstances, as sitting in that chair, throwing the red ball, and so on. In the second premise, however, the appropriate notion of behavior is not so clearly the narrow one. It may be (and Putnam's arguments were meant to suggest that it must be) that the correct ascription of contentful states to one another is tied up with the actual states of the surrounding environment. If this is so, then it would be reasonable to think that since the ascription of contentful states is meant to explain behavior, behavior itself should be broadly construed. Thus, there could be no behavior of picking up the red ball in the absence of a red ball (whatever the appearances to the subject, his bodily movements, etc.). In this sense the idea of behavior implicated in mental-state ascriptions is more demanding than the idea of mere bodily movements. In another sense, as Hornsby also points out (1986, 106-107), it might be less demanding, as fine-grained differences in actual bodily movement (e.g., different ways of moving our fingers to pick up the red ball) seem strictly irrelevant to the ascription of
psychological states. Ascriptions of folk-psychological contents thus seem to carve reality at joints quite different to any we may expect to derive from the solipsistic study of states of the head that determine bodily movements. The conclusion (that ascriptions of folk-psychological mental states are concerned only with narrow content) is thus thrown into serious doubt. It is not clear that we can make sense of any appropriate notion of narrow content. And equivocation on the meaning of "behavior" cannot be relied upon to fill in the gaps.
The radical thesis that all ideas of content essentially involve the world thus survives the latest assault. Despite the doubts voiced earlier, I propose, as I said, to give away as much as possible and accept this thesis, while still denying the "pessimistic" implications for cognitive science. (This may seem a curious project, but there are independent reasons for requiring some such defence, as we shall see.)

The target posture (accept broad content and deny the conceptual or philosophical irrelevance of cognitive science) may seem uncomfortable if you accept the following argument.
The mental states of folk psychology (belief, desire, fear, hope, etc.) are individuated by appeal to a broad, world-involving notion of content.

The accounts and explanations given by cognitive science, insofar as they are formally or computationally specifiable, must in principle be independent of any semantic, world-involving considerations. They must have an internal syntactic reading that treats only of narrow, solipsistically definable states of agents. (See, e.g., Fodor 1980a.)

There is every cause to believe that semantic, world-involving accounts and solipsistic, narrow accounts will not carve nature at the same joints. There will be no neatly individuated internal states (either neurophysiologically or formally specified) that map onto the mental states individuated by folk psychology.

So cognitive science can't be in the business of contributing to a philosophical understanding of the nature of mental states, because the states with which it directly deals do not map satisfactorily onto our notions of mental states.
The conclusion, which is essentially that of McCulloch (1986), amounts to a pretty fundamental rejection of the idea "that there can be a scientific synopsis (of the manifest, folk-psychological image of ourselves and the scientific one) given that mind is unquestionably one of the things that must show up in it in some suitably scientific guise" (McCulloch 1986, 87-88). The question, I think, is what counts as a scientific synopsis here. Must a satisfactory synopsis involve a state-for-state correlation? Or is
there some more indirect way to achieve both synopsis and conceptual
relevance? I believe there is, but we must tread very carefully.

5 Interlude
"What a curious project!" you may be thinking. "The author proposes to attempt a defence of the significance of cognitive scientific investigations against a radical, intuitively unappealing, and inconclusively argued doctrine. And he proposes to do so not by challenging the doctrine itself, but by provisionally accepting it, and then turning the aggressor's blade." The reason for this is simple. With or without the broad-content theory, it looks extremely unlikely that the categories and classifications of folk psychology will reduce neatly to the categories and classifications of a scientific account of what is in the head. This is the "failing" that Churchland and (to a lesser degree) Stich ascribe to folk psychology. I embrace the mismatch, but not the pessimistic conclusion. For folk psychology may not be playing the same game as scientific psychology, despite its deliberately provocative and misleading label. So I take the following to be a very real possibility: whenever I entertain a thought, it is completely individuated by a state of my head (i.e., the content of the thought does not essentially involve the world), but there will be no projectable predicates in a scientific psychology that correspond to just that thought. By "no projectable predicate" I mean no predicate (in the scientific description) that is projectable onto other cases where we rightly say that the being is entertaining the same thought. Such other cases would include myself at different times, other humans, animals, aliens, and machines.
Regardless of broad content, I therefore join the cynics in doubting the scientific integrity of folk psychology as a theory of states of the head. But I demur at both the move from this observation to the conclusion that cognitive science, as a theory of states of the head, has no philosophical relevance to the understanding of mind (Pettit and McDowell) and the move to the conclusion that folk psychology be eliminated in favor of a scientific account of states of the head (Churchland).

What I try to develop, then, is more than just a conditional defence of cognitive science in the face of allegations of broad content. It is also a defence of cognitive science despite any mismatch between projectable states of the head and ascriptions of specific beliefs, desires, fears, etc. Relatedly, it is a defence of belief-desire talk against any failure to carve nature at internally visible joints. Coping with the broad-content worry is thus really a fringe benefit associated with a more careful accommodation of commonsense talk of the mental into a scientific framework. So, now that we know we are getting value for our money, let's move on.
6 Some Naturalistic Reflections
At this point I think we may be excused for indulging in a little armchair, naturalistic reflection. It seems a fair question to ask, What earthly use is the everyday practice of ascribing mental states to one another using the apparatus of folk psychology, that is, the apparatus of propositional-attitude ascription with notions of belief and desire? One answer might be that it is useful as a means of predicting and explaining other people's bodily movements by attempting to track internal states of their heads. This, we saw, is what the eliminative materialist must believe the practice is for.1 Otherwise, it would hardly be to the point to criticize it for failing to carve up nature at neurophysiological joints. And if that is what the practice is for, it may be in deep water. But why should we assume that it has any such purpose? Consider an alternative picture, due in part to Andrew Woodfield.2
On this picture, the primary purpose of folk-psychological talk is to make intelligible to us the behavior of fellow agents acting in the world. In particular, it is to make their behavior intelligible and predictable just insofar as that behavior bears (or could bear) on our own needs and interests. Now let us throw in a few more small facts. The other agents whose behavior we wish to make intelligible are primarily our peers, beings with four notable traits. First, they largely share our sensitivity to the world, i.e., our senses and any innate protoconceptual apparatus. Second, they share our world. Third, they share to a large extent our own most basic interests and needs. Fourth, the biological usefulness of their thoughts, like our own, involves their tracking real states of the world, a purpose for which we may (on evolutionary grounds) assume that their thinking is well adapted. Taken together, those traits help make plain the convenience and economy of ascribing folk-psychological content. The thoughts of our peers are well adapted to the same world as our own. So given also a convergence of needs and interests, we may economically use talk of states of the world generally to pick out the salient features of the thoughts of others. The development of a tendency to make broad-content ascriptions begins at this point to seem less surprising. Broad-content ascription, we may say, is content ascription that is sensitive to the point of thinking, which is to track states of the world. In general, it looks as if our thinking succeeds in this. The paradox of broad-content ascription is just that when our thinking fails (when, e.g., "that chair" is entertained in the absence of any chair), we must say that the thought (or better, the thinking) failed to have the content intended. But that seems acceptable, once we see the general reasonableness of the overall enterprise.
Even if we bracket the stuff about broad-content ascription, we still
have a naturalized grip on some reasons why ascribing folk-psychological
content ought not to aspire to track neat, projectable states of the head. For the question must then arise: whose head? According to the present account, what we are interested in is a very particular kind of understanding of the bodily movements of other agents. It is an understanding of those movements just insofar as they will bear on our own needs and projects. And it is an understanding that seems to be available any time we can use talk of the world (as in the "that" clause of a propositional-attitude ascription) to help us pick out broad patterns in other agents' behavior. Take for example the sentence "John believes that Buffalo is in Scotland." This thought ascription is useful not because it would help us predict, say, exactly how John's feet would move were someone to tell him that his long-lost cousin is in Buffalo or even when he would set off. Rather, it is useful because it helps us to predict very general patterns of intended behavior (e.g., trying to get to Scotland), and because of the nature of our own needs and interests, these are what we want to know about. Thus, suppose I have a forged rail ticket to Scotland and I want to sell it. I am not interested in the fine-grained details of anyone's neurophysiology. All I want to know is where to find a likely sucker. If the population included Martians whose neurophysiology was quite different from our own (perhaps it involves different formal principles even), it wouldn't matter a jot, so long as they were capable of being moved to seek their long-lost cousins. Thus construed, folk psychology is designed to be insensitive to any differences in states of the head that do not issue in differences of quite coarse-grained behavior. It papers over the differences between individuals and even over differences between species. It does so because its purpose is to provide a general framework in which gross patterns in the behavior of many other well-adapted beings may be identified and exploited. The failure of folk psychology to fix on, say, neurophysiologically well-defined states of human beings is thus a virtue, not a vice.

7 Ascriptive Meaning Holism
The preceding section offers what is in effect just a slightly different way of putting the fairly common observation that belief ascription (and propositional-attitude ascription in general) is holistic. It is a net thrown over a whole body of behavior and is used to make sense of the interesting regularities in that behavior. For this reason beliefs are ascribed in complexes. As one well-known meaning holist puts the point, "in saying that an agent performed a single intentional action, we attribute a very complex system of states and events to him" (Davidson 1973, 349). It is important, however, to be clear about exactly what meaning holism involves. In a recent attack on the doctrine Fodor summarizes it as follows: "Meaning holism is the idea that the identity - specifically, the intentional content - of
a propositional attitude is determined by the totality of its epistemic liaisons" (1987, 56). An epistemic liaison of a proposition p is any proposition that the agent takes to be relevant to the semantic evaluation of p, i.e., to the determination of its truth or falsity.
Fodor rightly, I think, pours scorn on such a doctrine. For one thing, if the content of a belief is so determined by the totality of such liaisons, it seems unlikely that any of us very often succeeds in sharing a belief or intentional state. (See Fodor 1987, 56-57.) Fodor thus denies that the content of a belief is dependent on its epistemic liaisons. Instead, he believes that beliefs have their contents severally. He thus chooses to bet on a form of denotational semantics in which beliefs get their contents by brain states entering one by one into causal relations with the world. Thus, he argues that a creature "could have the concept HORSE whether or not it has the concept COW" and that the thought that three is a prime number "could constitute an entire mental life" (Fodor 1987, 84, 89).
Now we can begin to see what is going wrong. Fodor is assuming that the crux of the meaning holist's argument has to do with epistemic liaisons. In that case the point about the thought involving "3" and that involving "cow" are on a par. A more persuasive version of meaning holism focuses instead on the conditions of the warranted ascription of particular mental states to a being. And this in itself, at least, is not a point about epistemic liaisons as defined above. Thus, for example, we would want to know under what conditions we would be justified in ascribing to a being a grasp of the concept of three. And here it does seem plausible to insist that the concept is only ascribable to a system when it exhibits sophisticated behavior with other numbers, with mathematical functions, perhaps with the counting of external objects, and so on. If such is the behavior, we acquire in one go the warrant to ascribe a host of mathematical beliefs. But if the behavior is not like that, we surely forfeit the right to ascribe any. This requirement to ascribe beliefs holistically underlies, I believe, the best versions of meaning holism. Thus understood, the doctrine is surely a sensible one. How could some syntactic brain state warrant ascribing a belief about selling to a system if the system could not show by its behavior that it can also have beliefs about buying?

This kind of ascriptive holism makes perfect sense in my picture of the point of belief ascription. On that picture we ascribe beliefs by throwing a kind of interpretative net over a whole body of behavior. And the mesh of that net is gauged to our particular interest in making sense of behavior. The knots (i.e., the particular ascriptions of beliefs and desires) need not correspond to any natural, projectible divisions in whatever underlying physical or computational structures make possible the behaviors concerned (and I, for one, can't see why they are even likely to). When we examine
connectionist models in part 2, we shall also see how a system can produce
semantically systematic behavior without any internal mirroring of the
semantically significant parts of the sentences we use to describe its
behavior .

8 Churchland Again

We can now return to Churchland's specific criticisms of folk psychology. There were three, recall.
• The explanatory power of folk psychology is limited. The exotic, insane, and very young are left mysterious.
• It is a stagnant and sterile theory that has stayed the same for a long period of time.
• It fails to integrate neatly with neuroscience.
The last objection is now easily fielded. The failure to find a neat, boundary-preserving reduction of the categories and claims of folk psychology to projectible neuroscientific descriptions need not count as a black mark for folk theory. We may insist that the folk were not even attempting to theorize about projectible internal states. The concern was rather to isolate as economically as possible the salient patterns in the behavior of other agents. Since such patterns may cut across projectible descriptions in a scientific language dealing solely with internal states, it is to the credit of folk psychology that it fails to fix on descriptions with neurophysiological integrity. Nor need failure to deal with the exotic, the young, or the insane surprise or bother us. For as Stich, for example, clearly sees, the folk-psychological method exploits commonalities in environments and cognitive natures to generate an economical and appropriately located grasp of the salient patterns in the behavior of our peers. If content ascription thereby breaks down in extreme cases, so be it. That does not impugn its value as a tool in its intended domain of application. Likewise, stagnation need cause no loss of sleep. As a tool for its intended purpose, folk-psychological talk has been well shaped by the constant pressures in favor of a successful understanding of others. Such pressures may yield the form of a good solution more strongly than we often believe. The shape of the tool was forged many centuries ago. Because there are salient regularities in the environment and because the use of folk-psychological talk has a limited goal of making the behavior of others intelligible to us to just the degree necessary to plot their moves in relation to our needs and interests, it is not surprising that the hard core of such understanding (i.e., ascriptions of beliefs and desires) has remained relatively constant across temporal and geographical dimensions.
A more speculative account of the stagnation might explain the relative constancy of folk-psychological understanding by positing an innate element. On this account, belief and desire ascriptions are the mind's way of making sense of itself and others. Though strictly unnecessary to my argument, I believe this hypothesis is more reasonable than it appears at first sight. It is extensively defended in Clark 1987a, a paper that is unfortunately seriously marred by my conceiving the goals of folk psychology to be solely the tracking of neurophysiologically sound states of the head. Yet the central claim of that paper fits just as plausibly into the new treatment developed above. It is that the basic framework of a folk-psychological understanding may well be innately specified. If this is the case, conceiving of ourselves and others as moved by beliefs and desires could be as natural and inevitable as seeing the world in three dimensions. Yet in the latter case, no one talks of "folk vision" or of vision as a stagnant theory of the world. And this is despite the fact that vision fixes on some categories that physics finds irrelevant. Nor need an innate folk-psychological competence surprise us. As social creatures it is vitally important that we quickly come to grips with the behavior patterns of our peers. Just as the physically mobile need to know about depth and sometimes color, so the socially mobile may need to know about beliefs and desires. For a sound psychological understanding of others must surely make an important contribution to the overall fitness of a social animal. As Nicholas Humphrey (1983) points out, there will always be substantial evolutionary pressure on social animals to become more efficient natural psychologists. For in the case of such animals the other members of the group are often the single most significant factor in the animal's environment for flourishing. To take just a single example, consider a recent case study of rhesus macaque monkeys (Harcourt 1985). To prosper, these animals seem to be required to make quite sophisticated judgments concerning the motives of their peers. Very briefly, support from a high-ranking female tends to be decisive in combat situations. The likelihood of such support is increased by grooming such females. Thus, if one macaque sees another groom a high-ranking female, he must try to avoid contests with that macaque in the near future. Some knowledge of the likely behavior of others in lending and withholding support is essential to success. It does not seem unduly generous to describe the observations of grooming and predictions of future behavior as involving some primitive understanding of the motivational states of other members of the group (see Tennant 1984a, 96; Tennant and Schilcher 1984, 178; also see Smith 1984, 69).
If this is right, a psychological understanding of one's fellows is as important for the success of a social animal as recognizing food or predators. No one baulks at an evolutionary account of innate competences subserving our capacities to achieve these latter goals. Why not extend this
generosity to the psychological realm? If we do so, we must revise our ideas about the basis of commonsense psychological understanding, for this would not be the rambling speculations of ancient shepherds and camel drivers. Innateness, it must be admitted, is no guarantee of truth. But if the goal of folk psychology is as I have painted it, the distinction between truth and usefulness looks problematic. At any rate, ways of thinking that survive the hard knocks of the school of evolutionary testing must have something going for them. In such cases, stagnant waters run deep. (For a fuller discussion of these and related issues see Cosmides 1985; Premack and Woodruff 1978; Baron-Cohen et al. 1985; Clark 1987a; and for an opposing view see Churchland and Churchland 1978.)
With or without any innate element, in the light of all this it seems fair to reject the characterization of our commonsense understanding of mental states as a folk theory. For that characterization is tied up with the idea that such talk aims in its bumbling way to do what a good scientific theory of brain states would do better. But this is not so. A better parallel might be drawn with Hayes's 1979 conception of a naive physics. Naive physics is a body of commonsense knowledge of physical laws and concepts that helps us to get around our everyday world of macroscopic objects. Knowing (perhaps nonverbally) such concepts and relations as fluid, cause, support, above, below, and beside is vital to a mobile, manipulative being. (A leaping monkey, as Boden [1984a, 162] points out, must have some kind of grasp of distance, flexibility, support, and so on.) The most vital and basic elements of such a naive physics must be either innately specified (Boden [1984a] cites the visual-cliff experiments on newborn animals as evidence of some innate grasp of depth) or else must flow directly from the operation of probably quite specialized learning capacities trained on the available data (e.g., visual and tactile data). But the point for now is that however we get whatever knowledge of naive physics we have, what we get is an implicit understanding of just those features of the physical environment most centrally relevant to our daily projects. If the categories of a naive physics fail to match those of physical science, this is no deficit. So long as these ideas serve our daily needs, they have all the integrity they require. If it would be useful to equip a robot with a grasp of naive physics, why not also a grasp of naive psychology? In fact, the latter may be even more pressing. For the physical world has no conception of itself. But human agents do have conceptions of themselves, and they involve the commonsense ideas of mental states. Successfully living in such a community may well require robots to have a grasp of the framework used by us to conceive of our own actions. For all these reasons I object to the term "folk psychology" and prefer the more neutral "naive psychology" or "mentalistic understanding." Generally, however, I shall capitulate to the
now-standard usage. With the honor of our everyday mentalistic discourse thus secure, there remains the issue of the relation between such discourse and work in cognitive science. Two questions loom large here. (1) Should mentalistic discourse even form the starting point of genuine scientific inquiry into the inner well-springs of thought? (2) How, if at all, can cognitive science claim to be contributing to a properly philosophical understanding of mind? If it at best illuminates the causal backdrop of the practice of mentalistic discourse, how can it contribute to a constitutive or otherwise philosophically interesting understanding of the nature of mental states?

The first question has an obvious answer. Cognitive science must, of course, rely on a folk-psychological understanding at every stage, for we need to see how the mechanisms we study are relevant (even in a merely causal sense) to our performance in various cognitive tasks. Such tasks (e.g., believing at least some of the logical implications of our other beliefs, recognizing red blocks, planning a day's work) are necessarily specified in folk-psychological terms. If the task is to model thought, we could not even know whether we had succeeded or failed without using a folk-psychological understanding to individuate the target thoughts. Scientific psychology thus ultimately answers to folk psychology in precisely the same way as physics ultimately answers to observations. There may be many layers of intervening theory, but the ultimate goal must always be to do justice to the observed phenomena (see, e.g., Van Fraassen 1980). The thought that a scientific psychology might avoid being likewise answerable to our daily understanding of mind is made plausible only by relying on the ambiguity of "behavior" mentioned earlier. If the goal of a scientific psychology were simply to explain bodily movements, then it could proceed independently of our folk-psychological understanding. But it is hard to see how that study would merit the name "psychology" (see Hornsby 1986).

Folk psychology thus defines the goals of cognitive science (what has to be explained), and it is consequently implicated in any assessment of the success or failure of cognitive science. This is because cognitive science seeks at a minimum to shed light on the causal antecedents of our capacities to behave in ways that merit description in an intentional, content-ascribing vocabulary. And we therefore depend on these semantic descriptions to assess the causal and explanatory value of the theories put forward. This, however, is not to say that the actual pictures of inner states put forward by cognitive science need themselves involve any essential use of terms drawn from folk psychology. Indeed, we may insist that they not use such terms, at least insofar as they are meant to have their usual sense (i.e., their broad, contentful, world-involving sense). The parallel with physical theory makes this clear. Observation at the normal, unaided, human level is both the start and the touchstone of physical theory. But advanced physical
theories themselves do not deploy ordinary kinds of observations. Rather, they may choose to carve up the world in ways quite different to those of commonsense observation. I do not mean to push this parallel too far. My aim is simply to illustrate the possibility of a science being grounded in, and answerable to, a range of observational evidence that does not appear (and perhaps could not appear, in the case of broad psychological states) in the detailed accounts generated within the science in question. If cognitive science both begins with, and is answerable to, the folk-psychological concept of mind, we must reject any strong version of methodological solipsism. Cognitive science cannot, and typically does not, proceed in any kind of semantic vacuum. And this is true despite the accepted lack of fit between scientific, head-bound kinds and broad, world-involving psychological kinds.

The second question, concerning the alleged philosophical impotence of cognitive science in the study of mind, is, alas, more serious. How can cognitive science contribute to a philosophical analysis of mind?

9 Cognitive Science and Constitutive Claims
There is without doubt a connection between the broad-content theorist's worry that cognitive science can't illuminate the mind and the eliminative materialist's worry that folk psychology is a distorting influence on any scientific study of mind. Each party sees the same bumps and potholes in the cognitive terrain, the same deep-seated mismatch of folk-psychological kinds and narrowly specified scientific kinds. But one side concludes that the folk just don't know their own minds, while the other concludes that cognitive science can't know the folk's minds. Each side clasps mind tightly to its bosom, pitying the other side as embracing at best a distorted shadow of the real thing and deprived forever of the joys of true constitutive involvement. But this is surely overly romantic. I shall sketch a more permissive approach. First, though, I must unpack the notion of constitutive relevance.

The intended contrast, as far as I can see, is between constitutive and merely causal relevance. What has constitutive relevance is somehow conceptually bound up with the subject of study (in this case, mind), whereas various factors may be causally relevant to thought without the tie being so tight that the very idea of thought is unable to survive their subtraction. The contrast, I suspect, is not as hard and fast as some of those who use (and often abuse) the term seem to believe. If by intellectual reflection we can see that a certain phenomenon could not occur in any physically possible world (i.e., in any world where the laws of physics apply), is this a case of constitutive or merely causal relevance? Despite such unclear cases
we can, I think, make enough of the distinction to find it intelligible. It is perhaps clearest in the case of games. The rules of a game are constitutive insofar as they "create or define new forms of behaviour. The rules of football or chess, for example, . . . create the very possibility of playing such games. [They constitute] an activity the existence of which is logically dependent on the rules" (Searle 1969, 34). A fact p is, as we said, constitutively relevant to a phenomenon q if p is not merely causally implicated in q but somehow conceptually bound up with the very possibility of q. Let us capture this by a principle of subtraction. Fact p is constitutively relevant to q if on conceptual grounds we can see that q could not survive the subtraction of p. Thus, to return to Searle's example, the rules of chess are not, as it were, part of the mechanics that makes chess possible; rather, they are part of what it is for a game to be chess. We might imagine that certain memory capacities and a universe obeying certain physical laws constitute causally necessary conditions for the possibility of actually playing chess. But there is no conceptual link between the very idea of chess and the idea of such conditions. Likewise, it may be that no being could in fact think were it made entirely of gas. But there is no direct conceptual link between being a thinker and not being gaseous.
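Since the subtraction principle does real work in what follows, it may help to set it out schematically. The rendering below is a rough gloss of my own, not notation from the text: read the box as conceptual (rather than merely nomological) necessity, so that "on conceptual grounds" becomes necessity of the conceptual kind.

\[
p \text{ is constitutively relevant to } q \;\iff\; \Box_{c}\,(\neg p \rightarrow \neg q)
\]

Merely causal relevance is then the weaker case in which subtracting p would in fact destroy q (as a matter of natural law, say) without there being any conceptual route from the idea of q to the idea of p, as with chess and the memory capacities of its players.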
The major worry can now be appreciated. It is that though causally relevant to our thinking, the scientific stories of what goes on in the head can stand in no conceptual relation to the notions of thought and meaning, and hence, from a certain rather purist viewpoint, they lack philosophical interest. This is the idea canvassed by the radical broad-content theorist depicted by Pettit and McDowell (section 4 above). But even if we accept broad content and disallow any "dual-component" approach (see McGinn 1982), it is hard to see what forces the conclusion that the scientific story lacks constitutive bite. Presumably, the thought is that any hope of a constitutive scientific account depends on finding a neat, boundary-preserving mapping between scientific kinds and mental kinds. But why should that be so?
Here is an alternative way of being constitutively relevant. Suppose that the proper ascription to one another of contentful psychological states is doubly broad. It is broad first because the existence of such states depends (suppose) on states of the external world. But it is broad second because the existence of such states is conceptually bound up with their being grounded in the right kind of inner causes. The picture I am proposing looks roughly like this: The correct ascription of psychological states is constitutively related to the actual and counterfactual behavior of the subject. Behavior is broad in the sense of involving not just movements but their semantic and world-involving specification, but it is also broad in that it involves not just any old internal cause of the movements but a cause of a certain formally specifiable kind (see chapters 5 to 9). Just as we might revise
specifiablekind (seechapters5 to 9). Justas we might revise '


our description
of the behavior on receipt of new data about the being s relation to the
world , so too we might revise it on receipt of new data concerning the
inner cause of the bodily movements involved. To give the standard
3
example, supposesomeonediscoveredthat his neighbor' s movementshad
all been causedby a giant look-up-tree program with precisedescriptions
"
of outputs in the form 'if (input) then (output). In this case(astronomically
unlikely and maybe even physically impossible) the descriptions of the
neighbor as truly ading are in fact unwarranted, or defeasiblywarranted
and now defeated. In the absenceof the right , doubly broadly specified
behavior, our earlier ascriptions of psychological states and contents are
likewise defeated.
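To make the look-up-tree worry concrete, here is a minimal sketch of such an inner cause (in Python; the input-output pairs are invented placeholders, and nothing like this code appears in the original). Outwardly the program might match a thoughtful agent's responses over some finite range, yet inwardly there is nothing but an exhaustive table:

```python
# A caricature "agent": every response is a precompiled lookup,
# in the form "if (input) then (output)". All entries are invented.
LOOKUP_TREE = {
    "tiger leaps": "jump sideways",
    "offered rail ticket to Scotland": "buy ticket",
    "told cousin is in Buffalo": "set off for Scotland",
}

def respond(situation: str) -> str:
    # No inference and no recombination of stored information:
    # any situation missing from the table yields no behavior at all.
    return LOOKUP_TREE.get(situation, "do nothing")
```

Such a system shows none of the flexible actual and counterfactual behavior that, on the doubly broad picture, warrants ascribing psychological states; discovering it inside the neighbor's head is exactly the kind of new data that defeats the earlier ascriptions.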
There are various complications here. For example, it may be that although the discovery of a certain computational substructure gives us warrant to withdraw ascriptions of mentality, there is no substructure whose presence is necessary for the ascription of mentality. This is a very real possibility, and it leaves the constitutive status of the substructural stories uncertain. It seems as if there is logical space between fully constitutive features (meeting the subtraction principle) and merely causal supports. That space is filled by, for example, cases in which we have a set of features (or types of computational substructures) such that

(1) being a thinker requires deployment of some member of that set, and

(2) a conceptually visible relation obtains between each individual member of the set and the warranted ascription of contentful thought (e.g., the substructures can each be seen to support the flexible actual and counterfactual behavior that warrants the use of a mentalistic vocabulary), but

(3) there is no further formal or scientific unity to the set of structures picked out in (2) (i.e., no metalevel formal or scientific description capable of meeting the demands of the subtraction principle).
My own suspicion is that the boundary between the constitutive and the causal is too sharply drawn and that genuine conceptual interest may accrue to all kinds of cases that fail to pick out relations as strong as constitutivity.

This issue of constitutivity raises the somewhat thorny problem of philosophical relevance. Under what conditions would knowledge of the in-the-head, computational substructure of thought count as a contribution to a properly philosophical understanding of mind? In point of fact, I feel a strong resistance to such a form of question. If one goal of philosophical
reflection is to achieve an integrated picture of the world and our place in it as knowers, studies of the computational backdrop of our knowing have as much potential to reveal something of the nature (and possible limitations) of human thought as any other research I can think of. And the days of hard and fast disciplinary boundaries, often dictated more by administrative convenience than academic concerns, are receding fast, thankfully. My own experience in the highly interdisciplinary School of Cognitive Sciences at Sussex University is that this blurring of disciplinary boundaries is a necessary step along the road to solving many of the major problems that the individual disciplines (e.g., philosophy, psychology, linguistics, artificial intelligence) once wanted to call their own.
Nonetheless, there is a certain point to pursuing the kinds of questions just raised under the banner of constitutive relevance. Some claim is clearly to be made concerning the relation between various kinds of computational substructures and the nature of human thought. But what, precisely, is the intended relation? Until we are clear about this (or if it is too early to decide, clear at least about the possible options), we cannot fulfill our goal of suggesting just how these explanatory projects might fit into an overall picture of the nature of mind. Moreover, unless we are clear about what claims are or are not being made, we will have no idea what kind of evidence would support or refute them. In pursuing questions about constitutive relations and the like in subsequent chapters, it is thus not my intention to suggest that nothing short of full constitutive relevance counts as philosophically interesting. Such a view, although common enough among philosophers, depends on a much crisper conception of the boundaries of both constitutivity and philosophical interest than any I wish to endorse.
One central feature of the position I here advocate remains to be spelled out. It is that the relation between the correct type of inner story and the required behaviors need not itself be specified on any one-to-one or neat boundary-preserving basis. Rather, we may imagine specifying in some formal way (deferred until chapters 5 to 9) a kind of internal structure conceptually related to the very possibility of the rich, flexible actual and counterfactual behaviors required for the ascription of mental states. An internal structure is thus deeply implicated in rich, flexible behavior, which warrants simultaneously ascribing a whole host of mental states to the subject. But to repeat a point that it is almost impossible to overstress, this in no way requires or suggests any neat boundary-preserving mapping between each of the holistically ascribed mental states and scientific stories about the inner causes of the bodily movements involved. In short, I reject the picture of some neat, boundary-preserving mapping between commonsense mental states M1, . . . , Mn and narrowly specified scientific
states S1, . . . , Sn. And instead, I adopt this picture:
Level (1) Mental states M1, . . . , Mn

        holistically ascribed on basis of

Level (2) Rich, flexible behaviors B1, . . . , Bn

        constitutively dependent on both

Level (3) (a) Actual speaker-world relations
          (b) The right kind of inner cause of the bodily movements taken as behaviors B1, . . . , Bn
Note that the relation between levels (1) and (2) is holistic. We are warranted in ascribing groups of mental states on the basis of overall behavior. The relation between (3b) and (1) is far from a neat, boundary-respecting isomorphism. Such isomorphism is sabotaged by both the role of (3a) and the holistic nature of the relation between (1) and (2).

10 Functionalism without Folk
How does all this relate to our earlier discussions of functionalism and classical cognitivism? Think of the way extra nails relate to a coffin lid. The classical cognitivist, recall, was committed (in practice at least) to a specific architectural assertion, namely, that you could instantiate a thinking system by ensuring (at least for the in-the-head component) that it engaged in the appropriate manipulation of standard symbolic atoms. But if my earlier conjectures are at all on the mark, the attempt to model (with a view to instantiation) the inner, scientifically investigable causes of our behavior (or better, movements) by putting versions of folk-psychological descriptions into the machine begins to look very peculiar and unsound. It is as if we were attempting to create a human thinker by putting sentences specifying various contentful states into her head. This now looks precisely backward. On my picture, we need to specify inner states capable of causing rich, flexible behavior, which itself determines (without boundary-preserving mappings) the correctness of folk-psychological descriptions. Putting the mind's possibly broad descriptions of the states of agents acting in the world back into the head and expecting thereby to create mentality is slightly bizarre. In this respect, the eliminative materialist seems to have been right; cognitive science shouldn't seek to model internal states on
ordinary contentful talk. For such talk is not able (nor, I would add, intended) to be sensitive to the relevant internal causes, yet it is sensitive to irrelevant external states of affairs. The contrast is between putting tokens of ordinary contentful talk back into the head (classical cognitivism) and seeking an account of how what is in the head enables the holistic ascription of such contents to the subject in the setting of the external world. In sum, the less plausible we find folk-psychological ideas as a scientific theory of the internal causes of behaviour, the less plausible the classical cognitivist program should seem, since it relies heavily on that level of description. Where goes the classical cognitivist, there goes the standard functionalist also. For standard functionalism (not the microfunctionalism described in chapter 2, section 6) is committed to filling in the following schema for each individual mental state.

Mental state p is any state of a system that takes x, y, z (environmental effects) as inputs, gives l, m, n (bodily movements, vocalizations, etc.) as outputs, and fulfils internal state transitions g, h, i.
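To see what filling in such a schema would involve, here is a minimal sketch (in Python; every state name and transition is an invented placeholder, not a proposal from the text). Note that the transition table is keyed to folk-psychologically identified states, which is where the trouble diagnosed below begins:

```python
# A toy functionalist specification of mental states. The state names
# are folk-psychologically identified placeholders, all invented here.
TRANSITIONS = {
    # (current mental state, input)     -> (next mental state, output)
    ("desires-water", "sees-lake"):        ("believes-water-nearby", "walk-to-lake"),
    ("believes-water-nearby", "thirst"):   ("desires-water", "drink"),
}

def step(state: str, stimulus: str) -> tuple[str, str]:
    # Each mental state is individuated by its role in this
    # input / state-transition / output economy.
    return TRANSITIONS.get((state, stimulus), (state, "no-op"))
```

The placeholder names are exactly the point: standard functionalism bets that each folk-psychological label will eventually be cashed by a neat, syntactically specified scientific kind.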
The difficulties start in the specification of g, h, i, the internal state transitions. For internal state transitions specify relations between mental states, folk-psychologically identified. The folk-psychological specifications act as placeholders to be filled in with appropriate, syntactically specified scientific kinds in due course. But this, of course, is simply to bet on the neat, boundary-preserving relation between folk-psychological kinds and scientific kinds, which I have been at pains to doubt for the last umpteen pages. That kind of functionalism (the kind that treats folk-psychological descriptions as apt placeholders for scientifically well-motivated states of the head) rightly deserves most of the scorn so freely heaped on it by the eliminative materialist.
In conclusion, if I'm even halfway right, the folk do know their own minds. But they do so in a way sensitive to the pressures of a certain explanatory niche, the niche of our everyday understanding of patterns in behavior. The pressures on a computational theory of brain activity are quite different. Such a theory is about as likely to share the form of a folk-psychological picture of mind as a land-bound herbivore is to share the form of a sky-diving predator.
Chapter 4
Biological Constraints

1 Natural-Born Thinkers

Cognitive science is, in practice, a highly design-oriented investigation into the nature of mental processes. As we saw earlier (chapter 1, section 4), a popular methodology goes something like this. First, isolate an interesting human achievement, story understanding, say. Second, find the best way you can of getting a conventional Von Neumann computer to simulate some allegedly central chunk of the input-output profile associated with human performance of the task. Finally, hope that the program so devised will be at least a clue to the form of a good psychological theory of how human beings in fact manage to perform the task in question. This classical-cognitivist strategy, I shall now argue, is as biologically implausible as it is philosophically unsatisfactory. No serious study of mind (including philosophical ones) can, I believe, be conducted in the kind of biological vacuum to which cognitive scientists have become accustomed. In this respect at least, Pylyshyn's admiration of the post-Turing idea of a study of cognitive activity "fully abstracted in principle from both biological and phenomenological foundations" (see chapter 1, section 2) strikes me as misplaced. Constraints that apply to naturally evolved intelligent systems are relevant to any attempt to model or understand the nature of human thought. This chapter simply plots some of these constraints and shows how they fuel the fires already lit underneath classical cognitivism. I treat the constraints in ascending order of importance for cognitive science.

2 Most Likely to Succeed


Remember those American high-school yearbooks in which a few characters were singled out as possessing the kinds of traits that indicate future success? In the evolutionary yearbook the star qualities are revealing and somewhat removed from the capacities most easily modeled by researchers in artificial intelligence. A preliminary list of such qualities might include real-time sensory processing, integration of various input and output modalities, capacity to cope with degenerate and inconsistent data, and flexible
deployment of available cognitive resources. Such capacities subserve the goal of the organism's success in a fast-moving, competitive environment. In the service of such a goal, constant accuracy may have to be traded for speed.
Thus, to take a central case, it is a fair bet that naturally intelligent systems must cope with degenerate and even inconsistent data without crashing. If we are told both that someone is fat and that she is thin, this ought not to preclude our identifying her if we also know that she is wearing an orange, polka-dot suit. Having good grounds to assert p and good grounds to assert not-p ought not to encourage us to allow ourselves then to infer anything at all, given the contradiction. To survive and prosper, a system must be able to cope with conflicting and inadequate data and do so without simply ceasing to act entirely. When the tiger leaps, do something, anything; don't just stand there. Ideally, systems should be able and willing to generate intelligent guesses even on the all-too-common basis of inadequate or inconsistent data. If a speech-recognition system can't hear a particular phoneme, it should guess on the basis of the phonemes it can hear and the words they could make up. The need to cope with incomplete or inconsistent data is sometimes called the requirement of graceful degradation. Performance should gradually become less satisfactory as available data decreases; it should not suddenly cease. Systems, in short, must be robust enough to survive in an informationally hostile environment.
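As a toy illustration of this kind of guessing, consider the following minimal sketch (in Python; the phoneme strings and the tiny lexicon are invented for the example, and real speech-recognition systems are of course far more elaborate). Given a partly heard word, it returns the best-matching known word rather than refusing to answer, and its answers get worse only gradually as more phonemes are lost:

```python
# Guess a word from degraded phoneme data instead of failing outright.
# "?" marks a phoneme that was not heard; the lexicon is invented.
LEXICON = ["k ae t", "k ae b", "b ae t", "d aw g"]

def guess(heard: str) -> str:
    def score(word: str) -> int:
        # Count positions where an audible phoneme matches the candidate.
        return sum(h == w for h, w in zip(heard.split(), word.split())
                   if h != "?")
    # Degrade gracefully: always return the best available candidate.
    return max(LEXICON, key=score)

print(guess("k ? t"))  # -> "k ae t": two audible phonemes still match
```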
Robustness of a somewhat different kind is needed to withstand the physical battering of violence, old age, and entropy. Our storage and deployment of information ought not to be overly susceptible to the loss of a few brain cells or a bang on the head. This requirement too is known as one of graceful degradation, except this time the partial lack of data is due not to external circumstances but to damage sustained by our encoding, storing, and retrieving mechanisms. Apart from being robust, the natural-born thinker should also be flexible. Some organisms choose to compete in information-processing warfare; humans have and clams haven't. Dennett (1984a, 37-38) notes that this is the evolutionary choice between a Maginot line (an immobile, armored approach) and guerilla warfare (a mobile, intelligent approach). The latter choice results in a cognitive arms race. If you do choose to compete in a cognitive arms race, one vital asset is the capacity to deploy as much of your informational armory in as wide a range of situations as possible. In particular, it should be possible to do something, even when faced with a new and unexpected situation. This, as we shall later see, may mean trying to avoid the rigid, task-bound storage of information.

Finally, I should note the pervasive demands of cognitive integration (or perhaps protocognitive integration) and real-time sensory processing. A successful organism of the mobile, guerilla-warrior class must receive and
integrate data from many sensory modalities, achieve the task in real time, and control any appropriate sensorimotor activity (see, e.g., Walker 1983, 188-209). This may prove to be a surprisingly strong constraint on the kind of computational architecture required (see part 2).
In sum, the mobile organism most likely to succeed is one that is willing and able to act quickly on messy and even inconsistent data, is able to perform sensory-processing tasks in real time, is preferably able to integrate the data received through various modalities and deploy it flexibly in new situations, and is generally an all-round biological achiever. This multifaceted profile contrasts quite starkly with the kinds of systems often studied in artificial intelligence. Most of these would not last five minutes in the real world. Like overprotected children, they have been freed to develop an impressive performance along some very limited dimension (e.g., chess playing) by never having to cope with the most rudimentary tasks that any natural chess-playing organism will have coped with. This strategy of investigating intelligence in what I shall call vertically limited microworlds is, I suspect, a major cause of the failure of AI to contribute as much to an understanding of human psychological processes as we might have hoped. Vertically limited microworlds take you right up to something close to human performance in some highly limited and evolutionarily very recent intellectual domain. Horizontally limited microworlds, by contrast, would leave you well below the level of human performance right across the board but would tackle many of the kinds of tasks forced on the evolving creature at an early stage. Such horizontally limited microworlds would be, in effect, the cognitive domains of animals much lower down the phylogenetic tree than us. Flexible, robust, multipurpose, but somewhat primitive systems will (I suspect) teach us more about human psychology than inflexible, rigid simulacra of fragments of high-level human cognitive achievements. (A current example of such work is Schreter and Maurer's [1986] project on sensorimotor spatial learning in connectionist artificial organisms.) This point is further developed in section 4 below.

3 Thrift, the 007 Principle

Sponges feed by filtering water. Successful feeding thus requires that water pass through the sponge. To this end these small beings have developed flagella that are capable of pumping water at a rate of one bodily volume every five seconds. This much was known as long ago as 1864. Until quite recently it was assumed that this pumping action accounted for all the water that the sponge processed. Evolution, however, shows herself to be a thrifty mistress. It turns out that sponges also exploit the structure of their natural environment to reduce the amount of pumping required (Vogel 1981). The discovery that sponges use ambient water currents to aid their
64 Chapter 4

feeding was madeonly in the last decade. And yet, as Vogel points out:
The structure of spongesis most exquisitely adaptedto take advantage
of such currents, with clear functions attaching to a number of
previously functionlessfeatures. Dynamic pressureon the incurrent
openings facing upstream, valves closing incurrent pores lateral and
downstream, and suction from the large distal or apical excurrent
openings c~mbine to gain advantagefrom even relatively slow currents
. And numerousobservationssuggestthat spongesusually prefer
moving water. Why did so much time elapse before someone
madea crude model of a sponge, placedit in a current and watcheda
streamof dye passthrough it? (1981, 190)
Vogel's question is important. Why was such an obvious and simple adaptation overlooked? The reason, he suggests, is that biologists have tended to seek narrowly biological accounts, ignoring the role of various physical and environmental constraints and opportunities. They have, in effect, treated the organism as if it could be understood independently of an understanding of its immediate physical world. Vogel believes a diametrically opposed strategy is required. He urges a thorough investigation of all the simple physical and environmental factors in advance of seeking any more narrowly biological account. He thus urges, "Do not develop explanations requiring expenditure of metabolic energy (e.g., the full pumping hypothesis for the sponge) until simple physical effects (e.g., the use of ambient currents) are ruled out" (Vogel 1981, 182). Vogel gives a number of other examples involving prairie dogs, turret spiders, and mimosa trees.

It is the general lesson that should interest us here. As I see it, the lesson is this: if evolution can economize by exploiting the structure of the physical environment to aid an animal's processing, then it is very likely to do so. And processing here refers as much to information processing as to food processing. Extending Vogel's observations into the cognitive domain, we get what I shall dub the 007 principle. Here it is.

The 007 principle. In general, evolved creatures will neither store nor process information in costly ways when they can use the structure of the environment and their operations upon it as a convenient stand-in for the information-processing operations concerned. That is, know only as much as you need to know to get the job done.
Something like the 007 principle is recognized in some recent work in developmental psychology. Rutkowska (1984) thus argues that a proper understanding of the nature of infant cognition requires rejecting the solipsistic strategies of formulating models of mind without attending to the way mind is embedded in an information-rich world. It is her view that computational models of infant capacities must be broad enough to include use of external structures as essential elements of computation. As she puts it, "The notion of computation as rule-governed structure manipulation must be taken to include environmental as well as intrasubject structures" (Rutkowska 1984, 1). This extension is vitally important. Various studies in developmental psychology support the need for such an approach. In the very simplest case (I investigate a much more complex and interesting one in chapter 7), when seeking an ingredient for baking a cake, a child does not need to remember exactly where the ingredient is located in a store. Instead, the child may simply go to the right shelf and there seek what it needs. In such cases the external world stands in for a highly detailed memory store. (This example, originally from Cole, Hood, and McDermott 1978, is cited in Rutkowska 1986, 88.)
Cognitive ethologists are likewise quick to recognize the ways in which animals have developed to augment their limited intellectual capacities with the wily use of environmental structures. Clark's nutcracker, for example, buries seeds near logs to facilitate future rediscovery.
Nor, of course, is the strategy of exploiting environmental structures to aid cognition confined to children and lower animals. Consider the practice of solving a jigsaw puzzle. This activity combines purely internal cogitations (e.g., "This section has half a bird on it, so I need a piece with a wing and a piece with a foot") with physical operations on a real object (the actual puzzle). These physical operations are essential to our problem-solving activity; we seldom represent the shape of a piece to ourselves well enough to know for sure that it will fit in advance of trying it in a plausible location. The physical operation of trying the piece out may result in a fit, in which case the ordinary, nonextended cognitive process again commences ("What will the next piece have to look like, given the shape and pictorial content of this piece?"). Solving a jigsaw puzzle (or at least human solving of jigsaw puzzles) cannot be explained purely by appeal to a set of internal processes interpreted as working out the steps of the solution. Rather, the internal processes must tie in with real operations on the world for testing hypotheses and generating new states of information. Imagine trying to devise a model of human capacities to solve jigsaw puzzles that took no account of our ability to manipulate the real puzzle. Any such model would achieve its goal only by constructing a complete internal representation of the shape of each piece. This may do the job, but it would hardly be a model of human cognition in the domain. It would instead be an example of what Dennett (1984b) calls a "cognitive wheel," an elegant but unnatural solution to a problem of natural design.
A final point to stress in this context (one that harks back to some of Dreyfus's criticisms of AI outlined in chapter 2) is that the external structures an intelligent system may exploit include both other agents and its own body. The potential uses of other agents are, I suppose, obvious enough. Two heads are indeed often better than one. Since it is so phenomenologically immediate, the use of one's own body is easily overlooked. Jim Nevins, a researcher into computer-controlled assembly, cites a nice example (reported in Michie and Johnston 1984). One solution to the problem of how to get a computer-controlled machine to assemble tight-fitting components is to design a vast series of feedback loops that tell the computer when it has failed to find a fit and get it to try again in a slightly different way. The natural solution, however, is to mount the assembler arms in a way that allows them to give along two spatial axes. Once this is done, the parts simply slide into place "just as if millions of tiny feedback adjustments to a rigid system were being continuously computed" (Michie and Johnston 1984, 95).
A proponent of the ecological movement in psychology once wrote, "Ask not what's inside your head but rather what your head's inside of" (Mace 1977, quoted in Michaels and Carello 1981). The positive advice, at least, seems reasonable enough. Evolution wants cheap and efficient solutions to the problems of survival in a real, richly structured external environment. It should come as no surprise that we succeed in exploiting that structure to our own ends. Just as the sponge augments its pumping action by ambient currents, intelligent systems may augment their information-processing power by devious use of external structures. Unlike the ecological psychologists, we can ill afford to ignore what goes on inside the head. But we had better not ignore what's going on outside it either.

The moral, then, is to be suspicious of the heuristic device of studying intelligent systems independently of the complex structure of their natural environment. To do so is to risk putting into the modeled head what nature leaves to the world. Classical cognitivism, we shall later see, may be guilty of just this cardinal sin.

4 Gradualistic Holism and the Historical Snowball
A very powerful principle pervades the natural order. It is often cited, though, as far as I know, it is unbaptised. The principle was explicitly stated by H. Simon (1962) and has since been mentioned by a great many writers. The principle states that the evolution of a complex whole will generally depend on its being built out of a combination of parts, each of which has itself evolved as a whole stable unit. The recursive application of such a procedure enables us to account for the evolution of complex wholes without the explosive increase in improbability that would dog any claim that such a complex evolved in a single step. For example, the one-step evolution of such a structure as depicted in figure 4.1 is much less likely than the evolution of such a structure if there are stable intermediate forms, as illustrated in figure 4.2.

Figure 4.1 Complex structure with no simpler forms

Figure 4.2 Complex structure with stable intermediate forms

The example is deliberately simplistic. But the principle is genuine and powerful. It covers both the combination of existing units at a given time and the succession of structures over a period of time. In the latter incarnation it surfaces as the evolutionary demand for gradual change that results in the success of the whole organism. Thus, suppose we imagine the structure (c) in figure 4.2 to represent an evolved being. The evolution of (c) is possible only because the previous structures (a) and (b) are themselves successful adaptations able to survive and reproduce. As Dawkins (1986, 94) recently put it, it is essential that an evolutionary trajectory should avoid passing through any disadvantageous intermediate states. An intermediate form that is maladaptive may well be a stage in the hypothetical evolution of a very well adapted being. But nature is not impressed by such farsightedness. Nature countenances only intermediate forms that prosper in the short term, even if the performance of later models is compromised as a result.
Such, in the abstract, is the general principle that I shall dub "gradualistic holism." According to gradualistic holism the evolution of a complex product is practically possible only insofar as that product is the last (or current) link in a chain of structures satisfying two conditions. First, at each stage the chain must involve only a small change in structure from earlier stages (gradualism). Second, each such stage must yield a new structure that is itself a viable whole (holism). When applied to complex biological organisms, this amounts to the requirement that the complex product emerge as the outcome of a series of small structural alterations, each of which yielded a whole organism capable of surviving and prospering in its own niche.

Biological theory countenances three ways in which the requirements of gradualistic holism may be met. These ways represent accepted solutions to the problem of coadaptation, i.e., the problem of accounting for the evolution of complex organisms whose parts seem made for each other yet whose simultaneous one-step evolution would require a ridiculously improbable coincidence of mutations. As outlined, e.g., in Ridley 1985, 35-41, the three solutions to the problem of coadaptation are

(1) piecemeal evolution with constant function,
(2) piecemeal structural evolution with functional change, and
(3) symbiosis.

It is worth examining these in a little detail, as they represent important alternatives with rather different implications (as we shall later see) for the cognitive domain.
The clearest example of (1) is the evolution of the eye. This story has been told often enough, and I doubt it is necessary to repeat it again (see, e.g., Ridley 1985, 36; Dawkins 1986, 80-83; Clark 1986, 54). The point is that we can describe a series of seeing organs such that the series progresses from simple collections of light-sensitive cells, through pinhole-camera eyes, to eyes with lenses, and thence to the eyes of vertebrates like ourselves. Each step in the series can be obtained by small structural alterations to its predecessor, thus meeting the requirement of gradualism. And each step confers some useful increment in visual acuity, thus meeting the requirement of holism. Moreover, we can find instances of each intermediate stage in living species, which is tantamount to empirical proof that the organisms have met the holistic requirement. Each link in the chain leading to the vertebrate eye was itself an organ used for seeing.
Piecemeal evolution with change of function is also possible, as indicated in option (2). The standard example here is the evolution of flight, again discussed by Ridley (1985). Feathers, it is said, may first have evolved to perform the function of thermoregulation by effectively covering the bird and trapping a cushion of air between body and feather. A bird that had already evolved a basic wing structure for simple gliding might later find that the feathers could aid flight. When an organ that initially served one purpose proves capable of subserving a subsequent and different function, we speak of its being "preadapted" for the later function. This is a bit misleading, since it suggests some foreknowledge of the subsequent use, which is precisely what we set out to deny. For consistency I shall stick with the received terminology. Feathers, then, may be described as adapted for thermoregulation and preadapted for flight. When there is such a change in function, the organ may or may not continue simultaneously to perform its original function. Feathers continue to help thermoregulation even when used for flight. According to one interpretation, our lungs evolved as breathing devices only thanks to a preadaptation in the form of the swim bladders of fish. Swim bladders are sacs of air that help a fish move in water. On one account, because our lungs evolved out of a change of function of the swim bladder, we are susceptible to ailments like pleurisy and emphysema. As Lieberman (1984, 22) puts it: "Swim bladders are logically designed devices for swimming; they constitute a Rube Goldberg system for breathing" (Rube Goldberg, I am told, is the American Heath Robinson). Here, then, we have a case in which a series of gradual structural changes includes a large and significant change in function (from swimming aid to breathing device) and the end product (the breathing device) can be seen as a kludge if it is considered as a device designed for its current role. "Kludge" is a term used in engineering and computer science to describe something that, from a pure (i.e., ahistorical), design-oriented viewpoint, looks messy and inefficient. But it gets the job done. And it may even count as an elegant solution once all the constraints (e.g., the available skills and resources) are taken into account.
The third and final means by which biological systems may meet the demands of gradualistic holism is symbiosis. In symbiosis the various parts of a subsequent unit evolve separately and are put together at a later date. The classic example this time is the evolution of the eukaryotic cell (Ridley 1985, 38-39; Jacob 1977, 1164). A eukaryotic cell has a nucleus and contains such organelles as mitochondria and chloroplasts. It is much more complex than the prokaryotic cell, which is little more than a sac of genetic material. One account of the evolution of the complex eukaryotic cell depicts the mitochondria and chloroplasts (which now convert food into energy for the cell) as once-independent organisms that later formed a mutually desirable alliance with the host cell. The terms of the alliance need not concern us. Very roughly, the host cell is provided with energy, and the organelles with food and raw materials. Such alliances allow for great increases in complexity achieved very rapidly by the association of existing systems. (Something similar may occur in human thought when we see how to unite ideas developed in separate domains into a single overarching structure.) Symbiosis thus meets the requirements of gradualistic holism with a contemporaneous set of independent viable structures (thus meeting in an unusual way the demand of holism) and with a small structural alteration (as demanded by gradualism) that supports their amalgamation into a new whole.
Complex biological systems, then, have evolved subject to the constraints of gradualistic holism. This mode of evolution suggests the possibility of a snowballing effect with worrying implications for cognitive science. The snowballing effect is summed up in an informal principle formulated by Jacob, a cell geneticist. It is just this: "Simpler objects are more dependent on (physical) constraints than on history. As complexity increases, history plays the greater part" (Jacob 1977, 1163). The idea is simple, and follows immediately from our earlier observations. Gradualism requires that each structural step in the evolutionary process involve only a small adjustment to the previous state. Jacob compares evolution to a tinkerer who must use whatever is immediately at his disposal to achieve some goal. This case contrasts with that of an engineer who, within certain limits, decides on the appropriate materials and design, gathers them, and then carries out the project. (Lévi-Strauss [1962] explored a similar analogy involving the notion of bricolage. See also Draper 1986.)
The point, then, is that what the tinkerer produces is heavily dependent on his historical situation in a way in which the engineer's product is not. Two good engineers will often arrive independently at a similar design that "approaches the level of perfection made possible by the technology of the time" (Jacob 1977, 1163), whereas two tinkerers attempting to use only the materials they happen to have to hand will be unlikely to chance on the same solution. If evolution proceeds as a tinkerer, each step in the evolutionary chain exploits a historical opportunity whose nature is determined by whatever materials happen to be available to adapt to a new requirement. Chance and local factors will play some role at every stage along the way. Since every later development occurs in a space determined by the existing solutions (and materials), it is easy to see that there will be a snowball effect. Every idiosyncrasy and arbitrariness at stage S1 forms the historical setting for a tinkering solution to a new problem at S2. As complexity increases and S1 gives way to Sn, the solutions will come more and more to depend on the particular history of the species. This may be one reason why some evolutionary theorists (e.g., Hull 1984) prefer to regard a species as a historical individual picked out by the particular circumstances of its birth, upbringing, and culture, rather than as an instance of a general, natural kind.
This historical snowballing effect, combined with the need to achieve some workable total system at each modification (holism), often makes natural solutions rather opaque from a design-oriented perspective. We have already seen one example in the evolution of a breathing device from a swim bladder. If we set out to build a breathing device from scratch, we might, it seems, do a better job of it.
Two further examples should bring home the lesson. These examples, beautifully described by Dawkins (1986, 92-95), concern the human eye and the eye of the flatfish. The human eye, it seems, incorporates a very strange piece of design. The light-sensitive photoreceptor cells face away from the light and are connected to the optic nerve by wires that face in the direction of the light and pass over the surface of the eye before disappearing through a hole in the retina on their way to the brain. This is an odd and seemingly clumsy piece of design. It is not shared by all the eyes that nature has evolved. An explanation of why vertebrate eyes are wired as they are might be that the combination of some earlier historical situation with the need to achieve an immediate working solution to some problem of sight forced the design choice. The wiring of the eye, then, has every appearance of being a kludge, a solution dictated by available materials and short-term expediency. As a piece of engineering it may be neither elegant nor optimal. But as a piece of tinkering, it worked.

Dawkins's second example concerns bony flatfish, e.g., plaice, sole, and halibut. All flatfish hug the sea floor. Some flatfish, like skates and rays, are flattened elegantly along a horizontal axis. Not so the bony flatfish. It is flattened along a vertical axis and hugs the sea floor by lying on its side. This rather ad hoc solution to whatever problem forced these fish to take to the sea bed must have raised a certain difficulty. For one eye would then be facing the bottom and would be of little use. The obvious tinkerer's solution is to gently twist the eye round to the other side. Overall, it is a rather messy solution that clearly shows evolution favoring quick, cheap, short-term solutions (to hug the sea bed, lie on your side) even though they may give rise to subsequent difficulties (the idle eye), which then get
solved only by further tinkering. This process, incidentally, is recapitulated in the development of the bony flatfish: the bony flatfish starts life as a symmetrical surface-swimming fish and subsequently undergoes a distortion of the skull as one eye moves over the head to the other side of the fish, which then settles on the sea bed to live out its days. As Dawkins (1986, 92) puts it: "The whole skull of a bony flatfish retains the twisted and distorted evidence of its origins. Its very imperfection is powerful testimony of its ancient history, a history of step by step change rather than of deliberate design. No sensible designer would have conceived such a monstrosity if given a free hand to create a flatfish on a clean drawing board."

It seems, then, that to understand the appearance of the bony flatfish, we must treat the species as a historical individual whose current state is the product of a particular series of accidents, problems, and short-term solutions to these problems.
The underlying explanatory principle of gradualistic holism and the snowball effect it induces suggest that we should treat all evolved complex systems similarly. If this is so, the implications for what is fundamentally a design-oriented cognitive science may be profound. For why suppose that cognitive adaptations are exempt from the same constraints? To put the point starkly, why suppose that our means of, say, playing chess is not fundamentally informed by the natural constraint of building a chess-playing capacity out of cognitive components designed for spotting predators? The mind, as far as I can see, is as good a place to find a kludge as the lung or the eye.

An analogy may help to clarify the idea. Imagine that you devise a range of software for a small business. As the business expands you are able to change small bits of the software in fairly circumscribed ways. What you can never do, even when the firm becomes a multinational corporation, is rewrite the software package from scratch. You end up trying to run the equivalent of a small country with a deviously deployed software package that still bears the hallmarks of the corner grocery shop for which it was originally devised. That is the way I see the cognitive kludge. Someone sitting down with a clean slate to devise software for a multinational would doubtless end up with a very different package, one that might boost profits quite considerably. But evolution builds its new cognizers out of old parts and seeks results with minimal alterations. Hence the kludge.
The third and final moral, then, is that a computationally oriented investigation into the principles of human psychology had better attend to the kinds of capacities required for success at a fairly evolutionarily basic level, since the solutions developed here are likely to impose very strong constraints on the later solutions to higher-level problems. The cognitive scientist thus needs at least a broad appreciation of what we may term the functional phylogeny of mind. Such a functional phylogeny would comprise a rough ordering of the problems facing an evolving organism competing in a cognitive arms race. The kinds of capacity that such a phylogeny would pick out as primitive, central, and hence suitable objects of ordinary design-oriented investigation might include:

Locomotor and manipulative skills
• sensorimotor coordination for walking, climbing, running, using tools, etc.
• kinesthetic and proprioceptive awareness (knowledge of one's own movement in space and of the relative locations of one's own bodily parts)

Object-oriented skills
• recognition of objects as enduring in space and time
• recognition of the value of objects (e.g., to eat, to play with, to fear, etc.)

Spatial skills
• navigational skills
• spatial memory
• path recognition (e.g., spotting a way through dense undergrowth)

Perceptual skills
• various perceptual systems
• cross-modal abstraction and integration

General cognitive skills
• analogical reasoning, learning from experience
• selective attention
• emergency interrupt systems (e.g., "Stop feeding. There's a lion coming!")
• memory and anticipation
• modeling oneself and the environment
• curiosity, actively seeking out relations of cause and effect
• playfulness

Social skills
• recognition of others' rank or standing
• prediction and control of others' behavior by psychological modeling of the other

(The above list is provisional and incomplete. It is drawn in part from some section headings in a recent work on animal cognition [Walker 1983, 189-235].)
The list is odd in that some of the items (like emotional response and curiosity) seem either too biological or too complex for current work to address. Yet these, I suggest, are the building blocks of human cognition. It seems psychologically ill-advised to seek to model, say, natural-language understanding without attending to such issues. Paradoxically, then, the protocognitive capacities we share with lower animals, and not the distinctive human achievements such as chess playing or story understanding, afford, I believe, the best opportunities for design-oriented insight into human psychology. This is not to say, of course, that all design-oriented investigation into higher-level skills is unnecessary, merely that it is insufficient. Indeed, a design-oriented approach to the higher-level achievements may be necessary if we are to understand the nature of the task that evolution may choose to perform in some more devious way. The general point I have been stressing is just that understanding the natural solution to an information-processing task may require attending at least to the following set of biologically motivated constraints:

• High value must be placed on robustness and real-time sensory processing (section 2).
• The environment should not be simplified to a point where it no longer matters. Instead, it should be exploited to augment internal processing operations (section 3).
• The general computational form of solutions to evolutionarily recent information-processing demands should be sensitive to a requirement of continuity with the solutions to more basic problems (section 4).

5 The Methodology of MIND
The methodology of classical cognitivism involved a valiant attempt to illuminate the nature of thought without attending to such constraints. There are many motivations for such a project, some highly respectable, some less so. Among the respectable motivations we find the belief that the space of possible minds far exceeds the space of biologically possible minds and that investigating the larger, less-constrained space is a key to understanding the special nature of the biological space itself (see, e.g., Sloman 1984). Also among the respectable motivations we find the need to work on isolable, tractable problem domains. It is thus clearly much easier to work on a chess-playing algorithm than to try to solve a myriad of evolutionarily basic problems (vision, spatial skills, sensorimotor control, etc.) in an integrated, robust, and flexible fashion. The evolutionary reflections on the nature of human thought can only count against the direction of work in the classical cognitivist tradition if there is some realistic alternative approach. And some cognitive scientists do not see any such alternative (but see part 2 for some reasons for optimism). It is also worth noting, as stressed at the very beginning, that what I am calling the classical cognitivist tradition by no means exhausts the kinds of work already being done even in what I shall later view as conventional artificial intelligence (the intended contrast is with the PDP approach investigated in part 2). Thus, early work on cybernetics, more recent work on low-level visual processing, and some work in robotics can all be seen as attempting to do some justice to the kinds of biological constraints just detailed. To take just one recent example, Mike Brady of Oxford recently gave a talk in which he explained his interest in work on autonomously guided vehicles as rooted in the peculiar task demands faced by such vehicles as robot trucks that must maneuver and survive in a real environment (see Brady et al. 1983). These included severe testing of modules by the real environment, three-dimensional ranging and sensing, real-time sensory processing, data fusion and integration from various sensors, and dealing with uncertain information. Working on autonomously guided vehicles is clearly tantamount to working on a kind of holistic animal microworld: such work is forced to respect many (but not all) of the constraints that we saw would apply to evolved biological systems.
Classical cognitivism tries to make a virtue out of ignoring such constraints. It concentrates on properties internal to the individual thinker, paying at best lip service to the idea of processing that exploits the world; it seeks neat, design-oriented, mathematically well understood solutions to its chosen problems; and it chooses its problems by fixating on various interesting high-level human achievements like conscious planning, story understanding, language parsing, game playing, and so forth. Call this a MIND methodology. MIND is a slightly forced acronym meaning: focused on Mature (i.e., evolutionarily recent) achievements; seeking Internalist solutions to information-processing problems (i.e., not exploiting the world); aimed at Neat (elegant, well-understood) solutions; and studying systems from an ahistorical, Design-oriented perspective. The methodology of MIND thus involves looking at present human achievements, fixating on various intuitively striking aspects of those achievements (e.g., planning, grammatical competence, creativity), then treating each such high-level aspect as a separate domain of study in which to seek neat, internalist, design-oriented solutions, and hoping eventually to integrate the results into a useful understanding of human thought. This general strategy is reflected in the plan of AI textbooks, which will typically feature individual chapters on, e.g., vision, parsing, search, logic, memory, uncertainty, planning, and learning (this is the layout of Charniak and McDermott 1985). Our earlier reflections (sections 2 to 4) already give us cause to doubt the long-term effectiveness of such a methodology, at least if the goal is an understanding of human thought.
Such reflections suggest two particular, related pitfalls for MIND-style theorizing. At the risk of repetition, I should like to end this chapter by making them as explicit as possible. The first is what I shall call the danger of missing the mess. Human thought in evolutionarily recent tasks (like chess playing and probabilistic reasoning) may be very messy indeed. We may be placed very much in the role of a tinkerer, having to make do with processing powers and strategies designed for much more basic tasks. A pure, design-oriented approach to these recent achievements seems to be, we saw, in severe danger of missing the mess. Worse still, it may be that what makes us messy probabilistic reasoners, say (compulsive pattern completing, holistic information processing, etc.), is the very thing that makes us flexible and creative in our capacity to use such reasoning in the service of our changing needs and desires (on our probabilistic-reasoning skills see, e.g., Kahneman et al. 1982). In such cases, missing the mess would not merely yield a psychologically incorrect account of our thought; it would blind us to the correct explanation of further aspects of our thought. The point about mess holds at much lower levels too. With unusual clarity one writer recently wrote: "A computer scientist contemplating a monkey's central nervous system (or that of a human, the differences are slight) could be forgiven for wondering whether it was designed by a genius or a lunatic. . . . Only a genius could design something so effective. Only a maniac could design something so complicated. These contradictory features could only co-exist in something that was not designed at all, but simply evolved" (Durham 1987, 28).

The second pitfall (call it "bad focus") is in some ways just a different kind of perspective on the first. It again points to the danger of a methodology that fixates on intuitively striking, recent cognitive achievements. But the worry this time is that some such achievements may not be proper objects of computational investigation at all. It is not easy to convey the flavor of this possibility in advance of some of the detailed work in part 2, but an analogy drawn from evolutionary biology may help.
An adaptationist in evolutionary biology is one who holds that natural selection is an optimizing process (subject to trade-offs), that the biological organism consists of a set of traits (aggressiveness, brain size, leg length, etc.) that are individually optimized (subject to trade-offs), and that the presence of any given trait is to be explained by alluding to the biological advantages accruing to those who exhibit it (adapted from Gould and Lewontin 1978). In short, an adaptationist is someone who seeks a direct evolutionary explanation for every striking feature of an organism, i.e., someone who seeks to explain the presence of the feature by telling a story about the selective advantages it confers. But not every striking feature has any such direct function. The shape of the chin, to use a classic example, is merely a striking by-product, the result of an architectural relation that obtains between anatomical features selected on quite independent grounds.
In a seminal paper Gould and Lewontin (1978) caricature the adaptationist approach by applying such reasoning to two nonbiological cases. The first concerns the spandrels of Saint Mark's Cathedral in Venice. A spandrel is a triangular space formed by the intersection of two rounded arches. Spandrels are a necessary structural by-product caused by mounting a dome on a number of rounded arches. The spandrels of Saint Mark's, however, have been put to particularly good use, as can be seen in figure 4.3. The spandrels are used to express the Christian themes of the dome. In this case a man (said to represent one of the biblical rivers) is seen pouring water from a pitcher. Overall, the effect of the designs worked into the spandrels is so striking that we might even be tempted to view the overall structure of pillars and dome as themselves a result of the need to have a triangular space for the designs. But this, of course, would be precisely to reverse the true order of explanation. As a result of the decision to rest a dome on rounded arches, spandrels come into being as an inevitable by-product. These were then exploited by the artist or designer.

As a second example consider the ceiling of King's College Chapel (Gould and Lewontin 1978, 254). The ceiling (described by Wordsworth as "that branching roof self-poised, and scooped into ten thousand cells, where light and shade repose") is supported by a series of pillars that are fan-vaulted at the top. Where the fan vaultings meet between pillars, a star-shaped space is inevitably created. In King's College Chapel these star-shaped spaces have been decorated with portcullises and roses (see figure 4.4). Once again, the architectural constraint is clearly the main source of the design. Gould and Lewontin point out, "Anyone who tried to argue that the structure exists because the alternation of rose and portcullis makes so much sense in a Tudor chapel would be inviting . . . ridicule" (1978, 254).
In the cases cited, we would regard the adaptationist explanations of spandrels and star-shaped spaces as bizarre. The lesson, then, is that we must not simply accept our intuitions about the basic and central features of an organism as a reliable guide to its decomposition into a set of individual traits in need of adaptationist explanation. To do so is to commit a fallacy of reification. In this fallacy intuitively striking but emergent features of a complex object are reified, and direct explanations of each feature are constructed. By trying to deal with high-level cognitive features without first attending to their basic underpinnings, standard AI, I believe, commits a version of this fallacy. Table 4.1 makes the parallel explicit. The fact that the features listed to the right of human thought strike us as suitable individual objects of computational investigation may be an effect
Figure 4.3 One of the spandrels of Saint Mark's. Reproduced by permission from Gould and Lewontin 1978, 582

Figure 4.4 The ceiling of King's College Chapel. Reproduced by permission from Gould and Lewontin 1978, 583
of our linguistic perspective, much as the isolation of the spandrels is an effect of an artistic perspective. Having a language in which sentential formulations pick out various kinds of mental states may mislead us into believing that these sentential formulations pick out computationally isolable achievements suitable for MIND-style explanation. But such explanation may project properties of the sentential formulations back into the heads of those who use sentential formulations. As we saw in chapter 3, sentential formulations are used to do many things, none of which need involve tracking computationally or neurophysiologically isolable states of the head. Such back projection is as unwarranted as it is evolutionarily implausible. (For a thoroughgoing attack on the sentential approach see Churchland 1986.) Beings may surely come to use language without what goes on in their heads having the properties of the language they come to use. Because we are so steeped in language, its apparatus may mislead us to conceive of the computational and neural substrate of thought in terms of the categories that language uses to describe thought. But such a policy (pursued, we saw, by standard functionalists and classical cognitivists alike) has begun increasingly to look ill advised.
In the remaining half of this book I look at a different approach, variously known as connectionism and parallel distributed processing. This approach, it seems to me, goes some way toward avoiding the gross biological and philosophical implausibility of much conventional work in AI and cognitive science.
II
The Brain's-Eye View

The rich behaviour displayed by cognitive systems has the paradoxical character of appearing on the one hand tightly governed by complex systems of hard rules, and on the other to be awash with variance, deviation, exception, and a degree of flexibility and fluidity that has quite eluded our attempts at simulation. . . . The subsymbolic paradigm suggests a solution to this paradox.
- Paul Smolensky, "On the Proper Treatment of Connectionism"
Chapter 5
Parallel Distributed Processing

1 PDP or Not PDP?

Classical cognitivism was indeed the pure science of the mind. Or perhaps it was the pure science of the mind's own idea of the mind. However you see it, it would be hard to exaggerate its deliberate intellectual distance from the messy substrata of biological fact. On the structure of the brain and the phylogeny of cognitive processes, the classical cognitivist maintained a studied indifference. But perhaps cognitive science cannot afford that indifference.

Parallel distributed processing (or connectionism) is an attempt to provide slightly more biologically realistic models of mind. Such models, though hardly accurate biologically, are at least inspired by the structure of the brain. Moreover, they are tailored, in a sense to be explained, to evolutionarily basic problem-solving needs, like perceptual pattern completion. These models, I shall argue, offer the best current prospect for soothing the philosophical and biological sore spots inflamed (I hope) by the first half of this book.

As counterpoint to the enthusiasm, a word of warning. There is a certain danger in the extreme polarization of cognitive science which this treatment may seem to imply. The danger is summed up in the slogan "PDP or not PDP?" This, I hope to show, is not the question. PDP is not a magic wand.¹ And a connectionist touch will not turn a Von Neumann frog into a parallel distributed princess. Both our deeper explanatory understanding of cognition and on some occasions our actual processing strategies may well demand the use of higher-level, symbolic, neoclassical descriptions. For all that, connectionist models offer insights into the way nature may provide for certain properties that seem to be quite essential to what we consider intelligent thought. Walking this tightrope between the cognitivist and connectionist camps is a major task of the next five chapters. First, though, we had better get some idea of what a PDP approach actually involves.
2 The Space between the Notes

The musician's talent, it is sometimes said, lies not in playing the notes but in spacing them. It is the silences that make the great musician great. As it is with music, so it is with connectionism. The power of a connectionist system lies not in the individual units (which are very simple processors) but in the subtly crafted connections between them. In this sense such models may be said to be examples of a brain's-eye view. For it has long been known that the brain is composed of many units (neurons) linked in parallel by a vast and intricate mass of junctions (synapses). Somehow this mixture of relatively simple units and complex interconnections results in the most powerful computing machines now known, biological organisms. Work in parallel distributed processing may be said to be neurally inspired in the limited sense that it too deploys simple processors linked in parallel in intricate ways. Beyond that the differences are significant. Neurons and synapses are of many different types, with properties and complexities of interconnection so far untouched in connectionist work. The PDP "neuron" is a vast simplification. Indeed, it is often unclear whether a single PDP unit corresponds in any useful way to a single neuron. It may often correspond to the summed activity of a group of neurons. Despite all the differences, however, it remains true that connectionist work is closer to neurophysiological structure than are other styles of computational modeling (see Durham 1987; McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, chapters 20-23).
Neurally inspired theorizing has an interesting past. In one sense it is a descendant of gestalt theory in psychology (see Kohler 1929; Baddeley 1987). In another, more obvious sense it follows the path of cybernetics, the study of self-regulating systems. Within cybernetics the most obvious antecedents of connectionist work are McCulloch and Pitts 1943; Hebb 1949; and Rosenblatt 1962. McCulloch and Pitts demonstrated that an idealized net of neuronlike elements with excitatory and inhibitory linkages could compute the logical functions and, or, and not. Standard results in logic show that this is sufficient to model any logical expression.
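Their demonstration is easy to reconstruct. In the minimal sketch below (my own illustration in Python, not McCulloch and Pitts's original formalism), a unit fires just when the weighted sum of its binary inputs reaches a threshold; a suitable choice of weights and thresholds then yields each of the three logical functions, with a negative weight playing the role of an inhibitory linkage.

# A McCulloch-Pitts-style unit: binary inputs, fixed weights, hard threshold.
def unit(weights, threshold):
    return lambda *inputs: int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

AND = unit([1, 1], threshold=2)   # fires only if both inputs fire
OR = unit([1, 1], threshold=1)    # fires if at least one input fires
NOT = unit([-1], threshold=0)     # inhibitory weight: fires unless its input fires

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0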
Hebb went on to suggest that simple connectionist networks can act as a pattern-associating memory and that such networks can teach themselves how to weight the linkages between units so as to take an input pattern and give a desired output pattern. Roughly, Hebb's learning rule was just that if two units are simultaneously excited, increase the strength of the connection between them (see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, p. 36, for a brief discussion). This simple rule (combined with an obvious inhibitory variant) is not, however, as powerful as those used by modern-day connectionists. Moreover, Hebb's rules were not sufficiently rigorously expressed to use in working models.
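In modern notation Hebb's suggestion is usually rendered as the weight update Δw = η · a_pre · a_post: a connection strengthens only when the units at both of its ends are active together. A minimal sketch (the learning rate and the toy training sequence are my own arbitrary choices):

# Hebbian update: strengthen a connection when its two units co-fire.
def hebb_update(w, pre, post, lr=0.1):
    # pre and post are the activations (0 or 1) of the linked units.
    return w + lr * pre * post  # the weight grows only when both are active

w = 0.0
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0)]:
    w = hebb_update(w, pre, post)
print(w)  # 0.2: two co-activations, each adding 0.1; the others add nothing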
This deficiency was remedied by Rosenblatt's work on the so-called perceptron. A perceptron is a small network of input units connected via some mediating units to an output unit. Rosenblatt's work was especially important in three ways, two of them good, one disastrous. The two good things were the use of precise, formal mathematical analysis of the power of the networks and the use of digital-computer simulations of such networks (see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, pp. 154-156). The disastrous thing was that some overambitious and politically ill-advised rhetoric polarized the AI community. The rhetoric elevated the humble perceptron to the sole and sufficient means of creating real thought in a computer. Only by simulating perceptrons, Rosenblatt thought, could a machine model the depth and originality of human thought. This claim and the general evangelism of Rosenblatt's approach prompted a backlash from Minsky and Papert. Their work Perceptrons (1969) was received by the alienated AI community as a decisive debunking of the usurping perceptrons. With the rigorous mathematical analysis of linear threshold functions Minsky and Papert showed that the combinatorial explosion of the amount of time needed to learn to solve certain problems undermined the practical capacity of perceptronlike networks to undergo such learning. And they further showed that for some problems no simple perceptron approach could generate a solution. Rather than taking these results as simply showing the limits of one type of connectionist approach, Minsky and Papert's work (which was as rhetorically excessive as Rosenblatt's own) was seen as effectively burying connectionism. It would be some years before its public resurrection.
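A toy implementation makes the limitation vivid. The classic problem that no simple perceptron can solve is exclusive-or, since no single linear threshold separates its positive from its negative cases. In the sketch below (my own illustration; the learning rate and epoch count are arbitrary), the perceptron learning rule quickly masters AND but can never settle on XOR.

# A perceptron: weighted sum plus bias through a hard threshold, trained by
# nudging the weights toward each misclassified target.
def train(samples, epochs=25, lr=0.2):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = int(w[0] * x1 + w[1] * x2 + b > 0)
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return lambda x1, x2: int(w[0] * x1 + w[1] * x2 + b > 0)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
f = train(AND)
print([f(x1, x2) for (x1, x2), _ in AND])  # [0, 0, 0, 1]: AND is learnable
g = train(XOR)
print([g(x1, x2) for (x1, x2), _ in XOR])  # never [0, 1, 1, 0]: not linearly separable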
But the miracle happened. A recent three-page advertisement in a leading science journal extols the slug as savant, claiming that the parallel neural networks of the slug suggest powerful new kinds of computer design. The designs the advertisers have in mind are quite clearly based on the work of a recent wave of connectionists who found ways to overcome many of the problems and limitations of the linear-thresholded architectures of perceptrons. Landmarks in the rise of connectionism include Hinton and Anderson 1981; McClelland and Rumelhart 1985a; and McClelland, Rumelhart, and the PDP Research Group 1986. Other big names in the field include J. Feldman, D. Ballard, P. Smolensky, T. Sejnowski, and D. Zipser. It would be foolhardy to attempt a thorough survey of this extensive and growing literature here. Instead, I shall try to convey the flavor of the approaches by focusing on a few examples. These have been chosen to display as simply as possible some basic strategies and properties common to a large class of connectionist models. The precise algorithmic form of such models varies extensively. The emergent properties associated with the general class of models bear the philosophical and biological weight. This is reflected in the discussion that follows.
Figure 5.1 A local hard-wired network. From McClelland, Rumelhart, and Hinton 1986, 28

3 The Jets and the Sharks

Following an example developed in McClelland, Rumelhart, and Hinton 1986,² let us imagine two New York street gangs, the Jets and the Sharks. Some of the facts about them are presented in table 5.1. One way to encode and store this kind of information is with a local hard-wired network, which may be represented as in figure 5.1. The following conventions are adopted:

• Irregular clouds signify the existence of mutual inhibitory links between all the units within a cloud. Thus, figure 5.2 is composed of three units, one signifying that the individual concerned is in his twenties, another signifying that an individual is in his thirties, and so on. Since no one can be both in his twenties and in his thirties or forties, the units are set up to be mutually inhibitory. If one fires, it will dampen the other two.

Figure 5.2 Inhibitory links

• Lines with arrowheads represent excitatory links. If the line has an arrow at each end, the link is mutually excitatory. Thus, suppose all burglars are in their thirties. There would be an excitatory link between each burglar unit and the thirties unit. If, in addition, only burglars are in their thirties, the thirties unit would be excitatorily linked to the units representing burglars.

• Solid black spots signify individuals and are connected by excitatory links to the properties that the individual has; e.g., one such unit is linked to units representing Lance, twenties, burglar, single, Jet, and junior-high-school education.

By storing the data in this way, the system is able to buy, at very little computational cost, the following useful properties: content-addressable memory, graceful degradation, default assignment, and generalization. I shall discuss each of these in turn.

Content-addressable memory
Consider the information that the network encodes about Rick. Rick is a divorced, high-school-educated burglar in his thirties. In a more conventional approach this information would be stored at one or several addresses, with retrieval dependent upon knowing the address. But a designer may want to make all this information accessible by any reasonable route. For example, you may know only that you want data on a Shark in his thirties, or you may have a description that is adequate to identify a unique individual but nevertheless contains some errors. Such flexible (and in this case error-tolerant) access to stored information is known as content-addressable memory. Humans certainly have it. To borrow McClelland and Rumelhart's lovely example, we can easily find the item that satisfies the description "is an actor, is intelligent, is a politician," despite the meagre and perhaps partially false description. Flexible, error-tolerant access requires some computational acrobatics in a conventional system. In the absence of errors, a technique called hash coding is quite efficient (see Knuth 1973). The error-tolerant case, however, requires an expensive best-match search. Storing the information in a network of the kind just described is a very natural, fast, and relatively cheap way of achieving the same result.
Figure 5.3 The pattern of activation for a Shark in his thirties. Hatching = input activations. Sunburst = units to which activation spreads. The diagram is based on McClelland, Rumelhart, and Hinton 1986, p. 28, fig. 11.

It is easy to see how this works. Suppose you want to know who satisfies the description "is a Shark in his thirties." The thirties and Shark units are activated and pass positive values to the units to which they have excitatory links. There is a chain of spreading activation in which first the individual-signifying unit and then those other units to which it is excitatorily linked get activated. The result is a pattern of activation involving the units for Shark, thirties, burglar, divorced, a high-school education, and Rick. The process is shown in figure 5.3. The important point is just this: the same final pattern of activation (i.e., the overall pattern of units active after the spread of activation) could have been achieved by giving the system any one of a number of partial descriptions, e.g., the inputs "Shark, high-school education," "Rick, thirties," and so on. Simply by using a network representation of the data, we obtain a flexible, content-addressable memory store.
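The mechanism is easy to caricature in a few lines of code. The sketch below is my own drastic simplification, not McClelland and Rumelhart's interactive activation and competition model: the data is an invented fragment rather than the book's table 5.1, and the graded, iterative settling of a real network is collapsed into a single winner-take-all step.

# A toy spreading-activation memory in the spirit of the Jets-and-Sharks net.
# Each person is an "instance" unit excitatorily linked to its property units;
# instance units mutually inhibit one another, so the best-supported one wins
# and then re-excites its own name and property units, completing the pattern.
PEOPLE = {  # invented fragment, for illustration only
    "Rick": {"Shark", "thirties", "burglar", "divorced", "high school"},
    "Sam": {"Jet", "forties", "bookie", "married", "college"},
    "Lance": {"Jet", "twenties", "burglar", "single", "junior high"},
    "Ralph": {"Jet", "thirties", "pusher", "single", "junior high"},
    "Art": {"Jet", "forties", "pusher", "single", "junior high"},
}

def retrieve(cues):
    # Step 1: each cue unit excites every instance unit it is linked to.
    support = {name: len(cues & props) for name, props in PEOPLE.items()}
    # Step 2: mutual inhibition among instance units leaves the best match.
    winner = max(support, key=support.get)
    # Step 3: the winner re-excites its name and property units.
    return winner, PEOPLE[winner]

print(retrieve({"Shark", "thirties"}))
# ('Rick', ...): the full Rick pattern recovered from a partial description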

Graceful degradation
Graceful degradation, as we saw in chapter 4, comes in two related varieties. The first demands that a system be capable of sustaining some hardware damage without being totally incapacitated. The second demands that a system be capable of behaving sensibly on the basis of data that is partial or includes errors. The PDP mode of data storage and retrieval supports both these qualities. The capacity to tolerate some hardware damage is best seen in more distributed networks of the kind detailed in subsequent examples. Sensible behavior despite data that is partial or includes errors can be demonstrated in a local net. We have already seen how partial data can prompt a full pattern of activation. The extension to data that includes errors is easy.
Suppose we want to retrieve the name of an individual whom we believe to be a Jet, a bookie, married, and educated at the junior-high-school level. In fact, no one in our model satisfies that description. The best fit is Sam, who is a bookie, a Jet, and married but has a college education. The network can cope, thanks to the inhibitory links. The system works like this. The units for bookie, married, Jet, and (the mistake) a junior-high-school education are activated. The units for bookie and married directly excite only one of the units that specify individuals. (I have labeled it S in figure 5.4.) The Jet unit excites the individual-signifying units labeled A, S, Ra, and L. (Only Rick, whose individual-signifying unit is labeled ri, is a Shark.) The junior-high-school-education unit excites L, Ra, and A. In sum:

The bookie unit excites S.
The married unit excites S.
The Jet unit excites A, S, Ra, L.
The J.H.-education unit excites L, Ra, A.

Thus, the S unit is stimulated three times, and the L, Ra, and A units twice. But the various individual-representing units are themselves connected in a mutually inhibitory fashion, so the strong, threefold activation of the S unit will tend to inhibit the weaker, twofold activation of the A, L, and Ra units. And when the activation spreads outwards from the individual units, the S unit will pass on the most significant excitatory value. The S unit is excitatorily linked to the name unit Sam. And the various name units, too, are competitively connected via mutually inhibitory links. Thus "Sam" will turn out to be the network's chosen completion of the error-involving description beginning "Jet, bookie, married, junior-high-school education." A sensible choice. The spread of activation responsible is shown in figure 5.4.
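Run against the invented fragment in the sketch above, the arithmetic comes out just as described: the Sam unit gathers three units of support, its rivals only two, and mutual inhibition settles the contest.

# Sam matches three cues (Jet, bookie, married); Lance, Ralph, and Art match
# only two (Jet, junior high), so the erroneous junior-high cue is outvoted.
print(retrieve({"Jet", "bookie", "married", "junior high"}))
# ('Sam', ...): the best fit wins despite the error in the description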

Default assignment
Suppose that you don't know that Lance was a burglar. But you do know that most of the junior-high-school-educated Jets in their twenties are burglars rather than bookies or pushers (see the data in table 5.1). It
Figure 5.4 The pattern of activation for a Jet bookie with a junior-high-school education. A unit labeled with a gang member's initial stands for that individual. The input patterns are marked with hatching. The strongly activated individual unit is marked with an x, and the name unit it excites is marked with a sunburst. The diagram is based on McClelland, Rumelhart, and Hinton 1986, p. 28, fig. 11.

is reasonableto assumethat Lancetoo is a burglar, at least until we learn


otherwise. This kind of assumption is called a default assignment. It is
generally good practiceto assumethat patterns found in known data will
extend to cover new cases . The network under considerationis able to
assign default values in this way. Unfortunately, it would be too messyto
attempt diagrammaticrepresentationof this process. But the basicstory
a
is not hard to grasp. Supposethat we do not know that Lanceis a burglar.
Still, when we activate the nameunit Lanceit will activate the units for all
'
of Lances known properties(i.e., Jet, junior-high-schooleducation , married,
in his twenties). Theseproperty units in turn will excite the units for others
'
who have theseproperties. If most of those who shareLances knownproperties
also sharea particular further property (i.e., if there is a real pattern
here) then the spreadof activation from theseunits will combineto activate
for Lancethe unit representingthe further property in question. In this way
the burglar unit gets activatedas a kind of default assignmentto Lance.
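The same spreading-activation story reduces to a one-line matrix product in the simplest case. The gang data below is invented (it is not the figure 5.2 table), so this is an illustration of the default-assignment idea only.

    import numpy as np

    # Rows: five hypothetical junior-high Jets in their twenties; columns:
    # occupation units (burglar, bookie, pusher). Lance's own row is unknown.
    known = np.array([
        [1, 0, 0],   # most individuals sharing Lance's properties
        [1, 0, 0],   # are burglars...
        [1, 0, 0],
        [0, 1, 0],   # ...one is a bookie
        [0, 0, 1],   # ...one is a pusher
    ])

    # Lance excites the individuals who share his known properties equally,
    # and they pass activation on to their occupation units.
    similarity_to_lance = np.ones(5) / 5
    print(similarity_to_lance @ known)   # [0.6 0.2 0.2]: burglar dominates
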
Flexible generalization
The property of flexible generalization is closely related to those considered above. Indeed, in a number of respects we may look upon all the properties treated in this example as involving different high-level descriptions and uses of the same underlying computational strategy of pattern completion. In this final case, the pattern-completing talent of the system is used to generate a typical set of properties associated with some description, even though all the system directly knows about are individuals, none of whom need be a perfectly typical instantiation of the description in question. Thus, suppose we seek a sketch of the typical Jet. It turns out, as we saw in the discussion of default assignments, that there are indeed patterns in Jet membership, although no individual Jet is a perfect example of all these patterns at once. Thus the majority of Jets are single, in their twenties, and educated to the junior-high-school level. No significant patterns exist to specify particular completions of a pattern beginning "Jet" along the other dimensions that the system knows about. Thus, if the system is given "Jet" as input, the units for single, twenties, and junior-high-school education will show significant activity, while the rest will mutually cancel out. In this way, the system effectively generalizes to the nature of a typical Jet, although no individual Jet in fact possesses all three properties simultaneously.

Perhaps what is most significant here is not the capacity to generalize per se so much as the flexibility of the generalizing capacity itself. As McClelland and Rumelhart point out, a conventional system might explicitly create and store various generalizations. One striking feature of the PDP version is its capacity to generalize in a very flexible way with no need for any explicit storage or prior decisions concerning the form of required generalizations. The network can give you a typical completion of any pattern you care to name if there is some pattern in the data. Thus, instead of asking for details of a typical Jet, we could have asked for details of a typical person in his twenties educated at the junior-high-school level or a typical married pusher, and so on. The network's generalization capacity is thus flexible enough to deploy available data in novel and unpredicted ways, ways we needn't have thought of in advance. As our account progresses, this kind of unforced flexibility will be seen to constitute a major advantage of PDP knowledge representation.
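A crude way to see the point: any subset of properties can serve as a cue, with the matching individuals' remaining properties superposed. The sketch below uses invented feature names and data, not the original network, and completes arbitrary patterns without any pregenerated list of generalizations.

    import numpy as np

    # Invented toy data: rows are individuals, columns are property units.
    features = ["Jet", "Shark", "20s", "30s", "J.H.", "college", "single", "married"]
    people = np.array([
        [1, 0, 1, 0, 1, 0, 1, 0],
        [1, 0, 1, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 1, 1, 0],
        [0, 1, 1, 0, 1, 0, 0, 1],
    ])

    def typical(*cues):
        # Individuals matching the cue pass activation to their other
        # property units; majority values dominate, minority values fade.
        idx = [features.index(f) for f in cues]
        match = people[:, idx].all(axis=1)
        return dict(zip(features, people[match].mean(axis=0).round(2)))

    print(typical("Jet"))          # the typical Jet
    print(typical("20s", "J.H."))  # any other description works equally well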

4 Emergent Schemata

In chapter 2 (especially sections 2 and 3) we spoke of scripts and schemata. These are special data structures that encode the stereotypic items or events associated with some description. For example, Schank and Abelson's restaurant script held data on the course of events involved in a typical visit to a restaurant. The motivation for introducing scripts and schemata was simple. Human reasoning involves filling in numerous default values in order to understand what we see or are told. The schema is meant to capture this background knowledge and (by associating it with a particular description, e.g., "restaurant visit," "birthday party," etc.) enable us to deploy the right subset of our total background knowledge at the right time. But as we saw, the traditional schema-based approach was beset by difficulties. Individual scripts or schemata proved too structured and inflexible to be able to cope with all the variants of a situation (e.g., novel situations, mixtures of stereotypic scenarios, etc.), yet humans are sensible enough to take such variants in stride. The obvious way of overcoming this problem is to accept a massive proliferation of schemata. But the computational expense would be prohibitive. As Dreyfus pointed out, there is no obvious end to the multiplication of explicit schemata required.
McClelland, Rumelhart, and the PDP Research Group (1986, vol. 2, pp. 20-38) detail a PDP model in which the properties of explicit stored schemata emerge simply from the activity of a network of units that respond to the presence or absence of microfeatures of the schemata in question. These emergent schemata are presented as a partial solution to the dilemma facing classical approaches that depend on explicit stored schemata, the dilemma that schemata are the structure of the mind on the one hand and schemata must be sufficiently malleable to fit around most everything on the other (McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 20). At an abstract level (we shall get more concrete in a moment) McClelland, Rumelhart, et al. characterize the kind of malleability required as involving the capacity of a system to adjust the default values of each item in a scenario in a way that is sensitive to all the data known in a current situation. Thus, it might be reasonable to assign "some meat" as a default value for the slot "contents of refrigerator." But if you are also told that the refrigerator belongs to a vegetarian, this default assignment should change. This kind of flexibility has to be explicitly provided for in any conventional schema-based approach, at a cost of significant information-processing complexity.

The PDP model gets around these difficulties in a natural and unforced way by not having explicit schemata represented at all. As the PDP Research Group put it: "Schemata are not 'things' (in this model). There is no representational object which is a schema. Rather, schemata emerge at the moment they are needed from the interaction of large numbers of much simpler elements all working in concert with one another" (McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 20). The key to the system, as they note, is the notion of subpatterns of units that tend to come on together, because of excitatory links. The tendency can be overwhelmed by sufficient incompatible excitation and inhibition from elsewhere. But insofar as it exists, it acts like a stored schema. Yet there is no need to decide in advance on a set of schemata to store. Instead, the system learns (or is told) patterns of cooccurrence of individual items making up the schema, and the rest (as we shall now see) comes in the back door when required.
The particular model detailed by McClelland, Rumelhart, et al. concerns our understanding of a typical room, e.g., a typical kitchen or a typical office.3 The general interest of the model for us is twofold. First, it moves us in the direction of more distributed representations. Second, it shows how high-level symbolic characterizations (e.g., our idea of a typical kitchen) can be among the emergent properties of a network of simpler entities and how such high-level descriptions ("believes that . . .") may be perfectly correct without indicating the underlying computational structure of the system so described. This recalls our account of the emergence of spandrels from the interaction of other, more basic, architectural features, and it will prove important later on.

Room-dwelling human beings have ideas about the likely contents of particular types of rooms. If you are told to imagine someone in the kitchen standing by the cooker, you may well fill in some other details of the room. If you did, there would likely be a refrigerator and a sink, some wall cupboards, and so on. How does this happen? One answer would be to propose a mental file marked "kitchen" and detailing all the expected items. But that approach brings in its wake all the difficulties raised earlier. The answer we are being asked to consider goes like this. You were exposed to the contents of lots of rooms. You saw objects clustering on these occasions. Generally, when you were in a room with a cooker, you were in a room with a sink and not in a room with a bed. So suppose you had something like a set of PDP units responding to the presence of particular household items (this is clearly an oversimplification, but it will do for current purposes). And you fixed it so that units that tended to be on together got linked by an excitatory connection and units that tended to go on and off independently got linked by an inhibitory connection. You would get to a point where activating the unit for one prominent kitchen feature (e.g., the cooker) would activate in a kind of chain all and only the units standing for items commonly found in kitchens. Here, then, we have an emergent schema in its grossest form. Turn on the oven unit and after a while you get (in the particular simulation done by McClelland, Rumelhart, et al.) oven, ceiling, walls, window, telephone, clock, coffee cup, drapes, stove, sink, refrigerator, toaster, cupboard, and coffeepot. So far, all is conventional. But let's look more closely at some of the properties of this mode of representing the data. The first interesting property is the distributed nature of the system's representation of a kitchen. The concept of a kitchen here involves a pattern of activation over many units that may stand for (or respond to) more basic items treated in our daily language, or even for items not visible to daily talk at all. Such features might be functional or geometrical properties of objects. Whether or not the low-level features (known as microfeatures) are visible to daily talk, the strategy of building functional correlates of high-level concepts out of such minute parts in a PDP framework brings definite advantages. The main one, which I shall reserve for treatment in the next chapter, is the capacity to represent fine shades of meaning. A spin-off is the capacity of the gross, high-level concept or ability to degrade gracefully in the first way mentioned in the previous example. That is, it turns out to be robust enough to withstand some damage to the system in which it is distributively encoded. Thus, suppose the coffeepot unit or its links to other units got destroyed. The system would still have a functional kitchen schema, albeit one lacking a coffeepot default.
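The co-occurrence story lends itself to a compact simulation. Here is a schematic constraint-satisfaction sketch in the same spirit: a handful of binary feature units linked by symmetric weights, positive for items that co-occur and negative for items that rarely share a room. The feature list and all the weights are invented, so this illustrates the settling idea rather than reimplementing the published model.

    import numpy as np

    # Symmetric weights between household-feature units (hand-invented).
    feats = ["oven", "sink", "fridge", "coffeepot", "bed", "dresser"]
    W = np.array([
        #  ovn  snk  frg  pot  bed  drs
        [  0,   2,   2,   1,  -2,  -2],   # oven
        [  2,   0,   2,   1,  -1,  -1],   # sink
        [  2,   2,   0,   1,  -2,  -2],   # fridge
        [  1,   1,   1,   0,  -1,  -1],   # coffeepot
        [ -2,  -1,  -2,  -1,   0,   2],   # bed
        [ -2,  -1,  -2,  -1,   2,   0],   # dresser
    ], dtype=float)

    state = np.zeros(len(feats))
    state[feats.index("oven")] = 1.0       # clamp the oven unit on

    for _ in range(10):                    # settle by asynchronous updates
        for i in range(len(feats)):
            if feats[i] == "oven":
                continue                   # clamped units never change
            state[i] = 1.0 if W[i] @ state > 0 else 0.0

    print([f for f, s in zip(feats, state) if s > 0])
    # -> the kitchen subpattern (oven, sink, fridge, coffeepot) switches
    #    on; bed and dresser stay off. An emergent "kitchen schema."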
The distinction between a local and a distributed representational system is somewhat perspectival.4 Distribution is in the eye of the beholder or at best in the functional requirements of the system itself. Thus, in the example described, we have a distributed (hence, gracefully degradable) representation of kitchens but a local and quite gracelessly degradable representation of coffeepots. Of course, we could in turn have a distributed representation of coffeepots (e.g., as a pattern of activation across a set of units standing for physical and functional features of coffeepots). But there will always be some local representation at the bottom line, even if it lies well below the level of any features we might fix on in daily talk. Conversely, even an intuitively local network like the Jets and the Sharks net described in section 3 can be seen as a distributed representation of some slightly artificial concept such as Jet-Shark membership. The second and, I think, more interesting property of the room network is the multiplicity and flexibility of the emergent schemata it supports. McClelland, Rumelhart, et al. ran a simulation involving forty names of household features. After fixing connectivity strengths according to a rough survey of human opinions, they found that the network stored five basic schemata, one for each type of room they had elicited opinions on. (They were: kitchen, office, bedroom, bathroom, and living room.) The sense in which it stored five basic patterns was that if you set just one description unit on (e.g., the oven or bed unit) the system would always settle on one of five patterns of activation. But the system proved much more flexible than this state of affairs initially suggests. For many other final patterns of activation proved possible if more than one description was turned on. In fact, there were 2⁴⁰ possible states into which it might settle, one for each vertex of a 40-dimensional hypercube. It is just that some of these points are easier to reach than others. It will thus find a "sensible" pattern completion (subject to how sensible its stored knowledge is) even for the input pattern "bed, bath, refrigerator." To show this in action, McClelland, Rumelhart, et al. describe a case where "bed, sofa" is given as the input pattern. The system then sets about trying to find a pattern of activation for the rest of its units that respects, as far as possible, all the various constraints associated with the presence of a bed and a sofa simultaneously. What we want is a set of values that allows the final active schema to be sensitive to the effects of the presence of a sofa on our idea of a typical bedroom. In effect, we are asking for an unanticipated schema: a typical bedroom-with-a-sofa-in-it. The system's attempt is shown in figure 5.5. It ends up having chosen "large" as the size description (whereas in its basic bedroom scenario the size is set at "medium") and having added "easy-chair," "floor lamp," and "fireplace" to the pattern. This seems like a sound choice. It has added a subpattern strongly associated with sofa to its bedroom schema and has adjusted the size of the room accordingly.

Figure 5.5
The output of the room network with bed, sofa, and ceiling initially clamped. The result may be described as a large, fancy bedroom. White boxes indicate active units. The vertical sets of such boxes, reading from left to right, indicate the successive states of activation of the network. The starting state (far left) has the bed, sofa, and ceiling units active. The figure is from Rumelhart, Smolensky, et al. 1986, 34.

Here, then, we have a concrete example of the kind of flexible and sensible deployment of cognitive resources that characterizes natural intelligence. Emergent schemata obviate the need to decide in advance what possible situations the system will need to cope with, and they allow it to adjust its default values in a way maximally consistent with the range of possible inputs. Just this kind of natural flexibility and informational holism constitutes, I believe, a main qualitative advantage of PDP approaches over their conventional cousins.

5 Distributed Memory

My final example concerns the process of learning and remembering. It follows work originally presented in McClelland and Rumelhart 1985 and modified in volume 2 of McClelland, Rumelhart, et al. The goal of the work is to generate a model of memory in which the storage of traces of specific experiences gives rise in a very natural way to a general, nonspecific understanding of the nature of the domain in question. Thus, for example, storage of traces of specific experiences of seeing dogs will give rise to a general, prototypical idea of doghood. This neatly sidesteps a recurrent problem in modeling memory, namely, the choice between representing specific and general information. In terms of its behavior, the model looks as if it explicitly generates and stores prototypes (of, e.g., the typical dog). But as in the schema example, there are no such explicit, stored items. Instead, the prototype-based understanding is an emergent property of the system's way of dealing with specific experiences. The model shares many of the features of our two previous examples, but it enables us to extend our discussion to include:
• the use of learning rules in PDP,
• the economy of PDP storage (so-called superpositional storage),
• the PDP capacity to mimic the explicit storage of rules, prototypes, and so on,
• the relation of PDP to experimental psychological data, and
• the limits of networks without hidden units.
The model that McClelland and Rumelhart propose is a fairly standard PDP network of the kind discussed above. The network is exposed to successive sets of inputs given in terms of a fixed set of representational primitives, i.e., a fixed set of features to which (on this interpretation) its units (or sets of units) are seen as sensitive. Such features may include visual ones, like color or size, and nonvisual ones, like names. As a model of memory the task of the system is just this: given some input with features f1, . . . , fn (say, f1, . . . , f10 for definiteness), the system needs to store the input in a way that enables it later to re-create the input pattern from a fragment of it acting as a cue. Thus, if the system is given values for f1, . . . , f4, we want it to fill in f5, . . . , f10 with values somehow appropriate to its earlier experience in which f1, . . . , f10 were active. A simple learning rule, called the delta rule, suffices to produce this kind of behavior. The delta rule is explained formally in McClelland, Rumelhart, et al. 1986, vol. 1, chapters 2, 8, 11, and vol. 2, chapter 17. Informally, it works like this. Getting a system to re-create an earlier activation pattern f1, . . . , f10 when given the fragment f1, . . . , f4 amounts to requiring that the internal connections between the units in the net be fixed so that activation of the fragment f1, . . . , f4 causes activation of the rest of the pattern f5, . . . , f10. So we need strong excitatory links between f1, . . . , f4 and the units f5, . . . , f10. After the system has received a teaching input of f1, . . . , f10, the delta rule simply gets the system to check whether the internal connections between the units that were active would support such a re-creation. If they would not, as is generally the case, it modifies the overall pattern of connectivity in the required direction. The delta rule is strong enough to guarantee that, subject to certain caveats (more on which later), a system following it will learn to re-create the pattern over its units from a suitable fragment of that pattern.
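In code, the informal story above comes to a few lines. The sketch below is a minimal auto-associator trained with the delta (Widrow-Hoff) rule; the network size, learning rate, and binary coding are illustrative choices, not those of the published model.

    import numpy as np

    # Minimal delta-rule pattern completer. Each unit learns to re-create
    # its own activation from the activations of the other units.
    rng = np.random.default_rng(0)
    n, lr = 10, 0.1
    W = np.zeros((n, n))                       # connection weights

    pattern = rng.choice([-1.0, 1.0], size=n)  # one stored experience

    for _ in range(100):                       # teaching phase
        prediction = W @ pattern
        error = pattern - prediction           # the "delta": target minus prediction
        W += lr * np.outer(error, pattern)
        np.fill_diagonal(W, 0.0)               # no unit excites itself

    fragment = pattern.copy()
    fragment[4:] = 0.0                         # cue with f1, ..., f4 only
    recalled = np.sign(W @ fragment)           # spread activation once
    print(np.array_equal(recalled[4:], pattern[4:]))   # True: f5..f10 restored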
To get a better idea of how this looks in action, let's consider McClelland and Rumelhart's example of learning about dogs. The example is in many ways similar to previous ones, but it will help bring out the main points nevertheless. First, circumscribe the domain by fixing on a prototypical dog. Take a picture of this, and describe it in terms of a fixed set of representational primitives, say, sixteen features. Next, create a set of specific dog descriptions, none of which quite matches the prototype. These are obtained by varying any one of the features of the prototypical dog at random. Now give each individual dog a name. For each dog, code its name as a pattern of activation across eight units. Give the network a series of experiences of individual dogs by activating the units that correspond to the dog description and the dog names. After each such exposure, allow the system to deploy the delta rule to lay down a memory trace in the form of a pattern of altered connectivity and to facilitate recall of the last dog description.
After 50 such runs the system had never been exposed to the prototypical dog but only to these distorted instances. The system was then given a fragment of the prototype as input and was able to complete the prototypical pattern. No name units became active, as these, being different for each individual dog, tended to cancel out. Indeed, the network had, in effect, extracted the pattern common to all the slightly distorted inputs, producing a general idea of the prototypical member of the set of which the inputs were instances. Plato scholars will envy the system's ability to see the true form of doghood on the basis of these distorted shadows on the wall of the cave.
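The same few lines of delta-rule code reproduce this effect. In the sketch below (again with invented sizes and coding), the memory is trained only on one-feature distortions of a prototype it never sees and is then probed with a fragment of that prototype.

    import numpy as np

    # Prototype extraction from distorted exemplars only (illustrative).
    rng = np.random.default_rng(1)
    n, lr = 16, 0.05
    W = np.zeros((n, n))

    prototype = rng.choice([-1.0, 1.0], size=n)   # never shown directly

    for _ in range(50):                     # fifty distorted exposures
        dog = prototype.copy()
        dog[rng.integers(n)] *= -1          # vary one feature at random
        W += lr * np.outer(dog - W @ dog, dog)    # delta-rule update
        np.fill_diagonal(W, 0.0)

    cue = prototype.copy()
    cue[n // 2:] = 0.0                      # fragment of the unseen prototype
    completed = np.sign(W @ cue)
    print(np.mean(completed == prototype))  # at or near 1.0: prototype recovered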
The system can do more than extract the prototype from the examples. It can also re-create the pattern of activation for a specific dog if it has had a number of exposures to the dog in question and it is given as its recall-prompting input a disambiguating cue, something distinctive about that dog: the name of the dog or some distinctive physical feature.
Such, in brief, is the model. Clearly, it does quite well in its main goal of exhibiting prototype knowledge in the absence of any explicit prototype generation and storage procedure. The way of encoding and retrieving specific information results in a functional correlate of prototype-based reasoning. I gave details of a similar phenomenon in section 4 on emergent schemata. This is, in fact, a rather general property of PDP approaches. They exhibit behavior that, taken at face value, strongly suggests a reliance on some special mechanism aimed at the generation and storage of explicit hypotheses concerning the central structures of a domain. But in fact, no special mechanism is required and the hypotheses are not explicitly stored, at least not in any normal sense. In a somewhat parallel case, some classical theorizing about human language acquisition suggests that our linguistic competence arises as the end result of a three-stage process.
(1) We are exposed to a number of utterances.
(2) Perhaps using some innate grammar or just a powerful learning strategy, we seek to formulate at an unconscious level the linguistic rules to account for the structure of the utterances.
(3) We store such rules and deploy them to understand new utterances.
In a PDP model the storage and retrieval strategy targeted on the specific utterances will yield, in much the same way as described above, behavior that looks as if it depends on the formulation and deployment of specific linguistic rules (see, e.g., McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, chap. 18). But there is no special mechanism required to seek these rules and no need to store them explicitly in advance of some occasion of deployment. It is perhaps misleading to say that the network does not in some sense learn and deploy the rules. For it becomes structured in a way that makes it yield outputs that tend to conform to the rule in a nicely flexible manner. Insofar as rules can ever be stored inside a head, or a mechanism, this seems to me to amount to a version of such storage. What is interesting, however, is that such rules depend on no special mechanism of rule generation and storage and are represented in a manner that makes them extremely flexible and sensitive to contextual nuances (more on this in chapter 6). In only this (very important) sense, I believe, do "distributed models . . . provide alternatives to a variety of models that postulate abstract summary representations such as prototypes, . . . semantic memory representations, or even linguistic rules" (McClelland, Rumelhart, et al. 1986, vol. 2, p. 267).
It remains only to mention two further features of the current example and to offer some comments on its limitations. The two further features are superpositional storage and a capacity to model fine-grained experimental data. By "superpositional storage" I mean the property that one network of units and connections may be used to store a number of representations, so long as they are sufficiently distinct (the term often used is "orthogonal") to coexist without confusion. Thus one network can be trained to exhibit behavior appropriate to knowledge of a number of distinct prototypes, such as dog, cat, and bagel (McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 185). This is because the delta rule can find a set of weights that allows it to complete to quite different patterns according to whether it is given a nonambiguous cue for a dog, cat, or bagel. Interestingly, if it is given an input that is indeterminate between, say, a cat and a dog, it will complete to a blended overall pattern, as if it had an idea not just of dogs and cats but also of something halfway between dogs and cats (this property will loom large in later discussions). Indeed, it does have such an idea, insofar as its prototypes come into being only in response to particular calls and so function in a maximally flexible way.
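A toy demonstration of both points: two roughly orthogonal patterns stored superpositionally in one weight matrix, recalled separately from an unambiguous cue and blended under an indeterminate one. For brevity the sketch uses simple Hebbian superposition rather than the delta rule; sizes and seeds are arbitrary.

    import numpy as np

    # Two traces superposed in a single weight matrix.
    rng = np.random.default_rng(2)
    n = 32
    dog = rng.choice([-1.0, 1.0], size=n)
    cat = rng.choice([-1.0, 1.0], size=n)   # random, so roughly orthogonal

    W = (np.outer(dog, dog) + np.outer(cat, cat)) / n

    cue = dog.copy()
    cue[n // 2:] = 0.0                      # unambiguous dog fragment
    print(np.mean(np.sign(W @ cue) == dog))     # ~1.0: dog recalled cleanly

    blend_cue = (dog + cat) / 2             # indeterminate between the two
    blend = np.sign(W @ blend_cue)
    print(np.mean(blend == dog), np.mean(blend == cat))
    # partial overlap with each: a dog-cat blend, not a clean recall
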
The other remaining feature, the capacity to model fine-grained experimental data, constitutes a major attraction of the approach. By fine-grained experimental data I mean data on the way performance in supposedly central cases is affected by other factors, including context and breakdowns. One of the striking things about some PDP models is that they have ended up, often unintentionally, producing fine-grained behavior of the type found in human performance. This natural modeling of fine-grained data may seem to indicate what Lakatos called a "progressive research programme," i.e., one whose models succeed not just in accounting for the data they set out to explain, but also in predicting or explaining other, perhaps previously undiscovered data as a consequence (see Lakatos 1974). There are various examples of this in the PDP literature. In discussing the case of distributed memory McClelland and Rumelhart mention a number of such points. These include amnesic syndromes, interference phenomena, and blending errors.
Some amnesics, it seems, can learn things by very dense repetition of the appropriate experiences. And they prove better at learning general ideas than at remembering specific experiences. Both these traits are neatly explicable in terms of the distributed model. Regarding the first, we need only conjecture that the damage involves a reduction in the extent to which each individual experience can effect a change in the connectivity weightings of the network. It follows that amnesics so afflicted can learn, but that they need many more exposures to do so. The second trait naturally follows. Since specific experiences make little impact on the system, it will tend to learn only what is common to a vast number of such experiences. Thus, it will learn general tendencies in preference to any specific individual instances.
Interference phenomena are simple. If a network is trying to recall a pattern very like another one it has been exposed to, it may suffer from interference or cross talk. This occurs when units primed to figure in the restoration of one pattern are mobilized by accident in response to a cue for a similar pattern. At first cross talk was seen as a problem for PDP approaches that use superpositional storage. However, it turns out that many of the error patterns that PDP systems are prone to as a result of such interference are very similar to the error patterns of human subjects (see, e.g., the word-recognition and skilled-typing errors dealt with in McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 139, and vol. 1, p. 14). Such cross talk is familiar enough. If you want to recall a telephone number and you know another number very like it, your chances of making a mistake greatly increase. In classical models in which the number is simply retrieved from an address associated with a name, there is no reason at all for such a pathology. On a PDP model the pathology falls naturally out of the mode of storage and recall.
Finally, there is the strange phenomenon of blending errors. These involve our making a blend of two memories into a composite whole. Thus, some subjects, when shown a film involving a yellow truck and then given a repetition of the film but with a truck painted blue, later "recall" a green truck. This blending is explained as just the beginnings of "the formation of a summary representation," i.e., the start of the process of fixing on the common tendency in a set of input experiences (McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 208). This, incidentally, gives the lie to the claim that PDP models don't use summary representations at all (recall the passage from vol. 2, p. 207 quoted above). McClelland, Rumelhart, et al. cannot have it both ways. The correct response is to say that these models do form summary representations, but that they are summary representations of a special, flexible sort.
Let me add a word about the limitations of the models considered above. As McClelland and Rumelhart are the first to admit, many of these models have one major fault: they rely on a fixed set of representational primitives. Thus, our dog recognizer may have units interpreted as standing for size, color, and age, or whatever. But whatever the list, these are then the only dog features to which the network can be sensitive. If two dogs are different in a way not capturable by the list, the system cannot learn the difference. Indeed, the limitation is even worse than this. For if the linear combination of the values of the set of units excluding the one receiving external stimulation cannot be used to predict uniquely the activation of the currently externally stimulated unit, the delta rule cannot guarantee perfect learning (see, e.g., McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, p. 181). It turns out, however, that this limitation can be overcome by having hidden units. These take no inputs from outside the system and send no outputs outside the system. Instead, they lie between the input and output units and can be used to mediate internal connectivity patterns so as to increase the number of input-output patterns the overall system can generate. In effect, the presence of hidden units allows the system to generate new representational primitives if these are needed to capture a pattern in the input. This greatly increases the power of the system. Using the powerful learning algorithms of recent connectionism (e.g., the generalized delta rule and the Boltzmann learning rule) these systems can generate the kinds of representation needed to solve problems that simple perceptron systems are provably unable to cope with.
The standard example of the use of hidden units is the "exclusive or" problem. "Exclusive or" is true just in case either A or B is true but not both A and B are true. You can have your cake or eat it, but not both. Now imagine a network whose input units are sensitive only to the presence or absence of A and the presence or absence of B. It would be easy if the problem was to get a network to reason using "inclusive or" (i.e., allow that "A or B" is true even if both A and B are true). Make one input unit pass a value of 1 when it sees A, and make another input unit pass a value of 1 when it sees B. Connect these to an output unit with a firing threshold of 1. If the system is given input of A, B, or A and B, the output unit will fire. Failing that, it won't. (See figure 5.6.) But this fails to cope with "exclusive or," since if the system is given input of A and B, the output unit will still fire.

Figure 5.6
An inclusive or network.

Figure 5.7
A simple exclusive or network with one hidden unit. From Rumelhart, Hinton, and Williams 1986, 321.

If the input units are restricted to an A recognizer and a B recognizer, the solution is to have a hidden unit that fires just in case the A and B recognizers both fire and that then passes a strongly inhibiting value to the output unit. The hidden unit thus represents the conjunction A and B, a representation that the system needs to solve the problem to which its two input units alone were unable to respond. Figure 5.7 shows a network that represents "exclusive or" (after McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, p. 321).
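The figure 5.7 construction is easy to verify directly. Below is a minimal rendering with threshold units; the particular weights and thresholds are one workable choice, not necessarily those of the original figure.

    import numpy as np

    def unit(inputs, weights, threshold):
        # A threshold unit fires (outputs 1) iff its summed
        # weighted input reaches the threshold.
        return 1 if np.dot(inputs, weights) >= threshold else 0

    def xor_net(a, b):
        # The hidden unit detects the conjunction A-and-B and then
        # strongly inhibits the output unit.
        hidden = unit([a, b], [1, 1], threshold=2)
        return unit([a, b, hidden], [1, 1, -2], threshold=1)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_net(a, b))   # fires for A or B, not both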
A major achievement of recent connectionism is its finding learning rules that, in the majority of cases, enable a system with hidden units to learn to deploy these in whatever ways are necessary to do justice to the structure of the input-output pattern required. The first limitation of the distributed model of memory (its reliance on a fixed set of representational primitives) is thus surmountable, at least in theory, using only PDP apparatus. The second limitation is not so obviously surmountable in the same way. It is what I shall call the "serial control problem." Quite simply, the model specifies the internal workings of some of the components of infor-

look very much as if PDP goes at least some way to finding the domes and arches out of which the spandrels visible to the mind's eye emerge. Many of the distinctions drawn at the upper level may still reflect our proper epistemological interests without marking any differences in the underlying computational form or activity of the system. Thus, we distinguish, say, memory and confabulation. But both may involve only the same pattern-completing capacity. The difference lies, not in the computational substructure, but in the relation of that structure to states of the world that impinged on the knower. (For a discussion of how PDP from an internal perspective blurs the distinction between memory and confabulation, see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, pp. 80-81.)

Obviously, there is considerable and biologically attractive economy about all this. The use of one underlying algorithmic form to achieve so many high-level goals and the superpositional storage of data should appeal to a thrifty Mother Nature. And the double robustness of such systems (being capable of acting sensibly on partially incorrect data and capable of surviving some hardware damage) is a definite natural asset. Finally, the sheer flexibility of the system's use of its stored knowledge constitutes a major biological advantage (and philosophical advantage too, as we shall see). The capacity sensibly to deploy stored knowledge in a way highly sensitive to particular (perhaps novel) current needs will surely be a hallmark of any genuine intelligence.
In sum, the kind of approach detailed in the present chapter seems to offer a real alternative (or complement) to MIND-style theorizing. What makes PDP approaches biologically attractive is not merely their neurophysiological plausibility, as many seem to believe. They also begin to meet a series of more general constraints on any biologically and evolutionarily plausible model of human intelligence. PDP approaches may not be the only kind of computational approach capable of meeting such constraints. But work within the tradition of semantically transparent systems has so far failed to do so and instead has produced fragile, inflexible systems of uncertain neural plausibility.
Chapter 6
Informational Holism

1 In Praise of Indiscretion

Discretion, or at any rate discreteness, is not always a virtue. In the previous chapter we saw how a profoundly nondiscrete, parallel distributed encoding of room knowledge provided for rich and flexible behavior. The system supported a finely gradated set of potential, emergent schemata. In a very real sense, that system had no single or literal idea of the meaning of, say, "bedroom." Instead, its ideas about what a bedroom is are inextricably tied up with its ideas about rooms and contents in general and with the particular context in which it is prompted to generate a bedroom schema. The upshot of this is the double-edged capacity to shade or alter its representation of bedrooms along a whole continuum of uses informed by the rest of its knowledge (this double-edged capacity is attractive yet occasionally problematic; see chapter 9). This feature, which I shall call the "informational holism" of PDP, constitutes a major qualitative difference between PDP and more conventional approaches. This difference is tightly linked to the way in which PDP systems will fail to be semantically transparent, in the sense outlined in chapter one.

The present chapter has two goals. First, to elaborate on the nature of informational holism in PDP. Second, to discuss what conceptual relation parallel distributed encoding bears to the more general phenomenon of informational holism. On the latter issue, options range from the very weak (PDP sustains such holism, but so can work in the STS paradigm) to the very strong (only PDP can support such holism). The truth, as ever, seems to lie somewhere in between.

2 Informational Holism in a Model of Sentence Processing

Before we begin, it is worth reiterating an earlier warning. In what follows in this section, do not take any talk of what so-and-so means to a network too seriously; likewise with talk of what the network knows, and so on. This talk, e.g., of a network's shading its grasp of the meaning of a concept, is a legitimate shorthand for claims of the following nature: the shading of meaning found in natural-language understanding may be supported by the action of mechanisms that encode and retrieve data in the holistic PDP fashion. PDP researchers study such mechanisms using simple networks not in causal commerce with the referents of words like "bedroom" and not located in complex agents sharing a communal language. As such it is quite reasonable to withhold ascription of any actual grasp of meaning whatsoever to such networks. Still, if it is the action in us of something operating according to the principles of such networks that enables us to be as flexible and holistic in our grasp of meaning as we are, then the study of such mechanisms surely illuminates how we succeed in grasping the meanings we do. And this, rather than any deep philosophical confusion, is the basis of such talk. So much for the obvious but essential disclaimer.
To get a better idea of the shading power that constitutes the informational holism of PDP approaches, let us consider McClelland and Kawamoto's model of sentence processing (fully described in McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, chapter 19). The model aims to show in a highly simplified system the general way in which the capacity of PDP models simultaneously to satisfy multiple constraints might be exploited in a sentence processor. In particular, the model focuses on what is known as case-role assignment. Case-role assignment involves the process of deciding, among other things, which bit of the sentence specifies an agent, who does things; which bit specifies a patient, which gets things done to it; and which bit (if any) specifies the instrument of the doing. It turns out that a variety of factors combine to guide our interpretations: constraints of context, syntax, word order, and basic semantics. The model developed is a fairly standard, distributed model of the kind described in the last chapter. It takes as input the result of a surface parse on the sentence and yields a reformulation as a set of semantic microfeatures. And it learns case-role assignment from paired presentations of such canonical input and desired output of case-role assignments. The kinds of microfeature used in the canonical format are unimportant in detail. In fact, the authors of the model used such features as volume, pointiness, breakability, and softness and allowed these features to adopt various values (see table 6.1). All this must be treated as a gross simplification. In the long run we might expect a reasonable model of the processes underlying grasp of sentence meaning to deploy a set of microfeatures that correspond to minute details of the visual, tactile, functional, and even emotive dimensions of human sensitivity to the world.

Table 6.1
Microfeatures of a model of case-role assignment

Feature        Values
volume         small, medium, large
pointiness     pointed, rounded
breakability   fragile, unbreakable
softness       soft, hard
But even as it stands, the model exhibits some fascinating behavior whose extension to much more fine-grained kinds of microfeatural representation is intuitively obvious. According to its creators, the model exhibits "an uncanny tendency to shade its representation of the constituents of a sentence in ways that are contextually appropriate . . . without any explicit training to do so" (McClelland and Kawamoto 1986, 276). I shall suggest that this property, which comes for free with parallel distributed storage and retrieval (at least with all genuinely distributed approaches), allows PDP models to provide a mechanism well suited to supporting a variety of important semantic phenomena. Of all the interesting properties of such models, this one, I believe, most firmly fixes any conceptual or qualitative advantages that PDP might have over other approaches. And indeed, McClelland and Kawamoto themselves describe the capacity to represent "a huge palette of shades of meaning" as being "perhaps . . . the paramount reason why the distributed approach appeals to us" (1986, 314).

Here are some examples of the kind of shading they have in mind. The network was trained on various canonical reformulations of sentences involving, among other things, balls and breaking of objects. All the balls it learned about had the value "soft" along the dimension of softness. But all the objects it learned about as responsible for breaking things (hatchets, hammers, baseball bats, etc.) had the value "hard" along the dimension of softness. Because of the general tendency of PDP models to pick up on patterns in the input and to generalize to new cases, the subsequent behavior of the network should come as no surprise. Since all the objects used for breaking were hard, its knowledge of this range of other cases infects the way it deals with the sentence "The ball broke the vase." It correctly activates units for most of the features of the ball in the case role of instrument. But it diverges from the standard ball pattern along the softness dimension, activating "hard" instead of "soft." To us, this is a mistake. We know that soft balls are often responsible for breaking objects.

But the choice of the network to shade the meaning of ball in the context of breaking in the way it did cannot really be faulted. In light of its training, the use of the outlying information to the effect that all instruments of breaking are hard, not soft, seems quite insightful. As the authors put it, "As far as this model is concerned, balls that are used for breaking are hard, not soft" (McClelland and Kawamoto 1986, 305). This kind of shading of meaning in the light of the rest of the system's knowledge is the heart and soul of what I am calling the informational holism of distributed connectionist models. And it is bought by the simple fact that "different readings of the same word are just different patterns of activation [of microfeatures]; really different readings, ones that are totally unrelated, . . . simply have very little in common. Readings that are nearly identical with just a shading of a difference are simply represented by nearly identical patterns of activation" (McClelland and Kawamoto 1986, 315).
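The underlying representational point can be shown in a few lines: two context-shaded "readings" of ball are just two overlapping activation patterns over the same microfeature units. The feature names and assignments below are invented for illustration.

    # Two "readings" of ball as overlapping microfeature patterns.
    features = ("spherical", "game-object", "small", "soft", "hard")

    ball_core = {"spherical", "game-object", "small"}
    readings = {
        "The boy kicked the ball":   ball_core | {"soft"},
        "The ball broke the window": ball_core | {"hard"},
    }

    for sentence, pattern in readings.items():
        print(sentence, "->", [f for f in features if f in pattern])
    # Nearly identical patterns, differing only along the softness
    # dimension; no separate stored senses of "ball" are consulted.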
The power of PDP systems to shade meanings across a whole continuum of cases enables them to model a number of effects. Most straightforwardly, it enables them to disambiguate words according to the context built up by the rest of the sentence. Thus take a sentence like "The bat ate the fruit." In this case the bat is clearly an animal, not a cricket bat, and a PDP model could use the context of occurrence to determine this fact. The features that constitute the distributed representation of a live bat would be radically different from those appropriate to a cricket bat. The presence of the other words designating an act of eating fruit encourages activation of the features of the live bat.

This kind of effect, as McClelland and Kawamoto point out, can quite easily be captured in a conventional or in a local connectionist approach. In each case one would have a separate unit or memory store for each of the readings of "bat," and a set of rules or heuristics (in the conventional case) or a pattern of connectivity strengths (in the local connectionist version) determining which chunk to deploy when. This works until we need to model very-fine-grained differences in meaning. Separate chunks for a fruit bat and a cricket bat seem okay. But words seem to take on different shades of meaning in a continuously varying fashion, one that seems unspecifiable in advance. Thus, consider:

(1) The boy kicked the ball.
(2) The ball broke the window.
(3) He felt a ball in his stomach.

(Sentences (1) and (2) are from McClelland and Kawamoto 1986, 315.) In case (1) we may imagine a soft, toy ball. In case (2) we imagine a hard ball (a tennis or cricket ball). In case (3) we have a metaphorical use: there is no ball in his stomach, but a feeling of a localized, hard lump. Everyday talk and comprehension is full of such shading effects according to overall context. Surely we don't want to commit ourselves to predetermining all such uses in advance and setting up a special chunk for the semantic meaning of each.

The PDP approach avoids such ontological excess by representing all these shades of meaning with various patterns in a single set of units representing microfeatures. The patterns for sentences (1) and (2) might share, e.g., the microfeature values spherical and game object, while the patterns for sentences (2) and (3) share the values small and hard. One interesting upshot here is the lack of any ultimate distinction between metaphorical and literal uses of language. There may be central uses of a word, and other uses may share less and less of the features of the central use. But there would be no firm, God-given line between literal and metaphorical meanings; the metaphorical cases would simply occupy far-flung corners of a semantic-state space. There would remain very real problems concerning how we latch on to just the relevant common features in understanding a metaphor. But it begins to look as if we might now avoid the kind of cognitivist model in which understanding metaphor is treated as the computation of a nonliteral meaning from a stored literal meaning according to high-level rules and heuristics. Metaphorical understanding, on the present model, is just a limiting case of the flexible, organic kind of understanding involved in normal sentence comprehension. It is not the icing on the cake. It is in the cake mix itself.1
So far, then, we have seen how the informational holism of distributed models enables them to support the representation of subtle gradations of meaning without needing to anticipate each such gradation in advance or dedicate separate chunks of memory to each reading. And we also saw hints of how this might undermine the rigidity of some standard linguistic categories like metaphorical versus literal use. One other interesting aspect of this informational holism concerns learning. The system learns by altering its connectivity strengths to enable it to re-create patterns in its inputs. At no stage in this process does it generate and store explicit rules stating how to go on. Superpositional storage of data means that as it learns about one thing, its knowledge of much else is automatically affected to a greater or lesser degree. In effect, it learns gradually and widely. It does not (to use the simplified model described) learn all about bats, then separately about balls, and so on. Rather it learns about all these things all at once, by example, without formulating explicit hypotheses. And what it can be said to know about each is informed by what it knows about the rest; recall the case of its knowledge that only hard things break things.

In sum, PDP approaches buy you a profoundly holistic mode of data storage and retrieval that supports the shading of meanings and allows gradual learning to occur without the constant generation and revising of explicit rules or hypotheses about the pattern of regularities in the domain.

3 Symbolic Flexibility

Smolensky (1987) usefully describes PDP models as working in what he calls the subsymbolic paradigm. In the subsymbolic paradigm, cognition is not modeled by the manipulation of machine states that neatly match (or stand for) our daily, symbolic descriptions of mental states and processes. Rather, these high-level descriptions (he cites goals, concepts, knowledge, perceptions, beliefs, schemata, inferences, actions) turn out to be useful labels that bear only approximate relations to the underlying computational structure. He argues that work in the subsymbolic (or distributed connectionist) paradigm aims to do justice to the "real data on human intelligent performance," i.e., to clinical and experimental results, while settling for merely emergent approximations to our high-level descriptive categories. The essential difference between the subsymbolic and the symbolic approach, as Smolensky paints it, concerns the question, Are the semantically interpretable entities the very same objects as those governed by the rules of computational manipulation that define the system?

In the symbolic paradigm, the answer is yes. Consider the STS approach we sketched way back in chapter 1. Here we find computational operations directly applied to high-level descriptions of mental states presented as a means of capturing the computational backdrop of mind. Thus, we might find a model of scientific discovery in which operations are performed on states directly interpretable as standing for particular hypotheses concerning the laws governing some data. Against this kind of approach, the subsymbolic theorist urges that the entities whose behavior is governed by the rules of computational manipulation that define the system need not share the semantics of the task description. For what is so governed is just the activation profiles of individual units in a network. And in a highly distributed model these units in the end will have no individual semantic interpretation, or at least none that maps neatly and projectibly onto our ordinary concepts of the entities to be treated in a model of the processing involved. Rather, what gets semantically interpreted will be general patterns of activation of such units. A single high-level concept like that of a kitchen or a ball will, we saw, be associated with a continuum of activation patterns corresponding to the subtly different ideas about kitchen or ball that we entertain in various circumstances. Smolensky puts it nicely in the following passage:

In the symbolic approach, symbols (atoms) are used to denote the semantically interpretable entities (concepts). These same symbols are the objects governed by symbol manipulations in the rules which define the system. The entities which are capable of being semantically interpreted are also the entities governed by the formal laws that define the system. In the subsymbolic paradigm, this is no longer true. The semantically interpreted entities are patterns of activation over a large number of units in the system, whereas the entities manipulated by formal rules are the individual activations of cells in the network. The rules take the form of activation passing rules, of essentially different character from symbol manipulation rules. (1987, 100)
The claim, in effect, is that PDP systems need not (and typically will not) be semantically transparent in the sense introduced in chapter 1 above. Such a claim may not seem immediately plausible for the following reason. A system will count as semantically transparent just in case the entities found in a top-level task analysis of what the system does have neat syntactic analogues whose behavior is governed by the computational rules (explicit or tacit) of the system. Now clearly, it will not do to say that just because individual units cannot be treated as the syntactic analogues of such entities (e.g., as "coffee," "ball," "kitchen," and so on) the condition fails to be met. For why not treat patterns of activation of such units as the required analogues? The behavior of such patterns surely is governed by the computational rules of the system.

This is where the requirement that such analogues be projectible comes in. Consider a sentence like "the ball broke the window." A conventional AI system dealing with such a sentence will have a syntactic analogue (first in, say, LISP and hence down to machine code) for "ball" and "window." Consider now a connectionist representation of the same sentence. There will be a pattern of active units, and it may well be possible to nonarbitrarily isolate a subset of that pattern that, we would like to say, stands for "ball." But that subpattern, it is important to note, will vary from context to context. "Ball" as it occurs in "The ball broke the window" will have a different (though doubtless partially overlapping) syntactic analogue to "ball" as it occurs in "The baby held the ball." In one case the hardness-related microfeatures will be active. In the other case, not. Thus, although in each individual case we can isolate a connectionist syntactic analogue for the entities spoken of in a conceptual account, these entities will not be neatly projectible, i.e., the same syntactic entity will not continue to correlate in other cases with the top-level semantic entity. This is the real sense in which PDP systems can constitute a move away from semantic transparency.

The example given above could be multiplied. Smolensky (1988) makes similar comments about the symbol "coffee" as it occurs in various contexts. And we could say the same for "bedroom" as it occurs in the sofa-including and sofa-excluding contexts treated earlier. The general point, then, is that "the context [in PDP systems] alters the internal structure of the symbol: the activities of the sub-conceptual units that comprise the symbol - its subsymbols - change across contexts" (Smolensky 1988, 17). Smolensky formalizes this point as a characteristic of the highly distributed PDP systems he is interested in as follows: "in the symbolic paradigm the context of a symbol is manifest around it and consists of other symbols; in the subsymbolic paradigm the context of a symbol is manifest inside it, and consists of subsymbols" (Smolensky 1988, 17). Both the intrinsic holism and flexibility of PDP systems can be seen to flow from this fact.
4 Grades of Semantic Transparency

Using this apparatus as a base, Smolensky (1988) formulates an interesting picture of the cognitive terrain. He suggests that some human knowledge (e.g., public scientific knowledge) exists in the first instance as linguistic items such as the principle "energy is conserved." Human beings, he suggests, may use such knowledge by deploying a virtual machine adapted to manipulate analogues of such linguistic representations. Such explicitly formulated knowledge he calls "cultural knowledge." The top-level "conscious processor" of an individual is precisely, he argues, a virtual machine adapted to that end. This machine, which is realized by a PDP substructure, he calls the "conscious rule interpreter." It is contrasted with what he calls the "intuitive processor." The distinction again depends on the kind of entities processed. The conscious rule interpreter actually takes as its syntactic objects the semantic entities we use in describing the task domain (e.g., "energy"). The intuitive processor, by contrast, takes as its objects distributed microfeatural representations of the kind treated above. These representations, we saw, bear only a fluid and shifting relationship to the semantic entities (like "coffee" and "ball") spoken of at the conceptual level. It thus follows that the programs running on the conscious rule interpreter have a syntax and semantics comparable to our top-level articulation of the domain. (This is no accident: they are precisely models of that top-level articulation.) While the programs running on the intuitive processor do not. In my terminology, programs running on the conscious rule interpreter will be semantically transparent and the semantics will seep neatly down to the formal level, while those running on the intuitive processor will not. The intuitive processor is quite clearly to be seen as the more evolutionarily basic of the two and is responsible (he says) for all animal behavior and much human behavior, including "perception, practised motor behavior, fluent linguistic behaviour, intuition in problem solving and game-playing - in short, practically all of skilled performance" (Smolensky 1988, 5).

There need not, however, be an all-or-nothing divide between the semantically transparent processing of the conscious rule interpreter and the semantically opaque processing of the intuitive processor. For the cognitive system itself is presumed to be at root a subsymbolic system that, to a greater or lesser degree in various cases, approximates to the behavior of a symbolic system manipulating conceptual entities. The greater the so-called dimension shift between the conceptual description and the semantic interpretation of the units in the network, the rougher such an approximation becomes. Thus, the model of emergent schemata we examined earlier behaves in a wide variety of cases as if it had a standard, rigid schema of bedrooms. But it diverges in cases of nonstandard rooms (e.g., a bedroom with a sofa). The range of such divergence will increase with the distance
between the conceptual entity ("kitchen" etc.) and the microfeatures in the network. In a treatment with many microfeatures, where such items as "bed" and "sofa" are merely approximate top-level labels for subtle and context-sensitive complexes of geometric and functional properties, the distance will be great indeed, and a conceptual model only a very rough approximation.
Another route to the approximation claim is to regard the classical accounts as describing the competence of a system, i.e., its capacity to solve a certain range of well-posed problems (see Smolensky 1988, 19). In idealized conditions (sufficient input data, unlimited processing time) the PDP system will match the behavior specified by the competence theory (e.g., settling into a standard kitchen schema on being given "oven" and "ceiling" as input). But outside that idealized domain of well-posed problems and limitless processing time, the performance of a PDP system will diverge from the predictions of the competence theory in a pleasing way. It will give sensible responses even on receipt of degraded data or under severe time constraints. This is because although describable in that idealized case as satisfying hard constraints, the system may actually operate by satisfying a multitude of soft constraints. Smolensky here introduces an analogy with Newtonian mechanics. The physical world is a quantum system that looks Newtonian under certain conditions. Likewise with the cognitive system. It looks increasingly classical as we approach the level of conscious rule following. But in fact, according to Smolensky, it is a PDP system through and through.
In the same spirit Rumelhart and McClelland suggest: "it might be argued that conventional symbol processing models are macroscopic accounts, analogous to Newtonian mechanics, whereas our models offer more microscopic accounts, analogous to quantum theory. . . . Through a thorough understanding of the relationship between the Newtonian mechanics and quantum theory we can understand that the macroscopic level of description may be only an approximation to the more microscopic theory" (Rumelhart and McClelland 1986, 125). To illustrate this point, consider a simple example due to Paul Smolensky. Imagine that the cognitive task to be modeled involves answering qualitative questions on the behavior of a particular electrical circuit. (The restriction to a single circuit may appall classicists, although it is defended by Smolensky on the grounds that a small number of such representations may act as the chunks utilized in general-purpose expertise - see Smolensky 1986, 241.) Given a description of the circuit, an expert can answer questions like "if we increase the resistance at a certain point, what effect will that have on the voltage, i.e., will the voltage increase, decrease, or remain the same?"
Suppose, as seems likely, that a high-level competence-theoretic specification of the information to be drawn on by an algorithm tailored to answer this question cites various laws of circuitry in its derivations (what Smolensky refers to as the "hard" laws of circuitry: Ohm's law and Kirchhoff's law). For example, derivations involving Ohm's law would invoke the equation

voltage = current x resistance.
How does this description relate to the actual processing of the system? The model represents the state of the circuit by a pattern of activity over a set of feature units. These encode the qualitative changes found in the circuit variables in training instances. They encode whether the overall voltage falls, rises, or remains the same when the resistance at a certain point goes up. These feature units are connected to a set of what Smolensky calls "knowledge atoms," which represent patterns of activity across subsets of the feature units. These in fact encode the legal combinations of feature states allowed by the actual laws of circuitry. Thus, for example, "The system's knowledge of Ohm's law . . . is distributed over the many knowledge atoms whose subpatterns encode the legal feature combinations for current, voltage and resistance" (Smolensky 1988, 19). In short, there is a subpattern for every legal combination of qualitative changes (65 subpatterns, or knowledge atoms, for the circuit in question).
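To fix ideas, here is a minimal sketch in Python of the kind of encoding at issue. It is purely illustrative: it covers only a single qualitative Ohm's-law relation over three made-up feature units (the changes in current, resistance, and voltage), not Smolensky's actual 65-atom model of the full circuit, and the legality test is my own crude approximation.

from itertools import product

UP, SAME, DOWN = +1, 0, -1
CHANGES = (UP, SAME, DOWN)

def legal(dI, dR, dV):
    # Crude qualitative reading of V = I x R: does the change in voltage
    # make sense given the changes in current and resistance?
    if dI == UP and dR == UP: return dV == UP
    if dI == DOWN and dR == DOWN: return dV == DOWN
    if dI == SAME and dR == SAME: return dV == SAME
    if dI == SAME: return dV == dR   # V tracks R when I is held fixed
    if dR == SAME: return dV == dI   # V tracks I when R is held fixed
    return True  # opposing changes: V may rise, fall, or stay the same

# Each legal triple plays the role of one "knowledge atom": a subpattern
# over the feature units (dI, dR, dV). No unit stands for Ohm's law itself.
knowledge_atoms = [c for c in product(CHANGES, repeat=3) if legal(*c)]
print(len(knowledge_atoms), "atoms, e.g.,", knowledge_atoms[:3])

The point to notice is that the law is nowhere written down: what the system stores is just the set of subpatterns the law happens to license.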
At first sight, it might seem that the system is merely a units-and-connections implementation of a lookup table. But that is not so. In fact, connectionist networks act as lookup tables only when they are provided with an overabundance of hidden units and hence can simply memorize input-output pairings. By contrast, the system in question encodes what Smolensky terms "soft" constraints, i.e., patterns of relations that usually obtain between the various feature units (microfeatures). It thus has general knowledge of qualitative relations among circuit microfeatures. But it does not have the general knowledge encapsulated in hard constraints like Ohm's law. The soft constraints are two-way connections between feature units and knowledge atoms, which incline the network one way or another but do not compel it; that is, they can be overwhelmed by the activity of other units (that's why they are soft). And as in all connectionist networks, the system computes by trying simultaneously to satisfy as many of these soft constraints as it can. To see that it is not a mere lookup tree of legal combinations, we need only note that it is capable of giving sensible answers to (inconsistent or incomplete) questions that have no answer in a simple lookup table of legal combinations.
The soft constraints are numerically encoded as weighted inter-unit connection strengths. Problem solving is thus achieved by "a series of many node updates, each of which is a microdecision based on formal numerical rules and numerical computations" (Smolensky 1986, 246).
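The flavor of such microdecisions can be conveyed by a toy settling procedure. The sketch below is not Smolensky's harmony model; it is a tiny Hopfield-style network of my own in which the weights encode soft constraints among six invented feature units, and repeated single-unit updates settle a degraded, partially inconsistent cue into the nearest stored legal state.

import numpy as np

patterns = np.array([            # two stored "legal" feature states
    [ 1,  1,  1, -1, -1, -1],
    [-1, -1, -1,  1,  1,  1],
])
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)         # no self-connections

state = np.array([1, 0, 0, 0, 1, -1], dtype=float)   # degraded, inconsistent cue
rng = np.random.default_rng(0)
for _ in range(60):                      # a series of many node updates
    i = rng.integers(len(state))         # pick one unit
    net = W[i] @ state                   # summed soft-constraint pressure
    if net != 0:
        state[i] = np.sign(net)          # one microdecision
print(state)                             # settles into the first stored pattern

No single update consults a rule; each just leans toward whatever the weighted connections currently favor, and the global answer emerges from the sequence.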
The network has two properties of special interest to us. First, it can be shown that if it is given a well-posed problem and unlimited processing time, it will always give the correct answer as predicted by the hard laws of circuitry. But, as already remarked, it is by no means bound by such laws. Give it an ill-posed or inconsistent problem, and it will satisfy as many as it can of the soft constraints (which are all it really knows about). Thus, "outside of the idealized domain of well-posed problems and unlimited processing time, the system gives sensible performance" (Smolensky 1988, 19). The hard rules (Ohm's law, etc.) can thus be viewed as an external theorist's characterization of an idealized subset of its actual performance (it is no accident if this brings to mind Dennett's claims about the intentional stance - see Dennett 1981).
Second, the network exhibits interesting serial behavior as it repeatedly tries to satisfy all the soft constraints. This serial behavior is characterized by Smolensky as a set of macrodecisions, each of which amounts to "a commitment of part of the network to a portion of the solution." These macrodecisions, Smolensky notes, are "approximately like the firing of production rules. In fact, these productions 'fire' in essentially the same order as in a symbolic forward-chaining inference system" (Smolensky 1988, 19). Thus, the network will look as if it is sensitive to hard, symbolic rules at quite a fine grain of description. It will not simply solve the problem "in extension," as if it knew hard rules. Even the stages of problem solving may look as if they are caused by the system's running a processing analogue of the steps in the symbolic derivations available in the competence theory.
But the appearance is an illusion. The system has no knowledge of the objects mentioned in the hard rules. For example, there is no neat subpattern of units that can be seen to stand for the general idea of resistance, which figures in Ohm's law. Instead, some sets of units stand for resistance at R1, and other sets for resistance at R2. In more complex networks the coalitions of units that, when active, stand in for a top-level concept like resistance are, as we saw, highly context-sensitive. That is, they vary according to context of occurrence. Thus, to use Smolensky's own example, the representation of coffee in such a network would not consist of a single recurrent syntactic item but a coalition of smaller items (microfeatures) that shift according to context. Coffee in the context of a cup may be represented by a coalition that includes the features (liquid) and (contacting-porcelain). Coffee in the context of a jar may include the features (granule) and (contacting-glass). There is thus only an approximate equivalence of the "coffee" vectors across contexts, unlike the "exact equivalence of the 'coffee' tokens across different contexts in a symbolic processing system" (Smolensky 1988, 17). By thus replacing the conceptual symbol "coffee" with a shifting coalition of microfeatures (the so-called dimension shift), such systems deprive themselves of the structured mental representations deployed in both a classical competence theory and a classical symbol-processing account (level 2). Likewise, in the simple network described, there is no stable representation that stands for resistance (just as in the famous past-tense network there is no stable, recurrent entity that stands for verb stems [see chapter 9]).
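The idea of approximate equivalence is easy to make concrete. In the Python fragment below the microfeatures and their assignments are invented for illustration (they are not Smolensky's actual vectors): the coffee-in-cup and coffee-in-jar coalitions overlap without being identical, where a symbolic system would reuse one and the same "coffee" token.

microfeatures = ["liquid", "granule", "hot", "brown",
                 "contacting-porcelain", "contacting-glass"]

def vec(active):
    return [1.0 if f in active else 0.0 for f in microfeatures]

coffee_in_cup = vec({"liquid", "hot", "brown", "contacting-porcelain"})
coffee_in_jar = vec({"granule", "brown", "contacting-glass"})

dot = sum(a * b for a, b in zip(coffee_in_cup, coffee_in_jar))
norm = lambda v: sum(x * x for x in v) ** 0.5
print(dot / (norm(coffee_in_cup) * norm(coffee_in_jar)))   # about 0.29

The two vectors are related (they share the brown microfeature) but plainly not the same representation, which is all "approximate equivalence across contexts" amounts to here.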
It seems, then, that by treating subsymbolically the entities spoken of in our conceptual-level descriptions, we buy the flexibility, shading, and general lack of rigidity and brittleness required of a system if its subsequent behavior is ever to warrant the ascription to it of a genuine grasp of concepts. Symbolic flexibility of understanding is brought about by the increased low-level variability of the PDP approach. In this way such systems may avoid the excessive rigidity and lack of insight in conventional AI so thoroughly bemoaned by Dreyfus (see chapter 2). Notice also that the subsymbolic model remains formal in the sense outlined in chapters 1 and 2. It is a microfunctionalist theory as defined in chapter 2, section 6. That is, it specifies a system only in terms of input-output profiles for individual units and thus is not crucially dependent on any particular biological substrate. But the entities figuring in the formal profile do not correspond to, or otherwise nearly preserve, the boundaries of any conceptual-level description of thought. This is all good news, especially in the light of our strictures against projecting conscious conceptual categories back into the head (see chapter 3). But we must now consider a slightly tricky question concerning the nature of the relation between PDP substructures and the various desirable symbolic properties they seem capable of supporting.

5 Underpinning Symbolic Flexibility
PDP approaches, we saw, are well suited to modeling the kind of symbolic flexibility associated with human understanding. But what kind of relation does the locution "well suited" so comfortingly gloss? Here are some possibilities.
(1) Only a PDP-style substructure can support the kind of flexible understanding required (uniqueness).
(2) Maybe non-PDP systems can do the job. But semantically transparent approaches simply cannot achieve such flexibility (qualified liberalism).
(3) There is nothing special about a PDP capacity to support flexible understanding (unqualified liberalism).
Just which option we choose must depend, in part, on what we understand by a "PDP approach." We could mean:
(a) A model based firmly on one of the current algorithmic forms deployed by McClelland, Rumelhart, and the PDP Research Group (e.g., a model using a Boltzmann learning algorithm or the generalized delta rule).
(b) A model that is formally specified by a description of simple units in parallel, excitatory and inhibitory connections and by value-passing rules of one kind or another, and for which a semantic interpretation of that formal level of specification involves a dimensional shift away from the categories and entities involved at the ordinary level of task analysis. (Call this "architectural PDP.")
(c) A model (any model) that supports the special qualities of flexible retrieval, holistic storage and learning, and context-sensitive shading of meaning. (Call this "functional PDP.")
The (a) reading is surely far too strong. No one seriously believes that the algorithms currently studied within PDP tell the final connectionist story about neural computation. And a (c) reading seems too vague. There seems no good reason to expect the label "parallel distributed processing" to be naturally applicable to just any system capable of supporting flexibility and holism of the relevant kind. This leaves us with the (b) reading, architectural PDP. A "PDP approach" means an approach using formal, network models whose semantic interpretation as a network of simple units and value-passing rules involves a dimensional shift away from top-level conceptual entities and down to more fine-grained microfeatures. This is the reading I adopt throughout this book.
What, then, of our earlier options? The uniqueness claim (option 1) seems impossible to support. Since we have no idea of the range of computational strategies that might remain to be discovered, no one can say in advance that only a PDP approach can achieve some goal. The nearest we get to any such argument is the following. Suppose you are restricted to generating computational models that are realizable in the kind of biological hardware that humans possess. Because of what we know of the speed of such hardware, it looks as if some tasks (primarily the perceptual completion and interpretation tasks discussed in chapter 5) could be performed as fast as they are only if some kind of parallelism is involved. Our current focus, however, is on the kind of flexible understanding of meaning embodied in contextual-shading effects. And the problem domain here is by no means as well understood as that of, say, low-level vision. So as yet it would be foolish even to endorse a qualified version of option 1, i.e., one reducing the uniqueness claim to "unique if the system must be implemented on a neural network."
This observation, however, should not be used as an argument for unqualified liberalism (option 3). For it is by no means clear that the kind of flexible deployment of knowledge and continuous gradations of meaning found in PDP are fully available to semantically transparent systems, i.e., systems in which projectible syntactic entities act as analogues for the entities invoked in a top-level analysis of a task and are manipulated by the computational operations that define the system. For the flexible, context-sensitive retrieval and deeply holistic storage of data found in PDP approaches is a direct function of the way such approaches treat top-level conceptual entities, e.g., building "ball" (section 3 above) or "coffee" (Smolensky 1988) out of microfeature complexes whose precise constitution varies according to the details of a particular context of occurrence.
To make these points vivid, recall the example of the network given sentences involving balls and breakage. The network looked as if it had made the following inference: if most objects used for breaking things are hard and a ball is used for breaking things, contradict the central case and represent the ball as being hard. We can imagine this inference formalized into a general rule: if most fs are g and x is an f, then x is to be represented as having the feature g. All well and good. In some cases, at least, a semantically transparent model following a rule like this would exhibit the kind of shading of meaning we require. But note the following. First, we need to specify what we mean by "most." So our flexible system will get a little stiff around the edges. A PDP system with continuous strengths of activation of microfeature units can avoid this stiffness. Second, we can depend on such tactics and stay faithful to a semantically transparent methodology only if the softening rules specify operations on features that are likely elements of top-level reflection on a task (e.g., the feature "hard"). If we desire even finer gradations of meaning, we will need sets of microfeatures for these features also, until ultimately the individual units won't be semantically transparent at all. Recall also (chapter 5) that a feature of some PDP systems is that they generate their own microfeature-detector units as required. Differences in patterns of activation of these self-taught supermicrofeature units may underpin very fine differences in meaning. This deep flexibility seems unavailable if we insist that our computational operations apply to semantically interpretable entities of a kind we pick out in advance. Third, even if we put this worry aside, the sheer proliferation of rules and heuristics needed to even approximate human flexibility and shades of understanding will be enormous (recall Dreyfus's observations reported in chapter 2).
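The first of these worries, the stiffness of "most," can be put in a few lines of code. The functions below are my own toy contrast, not drawn from any model in the literature: the rule-based version must commit to a threshold somewhere, while the graded version simply lets the strength of the (hard) microfeature track the evidence.

def rule_based_hard(frac_hard_breakers, threshold=0.5):
    # "if most fs are g and x is an f, then represent x as g" - all or nothing
    return frac_hard_breakers > threshold

def graded_hard(frac_hard_breakers):
    # continuous activation of the (hard) microfeature unit
    return frac_hard_breakers

for frac in (0.49, 0.51, 0.90):
    print(frac, rule_based_hard(frac), graded_hard(frac))

Between 0.49 and 0.51 the rule-based verdict flips outright; the graded representation just shades a little further toward hardness.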
Now suppose we manage in a serial (but not semantically transparent) system to simulate the required range of flexibility, shading potential, etc. Even then there is a suspicion that the simulated holism thus achieved will never be as deep as the natural holism of a PDP system. For there will always remain conceptually important differences in the counterfactual behavior of the systems. Thus, suppose you have a serial model with a discrete representation of a square and a discrete representation of a rectangle. And we add a rule that says that squares inherit the properties of rectangles. In a PDP system this might be achieved by having a single network with a distributed representation of square and rectangle stored superpositionally over a set of microfeature units that represent, e.g., that the sides meet at ninety-degree angles. We simply could not reach into such a system and delete its knowledge of the ninety-degree feature of squares without also affecting its knowledge of the ninety-degree feature of rectangles. Contrast the artificial "holism" of the serial model. Here, there is a possibility of deleting the rule for knowledge of squares while leaving the knowledge of rectangles untouched. Of course, a serial approach may well have other resources at its disposal. The point is that the holism of the PDP model is an ineliminable feature of its mode of data storage and is thereby robust across counterfactual cases, which may shatter the more fragile holism of a more conventional version.
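The square-and-rectangle case can be made concrete with a toy superpositional store (the four microfeatures and the encoding below are invented for the illustration, not taken from the book):

import numpy as np

features = ["four-sides", "right-angles", "equal-sides", "longer-than-wide"]
square    = np.array([1.0, 1.0, 1.0, 0.0])
rectangle = np.array([1.0, 1.0, 0.0, 1.0])

memory = square + rectangle      # both patterns stored superpositionally

lesioned = memory.copy()
lesioned[features.index("right-angles")] = 0.0   # delete one microfeature

for name, probe in (("square", square), ("rectangle", rectangle)):
    print(name, float(probe @ memory), "->", float(probe @ lesioned))

Lesioning the shared right-angles unit degrades the match scores for both concepts at once (5 drops to 3 in each case); there is no surgical way to remove the right-angle knowledge for squares alone, as one could by deleting a single rule in the serial model.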

6 PDP and the Nature of Intelligence


There is a standard response to any attempt to establish a qualitative difference between PDP and conventional approaches. It relies on the supposed portability of software. Parallel processes can always be simulated in serial fashion if we are either unconcerned with how long they take to run or willing to imagine arbitrarily fast serial machines (possibly exceeding the speed of light itself). This fact gives rise to the following kind of remark: "A program running on a parallel machine that produced some sort of intelligence will also run on a serial machine, and this is enough to show the hardware irrelevant for explaining the nature, if not the evolution, of that particular kind of intelligence" (Krellenstein 1987, 155; my emphasis). This argument, I think, underlies much of the resistance to PDP approaches encountered in mainstream cognitive science. But the issues involved are by no means as cut and dried as Krellenstein's bald statement leads us to expect. Granted, the mere use of parallel hardware is hardly going to be the decisive factor in the explanation of some particular intelligence. Indeed, if we ignore time and speed constraints, anything a real PDP machine can run could be run on a serial simulation of the machine, that is, on a virtual PDP machine created on a serial processor (e.g., the Symbolics LISP machine). Of course, time and speed may not be irrelevant features when it comes to assessing the intelligence of a system. We do not allow that someone who takes hours to solve a problem by the serial enumeration of all the possibilities has thereby demonstrated the same intelligence as someone who solves it quickly by a highly selective and well-chosen search procedure. As we saw in chapter 4, time and the efficient use of resources are at a premium for natural intelligence (and where else is our idea of intelligence to be grounded?). Thus, it is possible to dispute the claim that "if the program runs too slowly on the serial machine to be useful we would not say that it no longer demonstrates intelligence but only that it is too slow, or that the particular approach, though successful, is impractical" (Krellenstein 1987, 155).
However, I shall not pursue this line. Instead, for the purposes of argument, I shall accept the point about the irrelevance of hardware. For much of the interest of PDP, it seems to me, lies not in the notion of parallel hardware so much as in the particular properties of any virtual machine (whatever its hardware base) running parallel cooperative algorithms of the kind we have been discussing. The particular properties I have in mind (some of which we have already met) can be grouped under the headings of "Searches" and "Representation."

Searches
Even the most conventional work in AI accepts a relationship between degree of intelligence and efficiency of search. The point of heuristic searches (chapter 1, section 4) is precisely to increase the intelligence of the system by reducing the extent of the search space it must traverse to solve a particular problem. Some problems require finding the best way of simultaneously satisfying a large number of soft constraints. In such cases PDP, we saw, provides an elegant solution to the problem of efficient, content-sensitive searches. A search is driven by partial descriptions and involves simultaneously applying competing hypotheses against one another. It is not dependent on fully accurate or even consistent information in the partial description. PDP methods thus introduce a qualitatively different kind of search into AI. This kind of search may not always be the best or most efficient option; as always, it all depends on the nature of the space involved.

Representation
We have seen how PDP models (or those PDP models which interest us) employ distributed representations in superpositional storage. The deep informational holism of PDP depends on this fact. Besides explaining shadings of meaning, discussed above, this mode of representation has important side effects. Some of these were discussed in chapter 5 (e.g., graceful degradation, generalization, content-addressable retrieval). One very important side effect that I have not yet fully discussed is cross talk.
Cross talk is a distinctive PDP pathology that occurs when a single network processes multiple patterns with common features. Because the patterns have common features, there is a tendency for the overall pattern of activation appropriate to one pattern to suffer interference from the other pattern because the system tries simultaneously to complete the other pattern. At times I have praised this property by calling it "free generalization." When it occurs in some contexts, however, it is a source of errors. Thus, a system that seeks to recognize words might occasionally fall into error when two simultaneously presented words share some features, e.g., "sand" and "lane" might get read as "land" and "sane" (see McClelland 1986, 139). Or in a model of skilled typing, the keystrokes appropriate to two successive letters may interfere and cause overlap or exchange (see Rumelhart and Norman 1982).
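The sand/lane case is simple enough to simulate directly. The toy encoding below is mine (one-hot letters in four positions, nothing like McClelland's actual model): blending the two word patterns and reading out the strongest letter in each position can yield exactly the migration errors described.

import numpy as np

ALPHABET = "adelns"

def encode(word):
    v = np.zeros((4, len(ALPHABET)))
    for pos, ch in enumerate(word):
        v[pos, ALPHABET.index(ch)] = 1.0
    return v

blend = (encode("sand") + encode("lane")) / 2   # simultaneous presentation
rng = np.random.default_rng(1)
noisy = blend + 0.01 * rng.random(blend.shape)  # tiny random bias
print("".join(ALPHABET[i] for i in noisy.argmax(axis=1)))

The shared letters (a, n) are secure, but the first and last positions are ties, so the readout may come back "sand," "lane," "land," or "sane": elements of each pattern migrate into the other.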
Cross talk might be regarded as an irritating side effect of parallel distributed representation were it not for two facts. First, the kind of error caused by cross talk is familiar to us in many areas of life. (Recall once again the difficulty of remembering the phone numbers of two friends if the numbers are very similar. We tend to mix up elements of each. That's cross talk.) It turns out that rather fine-grained experimental data on human performance, including such error profiles, can be explained and even predicted using PDP models (see McClelland 1986, 139-140). This is presumably good news for PDP considered as an attempt to model human processing mechanisms.
But second and more important, the source of such occasional errors is also the source of much of the power and flexibility of such systems. The tendency to generalize is one example of this. Another even more powerful effect is to provide a computational underpinning for the capacity to think analogically, to see parts of our knowledge as exemplifying patterns and structures familiar to us from other parts. This tendency to see one part of an experience in terms of another, to see links and similarities and patterns all around us, is surely at the heart of human creativity. It is what Hofstadter (1985) calls "fluid thinking."
At its best, cross talk may underpin such classic anecdotes as Kekulé's discovery of the benzene ring after dreaming of snakes biting their own tails, or Wittgenstein's capacity to see a courtroom simulation of a motor accident using model vehicles as indicating a deep truth about the nature of language. At its worst, cross talk pathologically amplified may explain the schizophrenic's tendency to see deep links between every item of daily experience. Either way, the idea of an informational substructure that forces us continuously to seek and extend patterns (the very substructure responsible for cross talk) is psychologically highly attractive. For, as Hofstadter puts it:
People perceive patterns anywhere and everywhere, without knowing in advance where to look. People learn automatically in all aspects of life. These are just facets of common-sense. Common-sense is not an "area of expertise," but a general - that is, domain-independent - capacity that has to do with fluidity in representation of concepts, an ability to sift what is important from what is not, an ability to find unanticipated analogical similarities between totally different concepts. (1985, 640)

Nobody, Hofstadter included, claims that current PDP systems provide good models (or even hints) of how we come to exhibit all these properties. Especially difficult and important is the problem of how we fix on the important similarities between two concepts or domains and ignore the unimportant ones. Nonetheless, cross talk does seem to suggest the beginnings of a partial answer to the more general question of how we can find unanticipated similarities, and, as we saw, PDP also provides some account of the fluidity of our representations of concepts. These are features that depend crucially on the architecture of the PDP approach, that is, on the (possibly virtual) setup of units, output functions, patterns of connectivity, propagation rules for spreading activation, and learning rules. In this sense, some of the most interesting qualitative advantages of PDP models are gained at a level much closer to the implementation detail (though not yet the hardware level) than is standardly considered important. That is, they are a direct function of the capacity of this architecture to support distributed, superpositionally stored representations.
Now, it may be that other, as yet undiscovered architectures have the same property, and if so, the uniqueness claim (section 4, option 1) would have to be whole-heartedly rejected. But it certainly does not look as if a semantically transparent approach to programming will be able elegantly to support the desired properties. At the end of the day, therefore, a qualified liberalism (section 4, option 2) seems to be warranted. Some may choose to see PDP as just suggesting new ways of implementing conventional AI approaches, like scripts and frames. But by being rendered in the PDP distributed, superpositional style, these approaches gain considerably in power and flexibility. Indeed, they gain flexibility to such an extent that it often seems sensible to view the standard constructs as Smolensky does, as useful approximations to the fine-grained picture presented by PDP. Either way, it remains true to say that the use of a PDP architecture opens up new and qualitatively different avenues of search and representation to those so far explored in conventional AI.

7 An Equivalence Class of Algorithms

PDP would have maximum philosophical significance just in case the following result could be established. The result may be put like this:
There exists a formal specification that picks out algorithmic forms that are instances of a PDP model of one kind or another, and for conceptual reasons it is clear that any system capable of exhibiting the kind of rich and flexible behavior that warrants talk of that system as knowing, meaning, understanding, or having thoughts must rely on one of those algorithmic forms.
Call this the "equivalence-class conjecture." Roughly, it states that PDP descriptions could in principle be used to specify an equivalence class of algorithmic forms, and deploying some member of that class is a constitutive requirement of bona fide thinking. It will be seen that this conjecture consists essentially in a combination of the uniqueness claim (rejected earlier) and the claim that the kind of flexible, context-sensitive response made possible by a PDP architecture is essential to thinking.
The latter claim, it seems to me, is well founded. Many philosophical worries about creating artificial intelligence were based precisely on an observed lack of flexible, commonsense deployment of stored information and on a lack of fluid, analogical reasoning. (Recall the discussion of Dreyfus in chapter 2, and see Hofstadter 1985.) Likewise, the capacity to shade the meaning of various high-level concepts and the tendency to learn concepts in large, mutually dependent groups may reasonably be seen as part of the very idea of grasping a concept. The features that PDP work seems capable of supporting in time may thus be philosophically significant parts of an account of understanding.
It might be argued that this is to give our notion of understanding an overly anthropocentric twist. Humans and higher animals may well exhibit the features mentioned. But why should it be required of all understanders that they conform to our model? To this line of thought I can only reply that the concept of understanding was formed to describe the behavior of humans and higher animals. It need not be indefensibly anthropocentric to believe that certain features of these uses must carry over to any case where the concept is properly used.
Suppose, then, that we agree that essential to understanding are some of the features that a PDP approach looks well suited to underpinning. For its truth the equivalence-class conjecture would then require that we find conceptual reasons why only members of the PDP class of models can support such features. It is this part of the conjecture that looks insupportable. Conceptual ties may be felt to exist between, say, the use of distributed representations with a microfeature semantics and the kind of deep holism and flexible understanding we require. But this alone is insufficient to count as a PDP model according to our definition (section 4 above). What we cannot rule out in advance is the possibility of some as yet undiscovered but distinctly non-PDP architecture supporting distributed representations and superpositional storage and hence providing all the required features in a new way. If this were to prove possible, we would be in the midway position described in chapter 3, section 7, that seems to blur the divide between the constitutive and the causal. The dream that PDP can specify a class of algorithms essential to thinking is precisely the dream of finding a formal unity among the set of substructures capable of supporting thought.
Of course, it may turn out that the only physically possible way of achieving distributed representation and superpositional storage to the required degree involves the use of a (possibly virtual) value-passing network. If we could see just why this should be so (as a result of, say, physical limitations on processing imposed by the speed of light), we would have quite a strong result. The PDP substructure would be revealed as a naturally necessary condition for the flexible behavior (including linguistic behavior) that is conceptually essential to the ascription of thoughts.
Briefly, the various possibilities look like this.
· The weakest interesting claim is that a PDP substructure supports flexible understanding and behavior, and these are essential for an ascription of understanding.
· The intermediate claim is that a PDP substructure is naturally necessary for flexible understanding and behavior, and these are essential for an ascription of understanding.
· The strong claim is that a PDP substructure is required on conceptual grounds (i.e., independently of physical limitations like the speed of light) for flexible understanding and behavior, and these are essential for an ascription of understanding.
I believe it is too early to try to force a choice between these claims. As we saw, it seems that the strongest claim presently eludes us. And even the intermediate claim is suspect. That said, I believe philosophical interest still attaches to the PDP approach, since it begins to demonstrate how a physical system is at least able to support various features that must be exhibited by any system that warrants description in a mentalistic vocabulary. There are those whose idea of philosophical interest would place such a result outside the sphere of their proper concerns. For such philosophers nothing short of the truth of the equivalence-class conjecture would motivate a claim of philosophical significance. Here we must simply differ and be content to get as clear as we can both about the possible relations between PDP models and thoughts and about the grounds for expecting any particular relation to hold.²
Chapter 7
The Multiplicity of Mind: A Limited Defence of Classical Cognitivism

1 Of Clouds and Classical Cognitivism
Every silver lining has a cloud, and PDP is no exception. There are classes of problems to which PDP approaches are apparently ill suited. These include the serial-reasoning tasks of logical inference, the temporal-reasoning tasks of conscious planning, and perhaps the systematic-generative tasks of language production. PDP seems to be nature's gift to pattern recognition tasks, low-level vision, and motor control.¹ But as we proceed to higher, more-abstract tasks, the PDP approach becomes less and less easy to employ. This is, of course, just what we would expect on the basis of our earlier conjectures. A PDP architecture may have been selected to facilitate carrying out evolutionarily basic tasks involving multiple simultaneous satisfaction of soft constraints. Vision and sensorimotor control are prime examples of such tasks. Other tasks - especially the relatively recent human achievements that classical cognitivism focuses on - involve complex sequential operations that may require a system to follow explicit rules. Conscious reasoning about chess playing, logic, and conscious attempts to learn to drive a car are examples of such tasks. Where the conscious-reasoning aspects of such tasks are concerned, the standard architecture of classical-cognitivist models offers an excellent, design-oriented aid to their solution. In these models an explicitly programmed CPU (central processing unit) performs sequential operations on symbolic items lifted out of memory. The architecture is perfectly suited to the sequential application of explicit rules to an ordered series of symbol strings.
Such sequential, rule-following acrobatics are not the forte of PDP. They may not be beyond the reach of a PDP approach, but they certainly do not come naturally to it. It may be significant to notice, however, that these sequential rule-following tasks aren't our forte either. They are the tasks that human beings find hardest, the ones we tend to fail at. And often, after we cease to find them hard (after we are good at chess or logic or driving a car), we also cease to have the phenomenal experience of consciously and sequentially following rules as we perform them.

The thought I develop in this chapter concerns a possible multiplicity of virtual cognitive architectures. The idea is that for some aspects of some reasoning tasks, we might be forced to emulate a quite different kind of computing machine. For example, to perform conscious deductive reasoning, we might emulate the architecture of a serial Von Neumann machine. This idea is by no means new, but its full significance is not generally appreciated. The picture of multiple architectures forms half of a partial defence of classical cognitivism. (For the other half, see the appendix, in which I provide an independent argument for a multiplicity of styles of cognitive explanation, a multiplicity demanded whatever the architectural facts are.)

2 Against Uniformity
All too often, the debate between the proponents and doubters of PDP approaches assumes the aspect of a holy war. One reason for this, I suspect, is an implicit adherence to what I shall call the general version of the uniformity assumption. It may be put like this:
Every cognitive achievement is psychologically explicable using only the formal apparatus of a single computational architecture.
I shall say more about the terms of this assumption in due course. The essential idea is easy enough. Some classical cognitivists believe that all cognitive phenomena can be explained by models with a single set of basic types of operation (see the discussion in chapter 1, section 4; in section 5 below; and in chapter 8). This set of basic operations defines a computational architecture in the sense outlined in chapter 1, section 4. Against this view some PDP theorists seem to urge that the kinds of basic operation made available by their models will suffice to construct accurate psychological models of all cognitive phenomena (see chapters 5 and 6 above). Each party to this dispute thus appears to endorse its own version of the uniformity claim. The classical-cognitivist version is:
Every cognitive achievement is psychologically explicable by a model that can be described using only the apparatus of classical cognitivism.
The PDP version is:
Every cognitive achievement is psychologically explicable by a model that can be described using only the apparatus of PDP.
The argument I develop will urge that we resist the uniformity assumption in all its guises. Instead, I endorse a model of mind that consists of a multitude of possibly virtual computational architectures adapted to various task demands. Each task requires psychological models involving distinctive sets of computationally basic operations.
My goal is to cast doubt on the assumption without going to the opposite extreme (as is characteristic of the holy-war protagonists) and suggesting that PDP models are never psychologically relevant but always have the status of mere details of implementation. Both positions are represented in the literature debating the nature of the relation between connectionist models and conventional AI. Thus, Broadbent (1985) argues that psychological explanation involves appeal only to what function is being computed and not to how it is being computed. He further argues that, subject to speed constraints, connectionist and conventional AI can both compute all the same functions (i.e., anything that a universal Turing machine can do), so there can be no special psychological interest in developing a connectionist model rather than a more conventional one. Something of this debate has been repeated in a recent exchange in the journal Cognitive Science (see Thagard 1986 and Krellenstein 1986). The trouble seems to lie in the ill-defined notion of computing the same function. Does a machine that computes "8 x 7" by retrieving a stored answer to "7 x 7" and adding 7 compute the same function as one which adds 7 eight times? They do in the sense that both functions present the same input-output profile (i.e., input: 8 x 7, output: 56). But the way they achieve their goal is different. And the difference will affect such fine-grained details as speed and breakdown pattern. The machine that stores the answer to "7 x 7" and adds 7 is in all probability faster. And if through damage it lost its capacity to add, it would still know the answer to "7 x 7" at least, whereas its more conventional cousin would not. Now, it seems reasonable to suppose that one of the tasks of psychological modeling is to offer a computational account of relative-speed profiles, error patterns, and breakdown patterns in human performance. The simple idea that computing the same function is just getting input and output to match for some central range of tasks does not do justice to such fine-grained details. As I remarked earlier in a somewhat different context, it's not what you do, it's how you do it that is of interest, particularly where the how explains a further series of whats. Thus, discrete storage of data and the kind of informationally holistic storage discussed in the previous chapter may well dictate the same range of performance on many tasks. But the speed, pathology, and fine-grained performance (e.g., shading of meanings) of the holistic version may well be different. For this reason one cannot argue, from the fact that two systems compute the same function, that PDP models are psychologically irrelevant.
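The contrast is worth spelling out. The two routines below are my own rendering of the example in the text: they agree on the input-output profile for 8 x 7, but they have different internal structure, and so different speed and breakdown behavior.

def eight_times_seven_via_memo(memo={(7, 7): 49}):
    # retrieve the stored answer to 7 x 7, then add 7 once
    return memo[(7, 7)] + 7

def eight_times_seven_via_addition():
    total = 0
    for _ in range(8):         # add 7 eight times
        total += 7
    return total

print(eight_times_seven_via_memo(), eight_times_seven_via_addition())  # 56 56

The first does one retrieval and one addition; the second does eight additions. Disable the adder and the first still "knows" 7 x 7 through its stored entry, while the second loses everything: same function in extension, different psychology.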
For precisely these kinds of reasons PDP researchers speak of the microstructure of cognition. But this way of speaking is a little dangerous (particularly in the light of their favored analogies). For it begins to suggest a questionable adherence to the uniformity assumption. For example, we saw how, according to their favored analogy, we should conceive the relation between connectionist and conventional AI along the lines of the relation between Newtonian theory and quantum theory.
It is worth repeating a representative passage.
It might be argued that conventional symbol processing models are macroscopic accounts, analogous to Newtonian mechanics, whereas our models offer more microscopic accounts, analogous to quantum theory. Note that over much of their range, these two theories make precisely the same predictions about behavior of objects in the world. . . . However, in some situations Newtonian theory breaks down. In these situations we must rely on the microstructural account of quantum theory. (Rumelhart and McClelland 1986, 125; my emphasis)
This, it seems to me, would be a reasonable analogy just in case in some situations Newtonian theory was held to be the correct physical explanation of the phenomenon in question. But this is not so. Rather, the claim is that in some situations Newtonian theory works, although it is incorrect in detail, and quantum theory explains why. The following uniformity assumption is warranted in the physical case: for every physical event there will exist a (possibly horrendously complex) microstructural quantum-level account that would figure in any complete physical model of the phenomenon in question. But what is thus true of the relation between Newtonian theory and quantum theory is not true of the relation between conventional and connectionist accounts as psychological models of human achievements, if the main conjecture of this chapter is correct (see the next section).
The claim is often made that "in the subsymbolic paradigm, serial, symbolic descriptions of cognitive processing are approximate descriptions of the higher level properties of connectionist computation" (Smolensky 1987, 103). Or again, "We view macrotheories as approximations to the underlying microstructure which the distributed model . . . attempts to capture" (Rumelhart and McClelland 1986, 125). In all these cases my worry is that taken one way, these comments and the general analogy seem to imply that as psychological models, conventional accounts must always have the status of (perhaps useful) approximations to the real, PDP story. For reasons I develop below, this view threatens to underestimate the value of conventional models and thus invites a holy war that the AI community can ill afford. In the next section I shall argue that the relation between conventional and connectionist AI may not be uniform across the cognitive domain after all. For some tasks, I suggest, the conventional account may be psychologically complete in a way in which Newtonian physics is never physically complete.

3 Simulating a Von Neumann Architecture
In a recent paper (Clark 1987) I speculated that the human mind might effectively simulate a serial, symbol-processing, Von Neumann architecture for some purposes (largely evolutionarily recent tasks, as discussed in chapter 4). If that is the case, I asked, wouldn't it follow that for such tasks some classical cognitivist computational account could prove to be correct - not just approximately correct but correct, tout court? In short, if we suppose that such simulation goes on, isn't the uniformity assumption simply false, and the relation between conventional and connectionist AI not uniform across the cognitive domain?
When writing that paper, I had no idea about how such simulation might occur. I simply relied on the idea that there is nothing unusual in the simulation of one architecture by another. The pervasive idea in computer science of a virtual machine is precisely the idea that a machine can be programmed to behave as if it were operating a different kind of hardware (see, e.g., Tannenbaum 1976). Since then, however, things have come to seem a little more concrete. Thus in Rumelhart, Smolensky, et al. 1986 we find some fascinating speculative ideas on the human capacity to engage in various kinds of conscious, symbolic reasoning. These speculations can be used, I believe, to give substance to the claim that we might occasionally simulate a Von Neumann architecture.
The PDP group raise the following questions: "If the human information-processing system carries out its computations by 'settling' into a solution rather than applying logical operations, why are humans so intelligent? How can we do science, mathematics, logic etc.? How can we do logic if the basic operations are not logical at all?" They suggest an answer that involves a neat computational twist. Our capacity to engage in formal, serial, rule-governed reasoning, they speculate, is a result of "our ability to create artifacts - that is, our ability to create physical representations that we can manipulate in simple ways to get answers to very difficult and abstract problems." (Both passages are from Rumelhart, Smolensky, et al. 1986, 44.)
Before seeing how this solution works, it is worth expanding on the problem for human intelligence and PDP that it is intended to solve. In the first of the two quoted passages the problem is about our ability to do science, mathematics, and logic. Earlier in the same section (Rumelhart, Smolensky, et al. 1986, 38) the problem areas identified are conscious thought, serial processing, and the role of language in thought. Very generally, it seems to me, there are two kinds of human capacity that PDP models at first glance are hard put to capture or illuminate. These are:
(1) Processes of serial reasoning in which the ordering of operations is vital,
(2) Processes of generative reasoning in which an unbounded set of structures may be produced by the application of rules to a data base.
Conscious planning, logic, and much advanced abstract thought seem to involve capacity 1. The prime example of capacity 2 would seem to be language production. The kind of story discussed below is best suited to explaining the sequential conscious phenomena adverted to in capacity 1. Perhaps it can be extended to cover the generative phenomena as well, but that, I believe, is a much harder issue and one I make no claims to address here. PDP models, like the model of sentence processing discussed in the previous chapter, seem best suited to modeling some aspects of language understanding. Language production, insofar as it involves finding and combining the right constituents in the right way to express a message, raises a whole host of other issues that lie largely beyond the scope of this book.
As far as sequential, conscious thought is concerned, PDP approaches so far offer a two-faceted account. First and most simply, there is seriality in PDP. Since at one time a network occupies at most one state (a pattern of activation over the simple units), there will be a sequence of such states as exogenous and endogenous inputs cause it to become active and then relax into new stable states. Considered during the short periods of value-passing activity, it is a parallel distributed system. Considered over longer stretches of time, it can be seen as a sequence of discrete states. This gives the PDP theorist the beginnings of an angle on conscious experience, the idea being that "the contents of consciousness are dominated by the relatively stable states of the system. . . . Consciousness consists of a sequence of interpretations - each represented by a stable state of the system" (Rumelhart, Smolensky, McClelland, and Hinton 1986, 39). We are often not conscious of, say, the process of finding a good metaphor, making a pun, or various creative leaps in scientific discovery (more on which below). But we are conscious of, say, applying modus ponens to two lines of a logical proof or planning a sequence of events. The PDP account would begin to explain this by positing that relaxation occurs during the unconscious fast phenomena and by treating conscious phenomena as a perceived sequence of the results of such relaxation steps.
But the sequentiality of states alone is insufficient to cover even the processes of serial reasoning associated with capacity (1). For, in effect, all we have so far is a kind of stream-of-consciousness display. In cases of logical reasoning, long multiplications, and so on, the ordering of operations is vital. How are such orderings achieved? Here is where the second facet of the account comes in, namely, the use of artifacts as physical representations.
Consider a simple example. Suppose I ask you to take every second number in a spoken series, add 2 to it, and sum up the total. Most of us would find the task quite difficult. But suppose I allow you to use pen, paper, and the Arabic numerals. The task becomes simple. For the series 7, 4, 9, 5, 2, 1, 6, 9: isolate every second number (4, 5, 1, 9), add 2 (6, 7, 3, 11), and sum the total (27).
Rumelhart, Smolensky, et al. develop a similar example involving long multiplication. Most of us, they argue, can learn to just see the answer to some basic multiplication questions, e.g., we can just see that 7 x 7 is 49. This, they suggest, is evidence of a pattern-completing mechanism of the usual PDP variety. But for most of us longer multiplications present a different kind of problem. 722 x 942 is hard to do in the head. Instead, we avail ourselves (at least in the first instance - see below) of an external formalism that reduces the bigger task to an iterated series of familiar relaxation steps. We write:

722
x 942

and go through a series of simple pattern-completing operations (2 x 2, 2 x 2, 2 x 7, etc.), storing the intermediate results on the paper according to a well-devised scheme. In a highly revealing comment, the authors go on to say, "This is real symbol processing and, we are beginning to think, the primary symbol processing that we are able to do. Indeed, on this view, the external environment becomes a key extension to our mind" (Rumelhart, Smolensky, et al. 1986, 46). (On the importance of external symbolisms to human thought, I cannot resist quoting a wonderfully self-negating comment made by a student encountering the attractions of PDP for the first time. He said, "It was only when I started to write my ideas down that I realized that explicit representations counted for so little after all.") This general strategy of using external representations recalls nicely the earlier argument (chapter 4, section 3) that biologically sound computational accounts need to investigate ways of exploiting environmental structures to aid cognition. Moreover, it suggests an interesting perspective on the Von Neumann architecture itself, more on which shortly.
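To see how far this reduction goes, here is a sketch in Python (my formalization of the passage, not the PDP Group's) in which the only arithmetic the "agent" ever performs is a memorized single-digit pattern completion; everything else is bookkeeping on the external medium, played here by an ordinary list.

memorized = {(a, b): a * b for a in range(10) for b in range(10)}   # the learned patterns

def long_multiply(x, y):
    paper = []                                   # the external medium
    for shift, d in enumerate(reversed([int(c) for c in str(y)])):
        carry, row = 0, 0
        for place, e in enumerate(reversed([int(c) for c in str(x)])):
            step = memorized[(d, e)] + carry     # one familiar completion
            carry, digit = divmod(step, 10)
            row += digit * 10 ** place
        row += carry * 10 ** len(str(x))
        paper.append(row * 10 ** shift)          # write the partial product down
    return sum(paper)                            # the final column addition

print(long_multiply(722, 942))    # 680124

The hard, serial structure of the task lives in the loop over the written digits; the in-the-head contribution is never more than completing a pattern such as 2 x 7.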
It is, of course, true that we can learn to do long multiplication in our heads. And here we encounter the final twist to the story. In-the-head sequential reasoning, the authors argue, is made possible only by constructing a mental model of the external structures whose actual physical manipulation enabled us to learn to perform such sequential operations in the first place. Thus, they suggest, we may solve syllogisms by constructing Venn diagrams in our heads. We can do this only because we were first able to construct or observe such diagrams in external, physical form.
The notion of a mental model in play here is just the idea of a network that takes as input some specification of an intended action (e.g., multiply 7 x 7 as if using an external medium) and gives as output a resultant state of the imagined world (e.g., an image of the number "49" suitably arranged on paper as part of a longer multiplication). (A more detailed account of how such mental modeling proceeds can be found in the same chapter, pp. 40-44.)
In sum, the speculation (and it is no more than a speculation) is that three capacities combine to allow human beings (who are assumed to be PDP devices at root) to perform complex, sequential, symbol-processing tasks. These are:
(1) a basic PDP pattern-matching capacity,
(2) a capacity to mentally model our environment,
(3) a capacity to physically manipulate our real environment, and to perceive the effects of such manipulations (adapted from Rumelhart, Smolensky, et al. 1986, 44).
They add, mysteriously, "especially important is our ability to manipulate the environment so that it comes to represent something," which is enough to raise any philosopher's hackles (see section 4 below).
The thought, anyway, is that capacities 1 to 3 enable us to reduce sequential, symbol-processing problems to a form that is PDP-tractable. On this account the use of the external environment expands both the range of tasks we can conquer and (by enabling us to use relevant mental models) the kinds of mental reasoning of which we are capable. The picture may be captured in a metaphor: thought parasitises the world and returns, nourished and enlarged, to the head. Rumelhart, Smolensky, et al. seem to assume that this use of the external world always occurs within the experience of an individual human. But there is no reason to rule out the possibility that models of the external world sometimes become incorporated into our innate hardware as a result of the usual processes of natural selection. If so, in some domains we may be born to reason as if we had experienced manipulating a real external environment and constructed a mental model of it. This might open the way to some contact with, e.g., Chomskian conjectures concerning language acquisition (although the powerful learning algorithms deployed by PDP seem, if anything, to point in the opposite direction).
Notice, however, that even if some mental models have evolved, our basic architecture would remain a PDP system, though cunningly configured to make possible certain kinds of sequential, symbolic thought. We would not have an architecture purposely built for such reasoning. The historical snowball effect investigated in chapter 4 works to kludge an architecture chosen for speedy perceptual and sensorimotor processing into something capable of some kinds of sequential, conscious reasoning.

All of this suggests an interesting angle on the Von Neumann architecture, with its main memory and the logical, manipulative capacities of a CPU. In basic PDP approaches (i.e., ones not involving mental modeling) there is, in effect, no distinction between the processing structure and the data being processed. It is the activation pattern of the processing structure that encodes the data. Von Neumann architectures separate the two in just the way an environmentally embedded PDP system would separate the external, symbolic structures from the PDP operations on such structures involved in sequential reasoning. Mental modeling is somehow parasitic on this separation. Perhaps, then, it makes sense to see the Von Neumann architecture as mistakenly modeling in-the-head computation on computation that in humans consists of both an in-the-head component and (to begin with) an in-the-world component. If so, the mistake of sequential, symbolic cognitivism is to treat all thought as depending only on manipulating something like external symbolic structures according to rules. The mistake is to model all thought on our gross manipulations of real, external symbolic structures. Thus, we may half agree with Simon's comments: "Humans and computer systems achieve their intelligence by symbolising external and internal situations and events and by manipulating those symbols. They all employ about the same symbol manipulating processes. Perhaps that particular invariance arose because computers were made (unintentionally) in the image of man" (1980, 37). Conventional computers may well have been made in the image of man's exploitation of external, physical structures. The slip is to believe the form of in-the-head operations is entirely given by such a model. The rigidity and limitations of classical cognitivist models and philosophers' reluctance to see such models as intelligent (see chapters 1 to 3) may all stem from this one error. The new start provided by PDP is to see much in-the-head computation as having the qualitatively different form of relaxation procedures and to see other computation as involving the cunning manipulation of external structures (or mental models thereof) to our own ends. In precisely this sense, then, we finally reject the SPSS hypothesis outlined in chapter 1. The manipulation of gross symbolic structures is merely ingenious icing on the computational cake. In the absence of a PDP substrate of powerful pattern-matching operations, such manipulations fail to instantiate thoughts (see section 5 below). We thus uphold Searle's belief that such gross manipulations are not sufficient for thought, though without his mysterious biological alternative (chapter 2).
But for all that, classical cognitivism is not too badly off. For it looks as if the PDP group's own conjectures undermine the uniformity assumption as a psychological explanation. For a certain range of tasks we can now see the world (or a mental model of it) as the memory of a conventional Von Neumann machine and our physical capacities (in the first instance) as the manipulative capacities of the CPU. In short, for some tasks we seem to simulate a conventional architecture. If so, why shouldn't a correct psychological account of human performance of such tasks be given at the classical-cognitivist level? That is to say, why not see the classical account as psychologically accurate in such cases and not merely as a good approximation to an accurate account?
An example may help. When we simulate a connectionist architecture on, say, a serial Symbolics LISP machine, we don't speak of the serial account as explaining, in any psychologically relevant way, the microstructure of connectionist cognition. The serial Symbolics machine, in such cases, is indeed just an implementation detail. In the cases in which PDP mind and manipulation of the world yield a virtual Von Neumann architecture, shouldn't we likewise treat the PDP substrate as a psychologically irrelevant implementation detail? If we should, the uniformity assumption fails. For those tasks the correct level of psychological explanation is indeed that of classical cognitivism, though for other tasks it is not.
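For concreteness, here is a minimal sketch (my illustration, not the text's) of what simulating a connectionist architecture on a serial machine involves: a serial loop updates, one unit at a time, a tiny Hopfield-style network whose settling into a stored pattern is the psychologically interesting event. The loop belongs to the host machine; the relaxation belongs to the model.

```python
import numpy as np

# Serial simulation of a tiny relaxation (Hopfield-style) network.
# The nested for-loops are artifacts of the serial host; the model
# itself just says "settle into a state consistent with the weights."
pattern = np.array([1, -1, 1, -1, 1])      # one stored pattern
W = np.outer(pattern, pattern)             # Hebbian weight matrix
np.fill_diagonal(W, 0)                     # no self-connections

state = np.array([1, -1, -1, -1, 1])       # a corrupted probe
rng = np.random.default_rng(0)
for _ in range(10):                        # serial sweeps over the units
    for i in rng.permutation(len(state)):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(state)                               # relaxes to the stored pattern
```

No one would call the Python for-loop a psychological model of the network's cognition; the question in the text is whether, in the mirror-image case, the PDP substrate deserves the same dismissal.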
There are, however, complications. As I remarked earlier, implementation does affect the behavior of the simulated machine in both speed and breakdown profiles. I shall return to this point in section 5 below. First, a few worries about the story so far.

4 A Lacuna in the Account of Real Symbol Processing


The point of this section is just to emphasize some worries touched on earlier. The path to real symbol processing seems to have been traversed a little too quickly. Perhaps Rumelhart et al. are correct when they write, "Especially important here is our ability to manipulate the environment so that it comes to represent something" (Rumelhart, Smolensky, et al. 1986, 45). But how is this important ability achieved? As far as I can see, they provide no real account.
Consider the example of long multiplication. Most of the work is done by understanding what it is to have an external formalism that represents numbers. Such understanding seems as mysterious in a PDP approach as ever it was in a classical cognitivist one. Monkeys, for instance, seem to have good, basic PDP pattern-matching capacities. And we can give them, on a plate as it were, the external representational formalisms of language and number. But the extent to which they succeed in exploiting such formalisms remains limited. Here is a computational lacuna that will require more than mutterings about mental models to resolve. The question is, What are the prerequisites for a system to come to use external structures in the rich representational fashion of human beings? This question cuts across the boundaries of evolutionary biology, philosophy, and AI, and it is, I suspect, of absolutely the utmost importance. If I had any idea of how to solve it, I'd be writing a very different book.
One thing does seem clear. The apparatus that Rumelhart et al. give won't quite do. The difficulty of seeing how we construct external formalisms in the first place is solved, they say, by the facts of cultural transmission (Rumelhart, Smolensky, et al. 1986, 47). Representational systems, they rightly point out, are not easy to come by. Those we have grew gradually out of a long historical process of alteration and addition to simple systems. This is fine. They are further examples, no doubt, of the pervasive principle of gradualistic holism examined in chapter 4. But it leaves untouched the real problem of how we can recognize anything as an external representational formalism at all. It is here, I believe, that the most profound problems lie.
And there are further difficulties. Even if one has both an external representational formalism and an understanding of what it is for some squiggles to represent something, still the very deployment of the formalism looks more problematic than Rumelhart et al. allow. Take the example of long multiplication. It is not simply a matter of performing an iterated series of basic pattern-matching operations. For as I remarked earlier, we must store the intermediate results on paper (or in our mental model) according to a well-devised scheme. That is, if we compute 7 x 7 as part of a long multiplication, we do not simply store "49"; rather, we store a 9 in one place in a sequence on paper and carry a 4 over to the next operation. What kind of control and storage structures are necessary for this? And does PDP alone enable us to instantiate them?
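To see how much the storage discipline contributes, consider a toy sketch (mine, not Rumelhart et al.'s) in which each single-digit product is a mere table lookup, standing in for trained pattern matching, while the external "paper" and the carry-routing do all the sequential work.

```python
# Long multiplication by one digit, reduced to iterated "pattern matching."
# The times table stands in for the trained PDP matcher; the list `paper`
# stands in for the external page that holds intermediate results.
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}

def multiply_by_digit(digits, multiplier):
    """`digits` is most-significant-first, e.g., [4, 7] for 47."""
    paper = []                                 # the external medium
    carry = 0
    for d in reversed(digits):                 # right to left, as on paper
        product = TIMES_TABLE[(d, multiplier)] + carry   # one basic match
        paper.append(product % 10)             # write a single digit in place
        carry = product // 10                  # route the rest onward
    if carry:
        paper.append(carry)
    return list(reversed(paper))

print(multiply_by_digit([4, 7], 7))            # 47 * 7 -> [3, 2, 9]
```

The point is not that we work this way, but that the columns, carries, and orderings are exactly the control and storage structures that basic pattern matching leaves unexplained.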
All these issues, I believe, are important lacunas in speculations about real symbol processing. For the purposes of this chapter, I am assuming they can be overcome. But we should bear them firmly in mind, especially in the light of the suggestion that understanding the mind requires understanding a number of virtual architectures, with a correct psychological model associated with each (see below).

5 Full Simulation, Intuitive Processing, and the Conscious Rule Interpreter
In the previous chapter (section 4) we met with Smolensky's notion of a virtual machine that he called a conscious rule interpreter. And we saw how, according to Smolensky, classical accounts would be less approximately valid (that is, valid with a lesser degree of approximation) for tasks performed using this virtual machine than for tasks run directly on a so-called intuitive processor. Notice, however, that this claim upholds the uniformity assumption but weakens its impact by allowing the classical accounts to be accurate models of competence in idealized conditions (again see chapter 6, section 4; for the complete story, see Smolensky 1988).
Smolensky may well be right about the existence of a conscious rule interpreter deployed in many cases of conscious, linguistically or logically formulated reasoning. But if the account in section 3 above is at all correct, this is not yet the whole picture. For it does not allow for the kind of case in which real, external entities (or mental models thereof) play the role of gross symbols manipulated according to explicit, linguistically formulated rules and heuristics. Smolensky thinks that the classical account is approximately valid in certain cases, because within limits the behavior of a PDP system will often match that predicted by a classical model. This match increases (though it is never perfect) as the dimension shift between the entities invoked at the level of task analysis and the semantic interpretation of the units of the network decreases, i.e., the match increases with increasing semantic transparency.
In these cases the approximate validity of conventional models is thus merely a matter of a limited input-output equivalence. In the special kind of case treated in section 3 above, in contrast, the conventional model is a realistic model of the processing structure of a certain extended virtual machine. This is very different from being a model of the input-output structure of a machine. In the special kind of case in which actual discrete environmental structures (or mental models thereof) are manipulated according to explicitly formulated rules or heuristics, we have a virtual machine that recapitulates the processing steps of a conventional model. That is, the kinds of operations we perform on real, external symbolic structures (and hence the kinds we use in any mental model of the same) are just the operations found in a conventional processor, e.g., completely copying a symbol from one location to another, deleting and adding whole symbols (e.g., adding "cup" to a list), and matching whole symbols. In these special cases, therefore, the conventional model is not any kind of approximation to the truth; it is the truth.
In effect, then, I wish to add a third architectural possibility to a spectrum formulated by Smolensky. Smolensky's picture of the relation of conventional models to PDP models looks like this:
- For tasks involving the intuitive processor, conventional models are a rough approximation of system performance.
- For tasks involving the conscious rule interpreter, conventional models are a good approximation of system performance.
To this picture I tentatively add:
- For tasks involving the use of external structures as objects for a conscious rule interpreter, conventional models are an exact description of (possibly virtual) system structure and performance.
There are important twists to the story which remain to be described. In particular, we need to consider mixed tasks, i.e., tasks (if there are any) that involve more than one of these virtual machines. We also need to consider the qualitative effects of implementing a symbol processor in a parallel distributed architecture. The PDP group's insistence that the cognitivist account, though often useful, must always have the status of some more or less accurate approximation to the true computational, psychological story is, I suggest, based on a subtle misreading of the moral of such cases. I return to these matters in the next chapter. For the present, the lesson is straightforward. If the account just developed is at all plausible, any assumptions of uniformity are premature. There may be a full spectrum of relations between PDP and cognitivist models of cognition according to whether the task aspect under study involves intuitive processing, conscious reasoning, or a full simulation of a gross symbol-processing mechanism.

6 BACON, an Illustration
Let me illustrate the position outlined above by considering scientific discovery once again. A cognitivist model of scientific discovery is given by the BACON program, outlined in chapter 1, section 4 (for full details see Langley et al. 1987). To recapitulate briefly, BACON derives scientific laws from bodies of data. Roughly, it works on recorded observations of the values of variables and seeks functions relating such values. In its search for such functions it follows some simple heuristics that suggest what functions to try first and how to proceed in cases of difficulty. The program does not start out with any theoretical bias or expectations concerning the outcome; it simply seeks regularities in data. As a kind of control experiment, Simon employed a graduate student who knew nothing of Kepler's third law (see chapter 1) and gave him the sets of data upon which Kepler worked.2 Told to find a function relating one column of figures (in fact, the radii of planetary orbits) to another (in fact, the periods of planetary revolution), it took the student 60 hours to discover the law. The BACON program is much quicker, but the procedure is the same: a serial, heuristic-guided search for a function relating x and y, uninformed by any understanding of the significance of x, y, or the enterprise of scientific investigation itself.
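The flavor of that search is easy to convey in a small sketch (a reconstruction in the spirit of BACON's simplest heuristics, not Langley et al.'s code): try simple power-law combinations of the two columns and keep any that come out nearly constant. Fed planetary radii and periods, it recovers the form of Kepler's third law.

```python
# A BACON-flavored heuristic: hunt for a constant ratio among simple
# power-law combinations of two columns of data, with no understanding
# of what the columns mean.
def find_law(xs, ys, max_power=3, tol=0.01):
    for p in range(1, max_power + 1):
        for q in range(1, max_power + 1):
            ratios = [x**p / y**q for x, y in zip(xs, ys)]
            mean = sum(ratios) / len(ratios)
            if all(abs(r - mean) / mean < tol for r in ratios):
                return f"x^{p} / y^{q} is constant (~ {mean:.3g})"
    return "no simple law found"

radii   = [0.387, 0.723, 1.000, 1.524, 5.203]    # orbital radius (AU)
periods = [0.241, 0.615, 1.000, 1.881, 11.862]   # orbital period (years)
print(find_law(radii, periods))  # -> x^3 / y^2 is constant: Kepler's third law
```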
Simon notes, however, that there are elements in the process of real scientific discovery that are not easily amenable to such an approach. He cites the example of Fleming's spotting the significance of the mouldy petri dish. But any classic flash of insight will do. We might point to Stephenson's (allegedly) watching his kettle boil and conceiving the idea of the steam locomotive or someone's studying the behavior of thermodynamic systems and conceiving the ideas behind PDP. These aspects of scientific discovery have two prime characteristics, which should by now attract our attention.
- The flash of insight is typically fast. The idea just comes to us, and we have no conscious experience of working for it.
- The flash of insight involves using rather abstract perceived patterns in one domain of our experience to suggest ways of structuring our ideas about some other, apparently far removed domain.
In the light of these characteristics it is not absurd to suggest that some PDP mechanism is operating in such cases. Simon appears to reject this idea, insisting that these very fast processes are in no way fundamentally different from the processes of serial, heuristic search used by BACON (a full quote is given in chapter 1, section 4). This time, it seems, it may be the conventional theorist who is too quickly assuming uniformity across the cognitive domain.
A better position is well explained by D. Norman: "People interpret the world rapidly, effortlessly. But the development of new ideas, or evaluation of current thoughts, proceeds slowly, serially, deliberately. People do seem to have at least two modes of operation, one rapid, efficient, subconscious, the other slow, serial and conscious" (1986, 542). According to this model (which was also endorsed by Smolensky; see chapter 6, section 3), the computational substrate of human thought comprises at least two strands: one, the fast, pattern-seeking operations of a PDP mechanism; the other, the slow, serial, gross-symbol-using, heuristic-guided search of classic cognitivism. If my earlier speculations are at all correct, this latter strand may at times be dependent on a virtual symbol-processing architecture, possibly created by our capacity to exploit real environmental structures (but the genesis of the simulation is not what is at issue here). Were it not for their strange commitment to a single functional architecture (see chapter 1), such a picture ought to be quite acceptable to theorists like Langley, Simon, Bradshaw, and Zytkow. Certainly, they actually set out to model the processes of slow, conscious, serial reasoning. For example, in defence of the seriality of their approach, they suggest that "for the main processes in any task requiring conscious attention, short term memory serves as a severe bottleneck that forces processes to be executed serially" (Langley et al. 1987, 113; my emphasis). It seems, then, as if the folk-psychological category of "scientific discovery" fails to pick out a single computational kind. Instead, it papers over the difference between conscious toil and unconscious insight to fix on the product of both processes (new scientific ideas), a product of undoubted significance in human life. This is just what the conjectures of chapter 3 should lead us to expect. The folk-psychological view of the cognitive terrain is a view from within the environmentally and culturally rich mesh of human practices. It has no special interest in fixing on regularities and differences in the causal-computational substrate of our thought except insofar as these are of immediate practical significance.
Suppose, then, that we were to accept such a division within the domain of scientific discovery. Would it not then be fair to seek psychological models of the slow, serial component within a classic cognitivist framework and models of the fast component within PDP? An overall model of human scientific thinking needs to include both and to address the issue of how the results of each can be fed to the other in a cooperative way. Still, it would not follow that the slow, serial model is a mere approximation of that component.
In sum, I am advocating that cognitive science is an investigation of a mind composed of many interrelating virtual machines, with correct psychological models at each level and further accounts required for the interrelations between such levels. Only recognition of this multiplicity of mind, I suspect, will save cognitive science from a costly holy war between the proponents of PDP and the advocates of more conventional approaches.3
Chapter 8
Structured Thought, Part 1

1 Weighting for Godot?
Are some cognitive competences beyond the explanatory reach of any PDP model? Does the weighting game degenerate at some specifiable point into waiting for Godot?1 Some philosophers and cognitive scientists believe so. The putative problem concerns the systematicity of the processing required for such sophisticated cognitive achievements as language production and understanding. Two kinds of argument are advanced to convince us that connectionism is unable to penetrate these systematic domains. One is a lively (but ultimately implausible) set of arguments detailed in Fodor and Pylyshyn 1988 and Fodor 1987. These arguments seek to support a classical-cognitivist model of thought. The other, associated with an influential critique of connectionism by Pinker and Prince (1988), aims to justify at least a classical structuring of the components of information-processing models of higher cognitive functions. The arguments here are plausible and important but, I shall argue, are unable to support any strong conclusions about the limits of PDP.
The itinerary goes like this. The current chapter focuses on the arguments of Fodor and Pylyshyn. These are shown to be generally uncompelling. Systematicity of effect may well argue in favor of systematicity of cause. But classical cognitivism involves much larger claims, which Fodor and Pylyshyn give us no reason to accept. The chapter ends by associating their error with a pervasive failure within the cognitive-science community to distinguish two kinds of cognitive science. One of these involves the attempt to model the complex, holistic structure of thought (thoughts here simply are the contentful states described using propositional-attitude ascriptions). The other is the attempt to develop models of the in-the-head, computational causes of the intelligent behavior that warrants such thought ascriptions. The projects, I shall argue, are distinct and nonisomorphic. Chapter 9 goes on to consider some edited highlights of the Pinker and Prince paper and then returns afresh to the question of mixed models of cognitive processes, first raised in chapter 7.
2 The Systematicity Argument
Fodor and Pylyshyn 1988 is a powerful and provocative critique aimed at the very foundations of the connectionist program. In effect, they offer the friend of connectionism an apparently fatal dilemma. Either connectionism constitutes a distinctive but inadequate cognitive model, or, if it constitutes an adequate cognitive model, it must do so by specifying an implementation of distinctively classical processing strategies and data structures. I shall argue that the critique of Fodor and Pylyshyn is based on a deep philosophical confusion.
I begin with an imaginary anecdote, the point of which should become apparent in due course. One day, a famous group of AI workers announced the unveiling of the world's first, genuine, thinking robot. This robot, it was claimed, really had beliefs. The great day arrived when the robot was put on public trial. But there was disappointment. All the robot could do, it seemed, was output a single sentence: "The cat is on the mat." (It was certainly a sophisticated machine, since it generally responded with the sentence when and only when it was in the presence of a cat on a mat.) Here is an extract from a subsequent interchange between the designers of the robot and some influential members of the mildly outraged academic community.
Designers: Perhaps we exaggerated a little. But it really is a thinking robot. It really does have at least the single belief that the cat is on the mat.
Scoffers: How can you say that? Imagine if you had a child and it could produce the sentence "The cat is on the mat" but could not use the words "cat," "on," and "mat" in any other ways. Surely, you would conclude that the child had not yet learned the meaning of the words involved.
Designers: Yes, but the child could still think that the cat is on the mat, even if she has not yet learned the meanings of the words.
Scoffers: Agreed, but the case of your robot is even worse. The child would at least be capable of appropriate perceptual and behavioral responses to other situations like the mat being on the cat. Your robot exhibits no such responses.
Designers: Now you are just being a behaviorist. We thought all that stuff was discredited years ago. Our robot, we can assure you, has a data structure in its memory, and that structure consists of a set of distinct physical tokens. One token stands for "the," one for "cat," one for "is," one for "on," and one for "mat." Unless you're behaviorists, why ask more of a thought than that?
Scoffers: Behaviorists or not, we can't agree. To us, it is constitutive of having the thought that a is b to be able to have other thoughts involving a and b, e.g., c is b, a is d, a is not b, and so on. To have a thought is to be in a state properly described by the ascription of a set of concepts and relations. And you can't have a concept in a semantic vacuum. You can't know what addition is if all you can do is output "2 + 2 = 4." Possession of a concept involves a large and structured set of abilities to do things, either internally or externally. This is not any kind of peripheral behaviorism; it's just a reflection of the actual nature of thought ascription. As a matter of fact, we think that's what the business of thought ascription is all about. It's a way of making global sense of a whole set of dispositions to behave.
The moral of the story is just this. You can't get away with ascribing the thought that a is b to a system unless you can also get away with ascribing other thoughts involving a and b to it (in actual or counterfactual circumstances). (This observation pervades much recent philosophy of language. The general, global, holistic nature of belief ascription is well described in various works by D. Davidson (see the Davidson 1984 collection). And the point about the need to be able to entertain many thoughts involving a and b to be capable of entertaining any is made explicit as the generality constraint in Evans 1982, 100-105; see chapter 3 above.)
Consider now the mainstay of Fodor and Pylyshyn's assault on connectionism: the requirement of systematicity. The argument goes like this.
Observation: Normal linguistic competence of a native speaker is systematic. Speakers who know how to say "John loves the girl" generally also know how to say "The girl loves John."
Explanation: Linguistic competence involves grasp of compositional semantics. The speaker learns to construct meaningful sentences by combining meaningful atomic parts in a particular way. Thus, competence with "John," "loves," "the," and "girl," along with competence with subject-verb-object constructions, immediately yields a capacity to produce "The girl loves John."
Sentences have genuine constituent and constructive structure, and this fact explains the phenomenon of systematicity.
Fodor and Pylyshyn then propose an exactly analogous argument for thought. It goes like this.
Observation: Normal (human and animal) cognitive competence is systematic. You don't find creatures with punctate minds, e.g., creatures whose cognitive capacities consist of the ability to think "seventy-four unrelated thoughts" (Fodor and Pylyshyn 1988, 40). Creatures who can think that John loves the girl can typically also think that the girl loves John.
Explanation: Thoughts, like sentences, have constituent structure. Thinking that John loves the girl involves having some relation to an internal representational structure with proper parts standing for "John," "loves," "the," and "girl" and with some kind of combinatorial syntactic structuring.
In sum, there will be two mental representations, one corresponding to the thought that John loves the girl and one corresponding to the thought that the girl loves John. And there will be some systematic relation between them such that "the two mental representations, like the two sentences, must be made of the same parts" (Fodor and Pylyshyn 1988, 39). Fodor and Pylyshyn conclude: "If this explanation is right (and there don't seem to be any others on offer), then mental representations have internal structure and there is a language of thought. So the architecture of the mind is not a connectionist network" (1988, 40). The anticonnectionist conclusion is not yet compelling, of course. It depends on a lemma to the effect that connectionist work, in contrast, posits unstructured mental representations. Fodor and Pylyshyn certainly endorse such a lemma. They write, "Connectionists propose to design systems that can exhibit intelligent behavior without storing, retrieving or otherwise operating on structured symbolic expressions" (1988, 5). (Call this the lemma of unstructured representations.)
The overall form of argument is now visible.
Thought is systematic.
So internal representations are structured.
Connectionist models posit unstructured representations.
So connectionist accounts are inadequate as distinctive cognitive models.
Classical accounts, by contrast, are said to posit internal representations with rich syntactic and semantic structure. They thus reach the cognitive parts that connectionists cannot reach.

3 Systematicity and Structured Behavior
This argument is deeply flawed in at least two places. First, it misconceives the nature of thought ascription, and with it the significance of systematicity. Second, it mistakenly infers lack of compositional, generative structure from lack of what I shall call conceptual-level compositional structure. These two mistakes turn out to be quite interestingly related.
The point to notice on the nature of thought ascription is that systematicity, as far as Fodor and Pylyshyn are concerned, is a contingent, empirical fact. This is quite clear from their discussion of the systematicity of infraverbal, animal thought. Animals that can think aRb, they claim, can generally think bRa also. But they allow that this need not be so. It is, they write, "an empirical question whether the cognitive capacities of infraverbal organisms are often structured that way" (1988, 41). Now, it is certainly true that an animal might be able to respond to aRb and not to bRa. But my claim is that in such a case (ceteris paribus) we should conclude not that it has, say, the thought "a is taller than b" but cannot have the thought "b is taller than a." Rather, its patent incapacity to have a spectrum of thoughts involving a, b, and the taller-than relation should defeat the attempt to ascribe to it the thought that a is taller than b in the first place. Perhaps it has a thought we might try to describe as the thought that a-is-taller-than-b. But it does not have the thought reported with the ordinary sentential apparatus of our language. For grasp of such a thought requires a grasp of its component concepts, and that requires satisfying the generality constraint.
In short, Fodor and Pylyshyn's "empirical" observation that you don't find creatures whose mental life consists of seventy-four unrelated thoughts is no empirical fact at all. It is a conceptual fact, just as the thinking robot's failure to have a single, isolated thought is a conceptual fact. Indeed, the one is just a limiting case of the other. A radically punctate mind is no mind at all.
These observations should begin to give us a handle on the actual nature of thought ascription. Thought ascription, as we saw in chapter 3, is a means of making sense of a whole body of behavior (actual and counterfactual). We ascribe a network of thoughts to account for and describe a rich variety of behavioral responses. This picture of thought ascription echoes the claims made in Dennett 1981. The folk-psychological practice of thought ascription, he suggests, might best be viewed as "a rationalistic calculus of interpretation and prediction - an idealizing, abstract, instrumentalistic interpretation method that has evolved because it works" (1981, 48). If we put aside the irrealistic overtones of the term "instrumentalism" (a move Dennett himself now approves of; see Dennett 1987, 69-81), the general idea is that thought ascription is an abstract, idealising, holistic process, which therefore need not correspond in any simple way to the details of any story of in-the-head processing. The latter story is to be told by what Dennett (1981) calls "sub-personal cognitive psychology." In short, there need be no neat and tidy quasireductive biconditional linking in-the-head processing to the sentential ascriptions of belief and thought made in daily language. Instead, a subtle story about in-the-head processing must explain a rich body of behavior (actual and counterfactual, external and internal), which we then make holistic sense of by ascribing a systematic network of abstract thoughts.
It may now seem that we have succeeded in merely relocating the systematicity that Fodor and Pylyshyn require. For though it is a conceptual fact, and hence as unmysterious to a connectionist as to a classicist, that thoughts are systematic, it is a plain old empirical fact that behavior (which holistically warrants thought ascriptions) is generally as systematic as it is. If behavior wasn't systematic, the upshot would be, not punctate minds, but no minds. But that it is systematic is an empirical fact in need of explanation. That explanation, according to Fodor and Pylyshyn, will involve wheeling out the symbolic combinatorial apparatus of classical AI. So doesn't the classicist win, though one level down, so to speak? No, at least not without an independent argument for what I called conceptual-level compositional structure. It's time to say what that means.
One pivotal difference between classical accounts and those that are genuinely and distinctively connectionist lies, according to Fodor and Pylyshyn, in the nature of the internal representations they posit. Recall that classicists posit internal representations that have a semantic and syntactic structure similar to the sentences of a natural language. This is often put as the claim that classical theories, but not connectionist theories, "postulate a 'language of thought'" (Fodor and Pylyshyn 1988, 12). And what that amounts to is at least that the internal representation, like a sentence of natural language, be composed of parts that, together with syntactic rules, determine the meanings of the complex strings in which they figure. It is further presumed that these parts will more or less line up with the very words that figure in the sentences that report the thoughts. Thus, to have the thought that John loves the girl is to stand in some relation to a complex internal token whose proper parts have the context-independent meanings of "John," "loves," and so on. This is what it is to have a conceptual-level compositional semantics for internal representations. Distributed connectionists, in contrast, were seen not to posit recurrent internal items that line up with the parts of conceptual-level descriptions. Thus, "The coffee is in the cup" would, we saw, have a subpattern that stands for "coffee." But that subpattern will be heavily dependent on context and will involve microfeatures that are specific to the in-the-cup context. We need not dwell further on the details of this difference here. For our purposes, the important point is simply this: There is no independent argument for the conceptual-level compositionality of internal representations. And without one, systematicity does not count against connectionism.
Let us see how this point works. Fodor and Pylyshyn require a kind of systematicity that argues for a language of thought, i.e., for a system of internal representations with conceptual-level compositionality. One approximation to such an argument in their text is the following: "it is ... only insofar as 'the girl loves' and 'John' make the same semantic contribution to 'John loves the girl' that they make to 'The girl loves John' that understanding the one sentence implies understanding the other" (Fodor and Pylyshyn 1988, 42). If the locus of systematicity in need of explanation lay in thought-ascribing sentences, then this would indeed constitute the required argument. But the systematicity of thought-ascribing sentences is, we saw, a conceptual matter. Finding thoughts there at all requires that the ascriptive sentences form a highly structured network. What is not a conceptual matter is the systematicity of the behavior that holistically warrants ascriptions of thoughts. But here there is no obvious pressure for a system of internal representations that themselves have conceptual-level systematicity. All we need is to be shown an internal organization that explains why a being able interestingly to respond to a blue square inside a yellow triangle, for example, should also be able interestingly to respond to a yellow square in a blue triangle. And connectionist models, invoking, e.g., various geometric microfeatures as a means of identifying squares and triangles, can do just this. And they can do so even if the resultant system has no single internal state that constitutes a recurrent and context-independent representation of "square," "triangle," and so on. Likewise, in the room example (reported in chapter 5, section 4) we are shown a model that can represent bedrooms and living rooms as sets of microfeatures. And it is no mysterious coincidence that the model could thereby represent a large fancy bedroom (one with a sofa in it). It could do so because of the recurrence of many microfeatures across all three cases. Highly distributed, microfeatural systems will thus exhibit all kinds of systematic behavioral competence without that competence requiring explanation in terms of conceptual-level compositionality.
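The shape of that claim can be given a toy illustration (my own, far cruder than any real network): scenes encoded as sets of context-bound microfeatures support sensible responses to recombined scenes, even though nothing in the encoding is a recurrent, context-independent symbol for "square" or "triangle."

```python
# Scenes as bags of context-tinged microfeatures, not symbol structures
# like inside(square(blue), triangle(yellow)). The shape features are
# bound to their context (inner vs. outer): there is no freestanding
# "square" token anywhere in the encoding.
EDGES = {"square": "four-straight-edges", "triangle": "three-vertices"}

def encode(inner_color, inner_shape, outer_color, outer_shape):
    return {
        f"{inner_color}-enclosed-region",
        f"inner-{EDGES[inner_shape]}",
        f"{outer_color}-surround",
        f"outer-{EDGES[outer_shape]}",
    }

def overlap(a, b):
    """Crude response measure: proportion of shared microfeatures."""
    return len(a & b) / len(a | b)

seen       = encode("blue", "square", "yellow", "triangle")
recombined = encode("yellow", "square", "blue", "triangle")
unrelated  = encode("red", "triangle", "green", "square")

print(overlap(seen, recombined))   # ~0.33: the structural microfeatures recur
print(overlap(seen, unrelated))    # 0.0: nothing recurs
```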
In sum, the systematicity of thoughts is a conceptual requirement if we are to be justified in finding thoughts there at all. What stands in need of empirical explanation is not the systematicity of thoughts but the systematicity of the behavior which grounds thought ascription. Such systematicity indeed suggests recurrent and recombinable elements. But there is no reason to suppose these have to have a conceptual-level semantics. (Indeed, given the holistic nature of the thought ascriptions from which conceptual-level entities are drawn, this looks unlikely.)
The lemma of unstructured representations, upon which the anticonnectionist force of the argument from systematicity depends, is thus unsupported. All that is supported is, if you like, a lemma of no conceptual-level structure. But once we cease to be blinded by the glare of sentential thought ascriptions, any lack of conceptual-level structure ceases to be a problem and begins to look suspiciously like an advantage.
This shows, incidentally, why a certain kind of defense of Fodor and Pylyshyn's position won't work. The defense (put to me by Ned Block) claims that the implausibility of finding neat in-the-head correlates to conceptual-level structures is irrelevant to the claims of a language of thought. For what the language-of-thought hypothesis claims, according to this defense, is just that there must be some systematic description of the thought that figures in a computational account of the mind of the thinker. In other words, if there is a systematic description in English of a creature's thought, we need to postulate an internal language that corresponds point by point to some systematic description of that thought, though it need not correspond to the particular description given by the English sentence.
Such a defence won't work against the criticism I am advancing. For my criticism flows directly from the picture of thought ascription developed above (and in chapter 3). The upshot of this position is that as a holist (and indeed, a kind of behaviorist) about thought ascription, I would deny that any systematic description of a thought as picked out by our daily talk of thoughts is likely to be a good guide to actual in-the-head processing. This is because my account drives a wedge between real in-the-head states and thought reports. The items that our daily talk picks out as thoughts are not, on my account, good candidates for instantaneous brain states. So even if you redescribe those items in some other way, the problem remains. For it is a problem about the very objects in need of computational explanation. I deny that daily thought reports isolate those objects. As a result, my scepticism stands even against Block's version of the systematicity claim. No description of thoughts reported in our daily talk will capture systematic facts about in-the-head processing.

4 Cognitive Architecture
Fodor and Pylyshyn also criticize connectionists for confusing the level of psychological explanation and the level of implementation. Of course the brain is a connectionist machine at one level, they say. But that level may not be identical with the level of description that should occupy anyone interested in our cognitive architecture. For the latter may be best described in the terms appropriate to some virtual machine (a classical one, they believe) implemented on a connectionist substructure. A cognitive architecture, we are told, consists of "the set of basic operations, resources, functions, principles, etc. ... whose domain and range are the representational states of the organism" (Fodor and Pylyshyn 1988, 10). Fodor and Pylyshyn's claim is that such operations, resources, etc. are fundamentally classical; they consist of structure-sensitive processes defined over internal, classical, conceptual-level representations. Thus, if we were convinced of the need for classical representations and processes, the mere fact that the brain is a kind of connectionist network ought not to impress us. Connectionist architectures can be implemented in classical machines and vice versa.
This argument in its pure form need not concern us if we totally reject Fodor and Pylyshyn's reasons for believing in classical representations and processes. But it is, I think, worth pausing to note that an intermediate position is possible. Suppose we accepted the idea (suggested in chapter 7) that for some purposes at least the brain simulates a classical machine using a connectionist substructure. Even then, I suggest, it doesn't follow that the connectionist substructure constitutes psychologically irrelevant implementation detail. For one benefit of connectionist research has surely been to show how psychologically interesting properties can emerge out of what looks like mere implementation detail from a classical perspective.
To take an example, consider Smolensky's (1988, 13) idea of a "subconceptually implemented rule-interpreter." This is, in effect, a classical symbol processor implemented using a connectionist machine. Now consider the task of generating mathematical proofs. Implementing the classical rule interpreter on a larger connectionist substructure, Smolensky suggests, might permit it to access and use some characteristic PDP operations. For example, it may generate the applicable rule by using a flexible, context-sensitive, best-match procedure. Once generated, however, the rules could be applied rigidly and in serial fashion by the classical virtual machine. "Thus, the serial search through the space of possible steps that is necessary in a purely symbolic approach is replaced by intuitive generation of possibilities. Yet the precise adherence to strict inference rules that is demanded by the task can be enforced by the rule interpreter; the creativity of intuition can be exploited while its unreliability can be controlled" (Smolensky 1988, 13).
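Schematically (my gloss on this division of labor, not Smolensky's code), the hybrid looks like this: a soft, similarity-based best match proposes the applicable rule, and a rigid serial step then applies it exactly. The rule names, key vectors, and premise formats below are invented for illustration.

```python
import numpy as np

def best_match(state_vec, rule_keys):
    """Intuitive component: pick the rule whose distributed key pattern
    most resembles the current proof-state pattern (soft best match)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(rule_keys, key=lambda name: cos(state_vec, rule_keys[name]))

# Made-up distributed "keys": features of proof states in which each
# rule tends to apply.
rule_keys = {
    "modus_ponens":    np.array([0.9, 0.8, 0.1]),
    "and_elimination": np.array([0.1, 0.2, 0.9]),
}

def apply_rule(name, premises):
    """Rigid component: exact, serial application of the chosen rule."""
    if name == "modus_ponens" and premises[0] == premises[1][0]:
        return premises[1][1]        # from P and (P -> Q), infer Q
    if name == "and_elimination":
        return premises[0][0]        # from (P & Q), infer P
    raise ValueError("rule does not apply")

state = np.array([0.85, 0.7, 0.2])   # pattern evoked by the current proof state
rule = best_match(state, rule_keys)  # intuition proposes...
print(rule, apply_rule(rule, ["P", ("P", "Q")]))   # ...and logic disposes: Q
```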
It seems, then, that a connectionist implementation of a classical machine (the rule interpreter) may involve representations in ways that, if they are to be explained at all, require reference to the details of connectionist modes of storage and recall. Fodor and Pylyshyn can, of course, deny that the explanation of such properties is a matter of proper psychological interest. But this surely does not ring true. To take just a single case, to explain semantically specific deficits (roughly, aphasias in which classes of knowledge are differentially impaired, e.g., the loss of names of indoor objects or of fruits and vegetables), Warrington and McCarthy (1987) propose an explanation that is both connectionist (or associationist) in spirit and depends on factors that, to a classicist, would look remarkably like "mere implementation detail." Roughly, they suggest that a connectionist model of the development and storage of semantic knowledge may account for the fractionations observed.
At the very least, it is surely a mistake to think (as Fodor and Pylyshyn appear to do) in terms of one task, one cognitive model. For (as suggested in chapter 7) our performance of any top-level task (e.g., mathematical proof) may require computational explanation in terms of a number of possibly interacting virtual machines, some classical, some connectionist, and some of an unknown architecture. This multiplicity will be mirrored in the explanation of various task-related pathologies, aphasias, and so on. What is implementation detail relative to one aspect of our performance of a particular task may be highly relevant (psychological, representation-involving) detail relative to other aspects of the same task.
What this opens up is the possibility of a partial reconciliation between the proponents of classical AI and connectionists. For some aspects of our performance of some tasks, it may well be entirely correct (i.e., nonapproximate) and necessary to couch a psychological explanation in classical terms. Perhaps some of Fodor and Pylyshyn's arguments serve to pick out those aspects of human performance that involve the use of a virtual classical machine (for example, their comments on the difficulties that beset certain kinds of logical inference if we use context-relative representations [1988, 46]). But since their completely general argument for the systematicity of thought fails, these cases all constitute a much smaller part of human cognition than they expect. And even when a classical virtual machine is somehow implicated in our processing, its operation may be deeply and inextricably interwoven with the operation of various connectionist machines.
In a recent talk, G. Hinton hinted at a picture of high-level cognition that would facilitate just such a reconciliation. The idea is to give a system two different internal representations of everything. One of these would be a single feature representing, e.g., Mary (Hinton calls this the reduced description). The other would be a fully articulated microfeatural representation of Mary (the so-called expanded description). And it would be possible to go from one of these to the other in some nonarbitrary fashion. The idea was not elaborated, and I shall not speculate too much on it here. But it is worth just noting that such a model would seem to provide (in the reduced descriptions) the kind of data structure upon which any virtual classical machine would need to operate. But it provides such structures within the context of an overall system in which the existence and availability of the expanded, connectionist representation is presumably crucial to many aspects of performance.
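A minimal sketch of the idea as I understand it (the microfeature numbers and the lookup-plus-best-match routes between the two descriptions are my own illustration):

```python
import numpy as np

# Each item gets a reduced description (a compact token that a classical
# virtual machine can copy, bind, and compare) and an expanded description
# (a microfeatural vector supporting similarity and pattern completion).
EXPANDED = {
    "Mary": np.array([0.9, 0.1, 0.8, 0.3]),   # made-up microfeatures
    "John": np.array([0.2, 0.9, 0.7, 0.1]),
}

def expand(token):
    """Reduced -> expanded: a direct, nonarbitrary route."""
    return EXPANDED[token]

def reduce_to_token(vector):
    """Expanded -> reduced: best match over stored expansions, so even
    a degraded microfeatural pattern recovers its compact token."""
    return min(EXPANDED, key=lambda t: np.linalg.norm(EXPANDED[t] - vector))

noisy_mary = expand("Mary") + np.array([0.05, -0.05, 0.02, 0.0])
print(reduce_to_token(noisy_mary))   # -> "Mary"
```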

5 Two Kinds of Cognitive Science


Fodor and Pylyshyn's mistake is to project the systematicity of ascriptions of propositional attitudes directly back onto a matching syntactic systematicity in brain computation. The root cause of this is a failure to understand the nature and goals of thought talk itself. (Again, recall chapter 3.) This same mistake is, I believe, the cause of an upsetting pathology within the cognitive science community. The pathology makes itself felt in all-too-frequent exchanges along the following lines: in discussion following a paper on an AI model of some aspect of human thought, a questioner addresses the speaker.
Questioner: Are you claiming that that's how human beings think, then?
Speaker: Oh no. Absolutely not. Of course, our brains don't use a predicate calculus. In fact, it's unlikely that the algorithms we use bear any relation to logical calculi at all.
Questioner: So your project is really part of technological AI. You want some program to get a certain input-output mapping right, but you don't really care about how humans think.
Speaker: Well, not exactly. Really, it's hard to say what the project is, because it is human thought that we're trying to model.
Such exchanges are by no means uncommon. They are not restricted to fledgling AI workers, nor are they the exclusive hallmark of conventional cognitivists.
Here is an admittedly highly speculative hypothesis about the cause of such confusion. The received wisdom is that AI comes in two varieties: technological AI, in which the goal is simply to get a machine to do something, with no commitment to producing a model of human psychology, and psychological AI (or cognitive science), in which the goal is to produce a computational model of human or animal psychological states and processes.
But suppose that the arguments developed in chapter 3 have some force. Suppose, that is, that thought ascription is essentially a matter of imposing a holistic interpretation upon a large body of behavior in an environmental context. The individual thoughts thus ascribed are perfectly real, but they are not the kind of entities that have neat, projectible, computational analogues in the brain. What then becomes of the project of psychological AI or, more generally, cognitive science?
The radical conjecture I would like briefly to pursue is this. Cognitive science turns out to encompass two projects, each laudable and legitimate but absolutely distinct. These projects coincide with two ways of understanding the notion of a psychological model. A psychological model may be a model of the complex structure of human (or animal) thought, i.e., the holistic network of ascriptions of contentful states. Or it could be a model of the computational operations in the brain that in part make possible the rich and varied behavior we describe using propositional-attitude talk. Contrary to what Fodor thinks, these models will typically be nonisomorphic, though there may be exceptions (see below). In short, there are two kinds of cognitive science: descriptive cognitive science and causal cognitive science.

Descriptive cognitive science attempts to give a formal theory or model of the structure of the abstract domain of thoughts, using the computer program as a tool or medium.
Causal cognitive science attempts to give an account of the inner computational causes of the intelligent behaviors that form the basis for the ascription of thoughts.
Notice that descriptive cognitive science is not the same as technological AI, since descriptive cognitive science cares a great deal about actual human psychology. Nor is it the same as what Searle (1980) calls "weak AI." For weak AI uses the computer as a tool to develop formal models of the brain causes of intelligent behavior. Weak AI is a weak version of causal cognitive science that does not believe that merely instantiating a formal model of the brain causes of intelligent behavior will yield a system with intentional states (see chapter 2). Incidentally, the distinction between descriptive and causal cognitive science yields quite a neat perspective on Searle's general orientation. For Searle objects that a computer manipulating formal tokens is at most manipulating the formal shadows of thought (again, see chapter 2). We can now reconstrue this claim as a claim that manipulating the descriptions of thought contents is very different from replicating the inner causes of intelligent behavior. Running a program written according to the paradigm of descriptive cognitive science is not the way to instantiate thoughts. Searle is right to see this but wrong to conclude that the difference lies in our biological makeup.
My radical claim, then, is that a great deal of good and important work within cognitive science is unwittingly descriptive cognitive science. Failure to distinguish the two projects leads to the confusion and disarray depicted in my imagined dialogue. Anyone engaged in the descriptive task is primarily interested in human thought but nonetheless isn't giving a model of the computational brain causes of intelligent behavior. To make this idea of a descriptive cognitive science clearer, I shall first draw a parallel with the study of grammar and then offer some examples of the approach in action.

6 Grammars, Rules, and Descriptivism
The cognitive status of grammar is a vexed question to which I do not pretend to do justice here. Instead, my purpose is to use existing positions on the status of grammar as an illustration of descriptive cognitive science. There are three broad positions on the nature of a grammar for a natural language. The first is propositional psychological realism.
If a is a competent speaker of a language, a's competence is causally explained by unconscious knowledge of the rules of a grammar for the language. These rules are internally represented by structures in a's head that have the syntax of the natural language sentences describing the rules.
Propositional psychological realism thus says that a good grammar should be psychologically real and written in the form of explicit rules in a language of thought.
The second position is structural psychological realism.
If a is a competent speaker of a language, a's competence is causally explained by the fact that a's information-processing capacities are structured in a way suggested by the form of a grammar for the language.
Structural psychological realism drops the requirement of explicit, sentential coding of grammatical rules. It merely requires that a good grammar fix (perhaps nonuniquely) on a functional decomposition of the language-production system of the brain. For example, if the grammar involves a system of rules for regular cases and a list of exceptions to such rules, it will be good just in case there is a true in-the-head, information-processing story that posits both a system of lexical access that lists exceptions and a distinct, nonlexical component. Katz adopted this position when he wrote: "Componential distinctions between ... syntactic, phonological and semantic components must rest on relevant differences between three neural submechanisms of the mechanism which stores the linguistic description. The rules of each component must have their psychological reality in the input-output operations of the computing machinery of this mechanism" (1964, 133).
The third position on the nature of a grammar for a natural language is descriptivism.
A good grammar for a language is any theory that yields all and only the sentences characterized as grammatical by a competent speaker of the language.4 Such a grammar need not be unique, nor need it suggest the form or content of any psychologically realistic theory of language production or understanding.
" " "
( Note that psychologically realistic here and elsewheremeans fixing"
on actual structural or computational featuresof in-the-head processing.
Becauseof the ambiguity of the notion of a psychologicalmodel, this usage
could be misleading. But it is standardin the literature, and it seemsto be
unduenit -picking to demur.) Descriptivismhasno truck with the questions
of psychologicalreality at all. It treats the set of grammaticalsentencesas
given and seeksa formal characterizationof that set. A robot appraisedof
such a characterizationwould produce only grammaticalutterances. But
human languageproduction and understandingcould work along entirely
different lines, and the descriptivist would be unphased.Sucha position is
adopted by Stich (1972) and more" recently by Devitt and Sterelny (1987).
Thus, Stich writes that a grammar describescertainlanguage-specificfacts:
facts about the acceptabilityof expressionsto speakersand facts about an
156 Chapter 8

ability or capacityspeakershave for judging and classifyingexpressionsas


having or lacking grammatical properties and relations. . . . It is perhaps
misleading to describe[the grammarian] as constructing a theory of the
languageof his subjects. Ratherhe is building a description of the facts of
"
acceptabilityand linguistic intuition (1972, 219- 220).
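The descriptivist stance is easy to make vivid with a toy example (mine): on this view a grammar is just any device that yields the right string set, and nothing about its internal organization is claimed to mirror the speaker's head.

```python
# A toy "grammar" in the descriptivist sense: a device that yields all
# and only the strings counted grammatical, with no claim that anything
# isomorphic to these rules is psychologically real.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "cat"], ["the", "mat"]],
    "VP": [["sat"], ["is", "on", "NP"]],
}

def derive(symbols):
    """Yield every terminal string derivable from a list of symbols."""
    if not symbols:
        yield []
        return
    head, rest = symbols[0], symbols[1:]
    if head in RULES:                    # nonterminal: try each expansion
        for expansion in RULES[head]:
            yield from derive(expansion + rest)
    else:                                # terminal word
        for tail in derive(rest):
            yield [head] + tail

for sentence in derive(["S"]):
    print(" ".join(sentence))   # "the cat sat", "the cat is on the mat", ...
```

A robot apprised of this characterization would produce only "grammatical" strings, however unlike a human its innards.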
The debate between the psychological realists and the descriptivists has on occasion been needlessly acrimonious. The realists (Fodor [1980b], Chomsky and Katz [1974]) accuse the descriptivists of flouting general canons of scientific practice. The descriptivists (Stich [1971, 1972], Devitt and Sterelny [1987, 142-146]) accuse the realists of misapplying inference to the best explanation. The debate (nicely summarized in Devitt and Sterelny 1987) has gone roughly like this. First volley: "A good grammar for English will generate all and only the grammatical strings of English. The best explanation of the fact that competent speakers generate and judge grammatical just the grammatical strings is that they internally represent the grammar in some way. Grammar is thus psychologically real." The descriptivists return the ball by saying, "Psycholinguistic evidence seems to play little role, so far, in the determination of grammars. If that pattern continues, a good grammar may well be found that is simple, elegant, and gets the right strings. But why treat this as a basis for an inference to the psychological reality of grammar?" After all,
First, we would want evidence that G [the grammar] was a candidate for psychological implementation; that the transformational processes it implicated were within the computational ambit of the mind. Second, the very elegance and simplicity of G is rather more evidence against, than evidence for, it being the grammar our brain is built to use ..., [since] adaptations are typically not maximally efficient engineering solutions to the problems they solve. Finally, ... the fact that G is maximally efficient and elegant from the grammarian's point of view does not entitle us to suppose it is optimal from the brain's point of view. (Devitt and Sterelny 1987, 145-146)
Notice how the descriptivists' response fits with the general principles of evolutionary design raised in chapter 4. And notice also that the final point, concerning the need to gear psychologically realistic models to constraints imposed by the structure of the brain, is highly conducive to a connectionist approach to modeling the brain basis of grammatical competence.
For all that, however, the dispute ought not to call for a choice. Rather, we should conclude only the following: (1) The grammars actually being constructed by working linguists are unlikely to be psychologically real. Nonetheless, they are useful descriptions of real properties of natural languages. (2) Theorists whose goal is the construction of models of the brain basis of grammatical competence will need to focus not only on the data and grammars of (1) but also on the structure of the brain, psycholinguistic evidence, and even, perhaps, evolutionary conjectures concerning the origins of speech and language (see, e.g., Tennant 1984). In short, what is needed is clarity concerning the goals of various studies, not a victory of one choice of study over another. Devitt and Sterelny strike a nice balance, concluding that linguists are usefully studying not internal mechanisms but "the truth-conditionally relevant syntactic properties of linguistic symbols" (1987, 146), while nonetheless allowing that such studies may illuminate some general features of internal mechanisms and hence (quite apart from their intrinsic interest) may still be of use to the theorist concerned with brain structures.
What is thus true of the study of grammar is equally true, I suggest, of the
study of thought. Contentful thought is what is described by propositional-
attitude ascriptions. These ascriptions constitute a class of objects susceptible
to various formal treatments, just as the sentences judged grammatical
constitute a class of objects susceptible to various formal treatments. In
both cases, computational approaches can help suggest and test such treatments.
But in both cases these computational treatments and a psychologically
realistic story about the brain basis of sentence production or holding
propositional attitudes may be expected to come apart.

7 Is Naive Physics in the Head?
There is a type of work within cognitive science known variously as naive
physics, qualitative reasoning, or the formalization of commonsense knowledge.
I want to end this chapter by suggesting that most of this work, at
least in its classical cognitivist incarnations, may fall under the umbrella of
what I am calling descriptive cognitive science. If so, this is a clear case in
which descriptive cognitive science and causal cognitive science have got
badly confused. For people working in the field of naive physics typically
conceive of their work as very psychologically realistic, in contradistinction
to much other AI work.
The basic idea behind naive physics is simple and has already been mentioned
in chapter 3, section 6. Naive physics is an attempt to capture the
kind of commonsense knowledge that mobile, embodied beings need to
get around in the real world. We all know a lot about tension and rigidity:
you can't push an object with a piece of string. And we know about liquidity,
solidity, elasticity, spreading, and so on. The list is endless. And the
knowledge is essential. Without it we couldn't spread Marmite on toast,
predict that beer will spill off a smooth unbounded table top, or drag a
shopping trolley along a bumpy road. But the project is still underspecified.
I gave the goal as that of capturing commonsense knowledge. But the
metaphor of capturing is always dangerous, for it leaves the criteria of
success deeply obscure. In the present case "capturing commonsense knowledge"
could mean either exhibiting the structure of the set of knowledge
ascriptions warranted by a being's practical capacities to get around in the
world (the descriptive option), or exhibiting the structure or program of a
computational brain mechanism that enables the being to get around the
world (the causal option).
There can be little doubt about which project most naive physicists take
themselves to be engaged in. Here are a few quotes from recent articles.
"We want the overall pattern of consequences produced by [our] theory to
correspond reasonably faithfully to our own intuition in both breadth and
detail. Given the hypothesis that our own intuition is itself realised as a theory of
this kind inside our heads, the [naive-physics] theory we construct will be
equipotent with this inner theory" (Hayes 1985a, 5; my emphasis). "We
should . . . concentrate . . . on the details of what must be in the heads of
thinkers. . . . When we know what it is that people know, we can begin to
make realistic theories about how they work. Because they work largely by
using this knowledge" (Hayes 1985a, 35; my emphasis). "The motivations for
developing a qualitative physics stem from outstanding problems in psychology,
education, artificial intelligence and physics. We want to identify
the core knowledge that underlies physical intuition" (de Kleer and Brown
1985, 109; my emphasis).
Examples could be multiplied (see, e.g., articles in Hobbs and Moore
1985 and in Hallam and Mellish 1987). Not all naive physicists insist on
psychological reality as much as Hayes. J. R. Hobbs writes, "We, at least in
the short term, are happy to have any theory, regardless of how accurately
it models people, provided it is formally adequate" (1985, p. xvi). The same
slant is evident in David Israel's well-motivated question, "Psychologically
realistic or not, are logical formalisms appropriate media for representing
our commonsense knowledge of the world?" (1985, 430). Both Hobbs and
Israel are thus open to viewing their enterprise as descriptive cognitive
science.
That said, many naive physicists do take themselves to be modeling
computational brain processes. Yet there is surely room here for some
doubts precisely analogous to those raised in the previous section in the
context of grammatical competence. Just as in the case of grammar, we face
a situation in which human agents are visibly competent at a certain kind
of problem solving. And just as we can elicit grammatical intuitions from
subjects, so too can we elicit naive physical intuitions. As Hayes (1985, 31)
points out, basic physical intuitions are surprisingly easy to extract from
subjects. A theory of grammar will offer an elegant formal scheme within
which to derive such intuitions. And a theory of naive physics will likewise
aim at an elegant formal scheme from which the intuitive conclusions
follow for some domain. But the worry is the same in each case. Just
because we find an elegant theory in which to formally represent or derive
the intuitive consequences, why assume that human competence is explained
by our internally representing that theory to ourselves?
Consider, for example, Hayes's work (1985b) on a naive physics for
understanding liquids. The methodology involves attempting a "taxonomy"
of the possible states liquid can be in and integrating this with rules about
movement, change, and liquid geometry. The propositions and rules discovered
and encoded in the theory include a specification of fifteen states
of liquid and seventy-four numbered axioms written in predicate calculus.
The upshot is a formal system capable of supporting some qualitative reasoning
about the behavior of liquids.
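To convey the flavor of such a formalization, here is an invented axiom in the same spirit (my own illustration, loosely keyed to the spilled-beer example above, and not one of Hayes's actual seventy-four):

\forall l \, \forall s \; [\mathit{Liquid}(l) \wedge \mathit{On}(l, s) \wedge \neg \mathit{Bounded}(s) \rightarrow \mathit{SpillsFrom}(l, s)]

Read: any liquid resting on an unbounded surface will spill from it. Qualitative predictions are then derived by chaining many such axioms together by ordinary deduction.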
The working assumption of the causal interpretation of naive physics is
that human competence in this area is due to our having internalized a
theory having such axioms. An alternative is to suppose that the axioms
offer a formal description of the space of the intuitions we in fact generate
in some other way, perhaps by means of a more direct simulation of the
properties of liquids in which the formal syntactic elements do not admit of
any neat mapping onto entities and relations defined in natural language.
This dislocation of the syntax of computational activity from the semantics
of natural-language sentences used to describe its products is just what we
find in, for example, connectionist systems. And such systems are highly
adapted to modeling problem domains involving the simultaneous satisfaction
of many soft constraints (e.g., vision). Liquid behavior may well be
suited to just such a dynamic model. If so, the intuitions of naive subjects
will be a poor clue to the computational structure of the system whose
output consists of those very intuitions.
It may be thought that the distinction between description and cause is
just a new way of phrasing Marr's well-known distinction between task
analysis and algorithm (level 1 versus level 2). This is a mistake. Marr's
picture is one in which the details of task analysis provide a structural
blueprint for the algorithmic account. On the model suggested, there need
be no useful relation between the structure of a task analysis and the
underlying algorithm (imagine here a description of our naive physical competence
and a connectionist model of it).
In sum, there is at least as much room to doubt the causal credentials of
naive physics as there is to doubt the causal credentials of grammars. In
each case we may well be modeling relations between the products of brain
process and not the brain process itself. Of course, there will be important
relations between the two. If we are to guess how we do something, we
had better know what the something is in some detail. But it should at least
be controversial to just assume the kind of direct, semantically transparent
relations posited by Hayes for naive physics, by Fodor for thought in
general, and by Katz for grammar. In general, if a product is describable in
a particular, systematic way, that ought not to be taken as conclusive
evidence for a similarly articulated computational cause in the brain.

8 Refusing the Syntactic Challenge
Just to round off this chapter, let me say a word about what Fodor calls
intentional realism, i.e., the belief (sic) that beliefs and desires are real and
are causes of actions. I suspect that Fodor is driven to defend the position
that computational articulation in the brain mirrors the structure of ascriptions
of propositional attitudes by a fear that beliefs and desires can only be
causes if they turn up in formal guise as part of the physical story behind
intelligent behavior. But this need not be the case. If belief and desire talk is
a holistic net thrown over an entire body of intelligent behavior, we need
not expect regular syntactic analogues to particular beliefs and desires to
turn up in the head. All we need is that there should be some physical,
causal story, and that talk of beliefs and desires should make sense of
behavior. Such making sense does involve a notion of cause, since beliefs
do cause actions. But unless we believe that there is only one model of
causation, the physical, this needn't cause any discomfort (see also the
argument in the appendix).
Fodor's approach is dangerous. By accepting the bogus challenge to
produce syntactic brain analogues to linguistic ascriptions of belief contents,
he opens the Pandora's box of eliminative materialism. For if such
analogues are not found, he must conclude that there are no beliefs and
desires. The mere possibility of such a conclusion is surely an effective
reductio ad absurdum of any theory that gives it house space.
Chapter 9
Structured Thought, Part 2

1 Good News and Bad News

First, the good news. PDP, as illustrated in chapters 5 to 7, affords an
approach to computational modeling that should be attractive to anyone
engaged in what I have called causal cognitive science. That is, it should be
attractive to those who seek to model the in-the-head computational causes
of intelligent behavior. Its principal merits include the power of its learning
algorithms, its fine-grained shading of meaning, free generalization, and the
flexibility that goes with distributed representations of microfeatures.
Now the bad news. PDP, as illustrated in chapters 5 to 7, affords an
approach to computational modeling that should be unattractive to anyone
engaged in what I have called causal cognitive science. That is, it
should be unattractive to those who seek to model the in-the-head computational
causes of intelligent behavior. Its principal demerits include the
power of its learning algorithms, its fine-grained shading of meaning, free
generalization, and the flexibility that goes with distributed representations
of microfeatures.
All this is not as contradictory as it sounds. The very properties of
PDP models that are advantageous in some problem domains are disadvantageous
in others, just as being well adapted to survive underwater may be
a major disadvantage when beached on dry land.
Hints of such a dark side to PDP were dropped in chapter 7. It is now
time to brave the demons. I begin by outlining a particular PDP model in
section 2. I then report an influential critique of that model (Pinker and
Prince 1988) in section 3 and raise some quite general worries in section 4.
I then give considerable attention to the moral of the story. In the closing
section (section 6) I reject Pinker and Prince's moral and put something
more ecumenical in its place.

2 The Past-Tense-Acquisition Network
The particular PDP model that Pinker and Prince use as the focus of their
attack is the past-tense-acquisition network described in Rumelhart and
McClelland 1986 (216-271). The point of the exercise for Rumelhart and
McClelland was to provide an alternative to the psychologically realistic
interpretation of theories of grammar described briefly in the previous
chapter. The counterclaim made by Rumelhart and McClelland is that "the
mechanisms that process language and make judgments of grammaticality
are constructed in such a way that their performance is characterisable by
[grammatical] rules, but that the rules themselves are not written in explicit
form anywhere in the mechanism" (1986, 217).
Thus construed, the past-tense-acquisition network would aim to provide
an alternative to what I called propositional psychological realism in
chapter 8, section 6, i.e., the view that grammatical rules are encoded in a
sentential format and read by some internal mechanism. But this, as we
saw, is a very radical claim and is by no means made by all the proponents
of conventional symbol-processing models of grammatical competence. It
turns out, however, that this PDP model in fact constitutes a challenge even
to the weaker, and more commonly held, position of structural psychological
realism. Structural psychological realism is here the claim that the
in-the-head information-processing system underlying grammatical competence
is structured in a way that makes the rule-invoking description
exactly true. As Pinker and Prince put it, "Rules could be explicitly inscribed
and accessed, but they also could be implemented in hardware in such a
way that every consequence of the rule-system holds. [If so] there is a clear
sense in which the rule-theory is validated" (1988, 168).
The past-tense network challenges structural psychological realism by
generating the systematic behavior of past-tense formation without respecting
the information-processing articulation of a conventional model.
At its most basic, such articulation involves positing separate, rule-based
mechanisms for generating the past tense of regular verbs and straightforward
memorization mechanisms for generating the past tense of irregular
verbs. Call these putative mechanisms the nonlexical and the lexical components
respectively. On the proposed PDP model, "The child need not
decide whether a verb is regular or irregular. There is no question as to
whether the inflected form should be stored directly in the lexicon or
derived from more general principles. . . . A uniform procedure is applied for
producing the past tense form in every case" (Rumelhart and McClelland
1986, 267).
One reason for positing the existence of a rule-based, nonlexical component
lies in the developmental sequence of the acquisition of past-tense
competence. It is this developmental data that Rumelhart and McClelland
are particularly concerned to explain in a novel way. The data show three
stages in the development of a child's ability to correctly generate the past
tense of verbs (Kuczaj 1977). In the first stage the child can give the correct
form for a small number of verbs, including some regular and some
irregular ones. In the second stage the child overregularizes; she seems to
have learned the regular "-ed" ending for English past tenses and can give
this ending for new and even made-up verbs. But she will now mistakenly
give an "-ed" ending for irregular verbs, including ones she got right at
stage one. The overregularization stage has two substages, one in which
the present form gets the "-ed" ending (e.g., "come" becomes "comed") and
one in which the past form gets it (e.g., "ate" becomes "ated" and "came"
becomes "camed"). The third and final stage is when the child finally gets
it right, adding "-ed" to regulars and novel verbs and generating various
irregular or subregular forms for the rest.
Classical models, as Pinker and Prince note, account for this data in an
intuitively obvious way. They posit an initial stage in which the child has
effectively memorized a small set of forms in a totally unsystematic and
unconnected way. This is stage one. At stage two, according to this story,
the child manages to extract a rule covering a large number of cases. But
the rule is now mistakenly deployed to generate all past tenses. At the final
stage this is put right. Now the child uses lexical, memorized, item-indexed
resources to handle irregular cases and nonlexical, rule-based resources to
handle regular ones.
Classical models, however, typically exhibit a good deal more structure
than this bare minimum (see, e.g., the model in Pinker 1984). The processing
is decomposed into a set of functional components including a lexicon of
structural elements (items like stems, prefixes, suffixes, and past tenses), a
structural rule system for such elements, and phonetic elements and rules.
A classical model so constructed will posit a variety of mechanisms that
represent the data differently (morphological and phonetic representations)
with access and feed relations between the mechanisms. In a sense, the
classical models here are transparent with respect to the articulation of
linguistic theory. Distinct linguistic theories dealing with, e.g., morphology
and phonology are paired with distinct in-the-head, information-processing
mechanisms.
The PDP model challenges this assumption that in-the-head mechanisms
mirror structured, componential, rule-based linguistic theories. It is not
necessary to dwell in detail on the Rumelhart and McClelland model to see
why this is so. The model takes as input a representation of the verb constructed
entirely out of phonetic microfeatures. It uses a standard PDP
pattern associator to learn to map phonetic microfeature representations of
the root form of verbs to a past-tensed output (again expressed as a set of
phonetic microfeatures). It learns these pairings by the usual iterated process
of weight adjustments described in previous chapters. The basic structure
of the model is thus: phonetic representations of root forms are input
into a PDP pattern associator, and phonetic representations of past forms
result as output.1
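For readers who want a concrete feel for what a pattern associator of this general kind involves, here is a minimal sketch in Python. It is my own toy, not the Rumelhart and McClelland implementation (their network used a much richer repertoire of phonetic microfeatures and several hundred units), but the learning principle, iterated error-driven weight adjustment between an input and an output feature layer, is the same:

import numpy as np

# Toy corpus: binary "phonetic feature" vectors (purely illustrative).
# Each root-form vector is paired with its past-form vector.
roots = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
pasts = np.array([[1, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1]])

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (4, 4))    # connection weights, input units x output units
rate = 0.2                        # learning rate

def step(x):
    return (x >= 0.5).astype(float)   # simple output threshold

# Iterated training: nudge each weight in proportion to
# the output error it contributed to (perceptron-style rule).
for epoch in range(200):
    for x, t in zip(roots, pasts):
        W += rate * np.outer(x, t - step(x @ W))

print(step(roots @ W))   # should now reproduce the past-form vectors

Nothing in the weight matrix marks any input as a stem or any output as a suffix; the mapping is carried holistically by the weights, which is just the point at issue below.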
The information-processing structure of the classical model is thus dissolved.
One kind of mechanism is doing all the work both for the regular
and irregular forms (recall the quote from Rumelhart and McClelland 1986,
267). And none of the system's computational operations are explicitly
defined to deal with such entities as verb stems, prefixes, and suffixes (note
that this is not just a lack of labels; the system nowhere accords any special
status to the morphological chunks of words that such labels pick out). As
noted by Pinker and Prince, the radical implications of such a model include:
. The use of a direct phonetic modification of the root without any
abstract morphological representation,
. The elimination of any process dealing specially with lexical items
as a locus of idiosyncrasy,
. The use of a qualitatively identical system for regular and irregular
occurrences (adapted from Pinker and Prince 1988, 95).
There is thus a quite extensive dissolution of the structure of a classical
model. Not only do we fail to find any explicit tokening of rules such as
"add '-ed' to form regular past tenses," but more important, we don't even
find any broad articulation of the system into distinct components, one
dealing with rule-based behavior and another dealing with exceptional
items.
To its undeniable credit the Rumelhart and McClelland model is able
to generate much of the required behavior (e.g., the three stages of development)
without any such structuring. In so doing it relies on the usual
distinctive properties of PDP models, that is, on automatic shading of
meaning, blending, and generalization (see chapters 5 to 7). Thus, for
example, it finally deals with new cases as if they were regular verbs
because this is the correct generalization of the overall thrust of its training
input data. The "-ed" ending, we might say, has by then worn down a very
deep groove indeed. Nonetheless, the special context provided by inputting
a known irregular root can override this groove and cause the correct
irregular inflection, but only after sufficient training. The model thus goes
through a stage of overregularizing and learns in time to get it right. Most
impressively, the model also produces the second kind of overregularization
error observed at stage two: it also overregularizes by adding "-ed" to
the past tense of irregular verbs, producing errors like "camed," "ated." The
explanation of this must lie in the system's blending two known patterns,
from "eat" to "eated" (the regular "-ed" ending) and from "eat" to "ate,"
and these yield "ated" (see Pinker and Prince 1988).
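A deliberately crude sketch of my own (the real model's encodings are far richer) shows how superposing two learned output patterns on shared units can yield such a blend:

# Illustrative output features only; not the model's actual encoding.
features = ["stem vowel shifted to a", "final -t", "-ed ending"]
ate   = [1, 1, 0]   # learned output pattern for "ate"
eated = [0, 0, 1]   # learned output pattern for "eated"

# Both patterns are partially activated by the same root "eat";
# the readout sees their superposition on the shared output units.
blend = [min(1, a + b) for a, b in zip(ate, eated)]
print(blend)   # [1, 1, 1]: the "ate" body plus the "-ed" ending, i.e., "ated"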
The PDP model thus recapitulates the three stages of development as
follows:

Stage 1. There is simple encoding of a variety of present-past pairings.

Stage 2. The automatic generalization mechanism extracts a regularity
implicit in the data and then knows the standard "-ed" ending. For a
while this pattern swamps the rest and causes overregularization.
Further training begins to remind the system of the exceptions. But
now we find a blend of the "-ed" pattern and the exception patterns,
yielding "ated"-type errors.

Stage 3. Further gradual tuning puts it all right. The exceptions and
the regular patterns peacefully coexist in a single network.

All of this is just rosy, but darkness looms just around the corner.

3 The Pinker and Prince Critique
Pinker and Prince (1988) raise a number of objections to a PDP model of
children's acquisition of the past tense. Some of these criticisms are specific
to the particular PDP model just discussed, while the others are at least
suggestive of difficulties with any nontrivial PDP model of such a skill.2 I
shall only be concerned with difficulties of this last kind. Such cases can be
roughly grouped into four types. These concern (1) the model's overreliance
on the environment as a source of structure, (2) the power of the PDP learning
algorithms (this relates to the counterfactual space occupied by such
models, a space that is argued to be psychologically unrealistic), (3) the use
of the distinctive PDP operation of blending, and (4) the use of microfeature
representations.

Overreliance on the environment
The Rumelhart and McClelland model, we saw, made the transition from
stage 1 (rote knowledge) to stage 2 (extraction of regularity). But how was
this achieved? It was achieved, it seems, by first exposing the network to a
population mainly of irregular verbs (10 verbs, 2 regular) and then presenting
it with a massive influx of regular verbs (410 verbs, 344 regular). This
sudden and dramatic influx of regular verbs in the training population is
the sole cause of the model's transition from stage one to stage two. Thus,
"The model's shift from correct to overregularized forms does not emerge
from any endogenous process: it is driven directly by shifts in the input"
(Pinker and Prince 1988, 138). By contrast, some developmental psychologists
(e.g., Karmiloff-Smith [1987]) believe that the shift is caused by an
internally driven attempt to organize and understand the data. Certainly,
there is no empirical evidence that a sudden shift in the nature of the input
population must precede the transition to stage 2 (see Pinker and Prince
1988, 142).
The general point here is that PDP models utilize a very powerful
learning mechanism that, when given well-chosen inputs, can learn to
produce almost any behavior you care to name. But a deep reliance on
highly structured inputs may reduce the psychological attractiveness of
such models. Moreover, the space of counterfactuals associated with an
input-driven model may be psychologically implausible. Given a different
set of inputs, these models might go straight to stage 2, or even regress
from stage 2 to stage 1. It is at least not obvious that human infants enjoy
the same degree of freedom.

The power of the learning algorithms


This is a continuation of the worry just raised. The power of PDP systems
to extract statistical regularities in the input data, it is argued, is simply too
great to be psychologically realistic. Competent speakers of English can't
easily learn the kinds of regularity that a PDP model would find unproblematic.
Such a model could learn what Pinker and Prince describe as "the
quintessential unlinguistic map" relating a string to its mirror-image reversal
(1988, 100). Human beings, it seems, have extreme difficulty learning such
regularities. But a good explanation of language acquisition, Pinker and
Prince rightly insist, must explain what we cannot learn as well as what we
can. One way to explain such selective learning capacities is to posit a
higher degree of internal organization geared to certain kinds of learning.
Such organization is found in classical models. The price of dissolving such
organization and replacing it with structured input may be a steep reduction
in broader psychological plausibility.

Blending
We saw in section 2 above how the model generates errors by blending
two such patterns as from "eat" to "ate" and from "eat" to "eated" to
produce the pattern from "eat" to "ated." By contrast, a conventional rule-
based account would posit a mechanism specifically geared to operate on
the stems of regular verbs, inflecting them as required. If this nonlexical
component were mistakenly given "ate" as a stem, it would simply inflect
it, sausage-machine fashion, into "ated." The choice, then, is between an
explanation by blending within a single mechanism and an explanation of
misfeeding within a system that has a distinct nonlexical mechanism. Pinker
and Prince (1988, 157) point to evidence which favors the latter, classical
option.
If blending is the psychological process responsible, it is reasonable to
expect a whole class of such errors. For example, we might expect blends
of common middle-vowel changes and the "-ed" ending (from "shape" to
"shipped" and from "sip" to "sepped"). Children exhibit no such errors. If,
on the other hand, the guilty process is a misfeed to a nonlexical mechanism,
we should expect to find other errors of inflection based on a mistaken stem
(from "went" to "wenting"). Children do exhibit such errors.
Microfeature representations
The Rumelhart and McClelland model relies on the distinctive PDP device
of distributed microfeature representation. The use of such a form of representation
buys a certain kind of automatic generalization. But it may not
be the right kind. The model, we saw, achieves its ends without applying
computational operations to any syntactic entities with a semantics
given by such labels as "stem" or "suffix." Instead, its projectible notion of
stems is just the center of a state space of instances of strings presented for
inflection into the past tense. The lack of a representation of stems as such
deprives the system of any means of encoding the general idea of a regular
past form (i.e., stem + "ed"). Regular forms can be produced just in case
the stem in a newly presented case is sufficiently similar to those encountered
in training runs. The upshot of this is a much more constrained generalization
than that achieved within a classical model, which incorporates
a nonlexical component. For the latter would do its work whatever we gave
it as input. Whether this is good or bad (as far as the psychological realism
of the model is concerned) is, I think, an open question. For the moment, I
simply note the distinction. (Pinker and Prince clearly hold it to be bad; see
Pinker and Prince 1988, 124.)
A more general worry, stemming from the same root, is that generalization
based on pure microfeature representation is blind. Pinker and Prince
note that when humans generalize, they typically do so by relying on a
theory of which microfeatures are important in a given context. This knowledge
of salient features can far outweigh any more quantitative notion of
similarity based simply on the number of common microfeatures. They
write, "To take one example, knowledge of how a set of perceptual features
was caused . . . can override any generalizations inspired by the object's
features themselves: for example, an animal that looks exactly like a skunk
will nonetheless be treated as a raccoon if one is told that the stripe was
painted onto an animal that had raccoon parents and raccoon babies" (Pinker
and Prince 1988, 177). Human generalization, it seems, is not the same
as the automatic generalization according to similarity of microfeatures
found in PDP. Rather, it is driven by high-level knowledge of the domain
concerned.
To bring this out, it may be worth developing a final example of my
own. Consider the process of understanding metaphor, and assume that a
successful metaphor illuminates a target domain by means of certain features
of the home domain of the metaphor. Suppose further that both the
metaphor and the target are each represented as sets of microfeatures thus:
(MMF1, . . . , MMFn) and (TMF1, . . . , TMFn) (MMF = metaphor microfeature,
TMF = target microfeature). It might seem that the necessary
capacity to conceive of the target in the terms suggested by the metaphor
is just another example of shading meaning according to context, a ca-
pacity that, as we've seen, PDP systems are admirably suited to exhibit.
Thus, just as we earlier saw how to conceive of a bedroom along the lines
suggested by inclusion of a sofa, so we might now expect to see how to
conceive of a raven along the lines suggested by the contextual inclusion
of a writing desk.
But in fact there is a very important difference. For in shading the
meaning of bedroom, the relevant microfeatures (i.e., sofa) were already
specified. Both the joy and mystery of metaphor lies in the lack of any such
specification. It is the job of one who hears the metaphor to find the salient
features and then to shade the target domain accordingly. In other words,
we need somehow to fix on a salient subset of (MMF1, . . . , MMFn). And
such fixation must surely proceed in the light of high-level knowledge
concerning the problem at hand and the target domain involved. In short,
not all microfeatures are equal, and a good many of our cognitive skills
depend on deciding according to high-level knowledge which ones to attend
to in a given instance.

4 Pathology
And the bad news just keeps on coming. Not only do we have the charges
of the Pinker and Prince critique to worry about. There is also a body of
somewhat recalcitrant pathological data.
Consider the disorder known as developmental dysphasia. Developmental
dysphasics are slow at learning to talk, yet appear to suffer from
no sensory, environmental, or general intellectual defect. Given the task
of repeating a phonological sequence, developmental dysphasics will typically
return a syntactically simplified version of the sentence. For example,
given "He can't go home," they produce "He no go" or "He no can go."
The simplifications often include the loss of grammatical morphemes
(suffixes marking tense or number) and generally do not affect word
stems. Thus "bees" may become "bee," but "nose" does not become "no."
(The above is based on Harris and Coltheart 1986, 111.) The existence of a
deficit that can impair the production of the grammatical morphemes while
leaving the word stem intact seems prima facie to be evidence for a distinct
nonlexical mechanism. We would expect such a deficit whenever the nonlexical
mechanism is disengaged or its output ignored for whatever reason.
Or again, consider what is known as surface dyslexia.3 Some surface
dyslexics lose the capacity correctly to read aloud irregular words, while
retaining the capacity to pronounce regular words intact. When faced with
an irregular word, such patients will generate a regular pronunciation for
it. Thus, the irregular word "pint" is pronounced as if it rhymed with
regular words like "mint." This is taken to support a dual-route account of
reading aloud, i.e., an account in which a nonlexical component deals with
"
regular words. If the reading systemdoes include thesetwo separateprocessing
components, it might be possiblethat neurological damagecould
impair one component whilst leaving the other intact, to produce [this]
" Harris and Coltheart 1986, 244).
specific pattern of acquired dyslexia (
Suchdata certainly seemsto support a picture that includesat least some
distinct rule-basedprocessing, a picture that on the faceof it is ruled out by
single-network PDP models.
However, caution is needed. Martin Davies has pointed out that such a
conclusion may be based on an unimaginative idea of the ways in which a
single network could suffer damage (Davies, forthcoming, 19). Davies does
not develop a specific suggestion in print,4 but we can at least imagine the
following kind of case. Imagine a single network in which presented words
must yield a certain level of activation of some output units. And imagine
that by plugging into an often-repeated pattern, the regular words have, as
it were, worn a very deep groove into the system. With sufficient training,
the system can also learn to give correct outputs (pronunciation instructions)
for irregular words. But the depth of groove here is always less than
that for the regular words, perhaps just above the outputting threshold. Now
imagine a kind of damage that decrements all the connectivity strengths by
10 percent. This could move all the irregular words below the threshold,
while leaving the originally very strong regular pattern functional. This
kind of scenario offers at least the beginnings of a single-network account
of surface dyslexia. For some actual examples of the way PDP models could
be used to account for pathological data, see McClelland and Rumelhart
1986, which deals with various amnesic syndromes.
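A minimal numerical rendering of Davies's imagined case (the numbers are mine and purely illustrative) makes the mechanism plain:

# Hypothetical net activations after training; correct reading
# requires reaching an output threshold of 1.0.
threshold = 1.0
net_input = {"mint": 1.40, "hint": 1.35, "pint": 1.05}   # "pint" is the irregular

# Global damage: every connection strength loses 10 percent,
# so every net input is scaled by 0.9.
for word, act in net_input.items():
    print(word, act >= threshold, "->", act * 0.9 >= threshold)
# mint: True -> True, hint: True -> True, pint: True -> False.
# The shallow-groove irregular drops below threshold; the regulars survive.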
Pathological data, I conclude, at best suggests a certain kind of classical
structuring of the human information-processing system into lexical and
nonlexical components. But we must conclude with Davies that such data
is not compelling in advance of a thorough analysis of the kinds of breakdown
that complex PDP systems can exhibit. It seems, then, that we are
left with the problems raised by the Pinker and Prince critique. In the next
section I shall argue that although these problems are real and significant,
the conclusions to which they lead Pinker and Prince are by no means
commensurate with their content.

5 And the Moral of the Story Is . . .


A direct response to the Pinker and Prince criticisms could no doubt be
constructed. It could be argued, for example, that what lets the particular PDP
network in question down is just its choice of microfeatures and that many
of the other general criticisms flow from that. Thus, it might be that the
microfeatures that nature actually focuses on in language acquisition constrain
our learning in the ways required (e.g., by making pairings of words
with their mirror-image reversals effectively unlearnable).5 The lack of
certain kinds of blending errors might be explained in the same way. As for
fixing on salient features for generalization, perhaps a self-programming
network could be constructed that, in a sensitive context-driven way,
amends connectivity weights to suit current needs. Similarly, when one
natural feature is of great biological importance, we might expect to find a
high weight on the activations to which that unit gives rise. Pinker and
Prince complain that a network that automatically generalizes by microfeatures
couldn't help but treat in the same way two snakes similar in
appearance but one poisonous and the other not. But this is simply not
true; the weight on the "poisonous" microfeature could be so high as to
have dramatic effects whenever that unit is active.
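Two lines of arithmetic make the reply concrete. In this toy encoding of mine (nothing here is from Pinker and Prince's text), a single heavily weighted feature swamps an otherwise overwhelming similarity count:

# Two snakes share 9 of 10 illustrative microfeatures, differing
# only on "poisonous," which carries a very high salience weight.
snake_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]    # last slot: poisonous
snake_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

weights_flat    = [1.0] * 10                # every feature counts equally
weights_salient = [1.0] * 9 + [50.0]        # "poisonous" dominates

def weighted_difference(x, y, w):
    return sum(wi * abs(xi - yi) for xi, yi, wi in zip(x, y, w))

print(weighted_difference(snake_a, snake_b, weights_flat))      # 1.0: near twins
print(weighted_difference(snake_a, snake_b, weights_salient))   # 50.0: worlds apart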
I shall not, however, pursue such lines of response. In the long term, a
more indirect response will be both more effective and more interesting.
The indirect response is to grant the general form of Pinker and Prince's
worries (shared by many cognitive scientists), but to contest the moral of
the story itself.
The worries of section 3 together suggest the need for
(1) More information-processing structure in a PDP model of language
acquisition, e.g., a morphological as well as a phonetic component,
(2) Some kind of control structure able, e.g., to specify salient microfeatures
for inductive generalization,
(3) Some capacity for labeling and variable binding to allow, e.g., the
representation of the general idea of a verb stem.
There is, I submit, nothing especially radical here. Many PDP theorists
recognize the need for modular or hierarchical organization to satisfy need
(1) and control and variable binding to satisfy needs (2) and (3). These
demands are all perfectly explicit in Norman 1986, 539-543. To the degree
that such facilities turn out to be required for high-level cognitive modeling,
there will have to be some broadly classical componential structuring of the
information-processing systems concerned.
But it by no means follows that any such model will be a mere implementation
of a classical theory. Yet this is exactly what Pinker and Prince
lead us to expect. They write, "If the subcomponents of a traditional
account were kept distinct in a PDP model, mapping onto distinct subnetworks
or pools of units with their own inputs and outputs, or onto distinct
layers of a multilayer network, one would naturally say that the network
simply implemented the traditional account" (Pinker and Prince 1988, 179).
Or again, "Subsymbolism . . . will not be indicated if the principal structures
of . . . hypothetical improved models turn out to be dictated by higher-level
theory rather than by micronecessities. To the extent that connectionist
models are not mere isotropic node tangles, they will themselves have
properties that call out for explanation. We expect that in most cases these
explanations will constitute the macrotheory of the rules that the system
would be said to implement" (Pinker and Prince 1988, 171). There are two
claims here that need to be distinguished.
(1) Any PDP model exhibiting some classical componential structuring
is just an implementation of a classical theory.
(2) The explanation of this broad structuring will typically involve
the use of classical rule-based models.
Claim (1) is clearly false. Even if a large connectionist system needs to
deploy a complete, virtual, symbol-processing mechanism (recall chapter 7),
it by no means follows that the overall system produced merely implements
a classical theory of information processing in that domain. This is
probably best demonstrated by some examples.
Recall the example (chapter 8, section 4) of a subconceptually implemented
rule interpreter. This is a virtual symbol processor, a symbol processor
and rule-user realized in a PDP substructure. Now take a task such as
the creation of a mathematical proof. In such a case, we saw, the system
could use characteristic PDP operations to generate candidate rules that
would be passed to the rule interpreter for inspection and deployment.
Such a system has the best of both worlds. The PDP operations provide
an intuitive (best-match), context-sensitive choice of rules. The classical
operations ensure the validity of the rule (blends are not allowed) and its
strict deployment.
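The division of labor can be rendered schematically as follows (a toy of my own devising, not Smolensky's actual proposal; all names and scores are invented). A soft matching stage proposes a rule; a hard symbolic stage lets it fire only if its preconditions are met exactly, so no blend of two rules can ever be deployed:

# Toy mixed system: soft proposal, hard deployment.
rules = {
    "modus_ponens":  {"needs": {"P", "P->Q"}, "gives": "Q"},
    "and_elim_left": {"needs": {"P&Q"},       "gives": "P"},
}
signatures = {"modus_ponens": {"arrow", "atom"}, "and_elim_left": {"conjunction"}}

def propose(goal_features):
    # Stand-in for PDP best-match: score rules by feature overlap.
    return max(rules, key=lambda r: len(signatures[r] & goal_features))

def deploy(name, facts):
    # Classical stage: fire only on an exact precondition match.
    rule = rules[name]
    return rule["gives"] if rule["needs"] <= facts else None

candidate = propose({"arrow", "atom"})               # intuitive, context-sensitive choice
print(candidate, deploy(candidate, {"P", "P->Q"}))   # strict deployment yields Q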
Some such story could be told for any truly rule-governed domain. Take
chess, for example. In such a domain a thoroughly soft and intuitive system
would be prone to just the kinds of errors suggested by Pinker and Prince.
The fact that someone learns to play chess using pieces of a certain shape
ought not to cause her to treat the bishops in a new set as pawns because
of their microfeature similarity to the training pawns. Chess constitutes a
domain in which absolute, hard, functional individuation is called for; it also
demands categorical and rigid rule-following. It would be a disaster to
allow the microfeature similarity of a pawn to a bishop to prompt a
blending of the rules for moving bishops and pawns. A blend of two good
rules is almost certain to be a bad one. Yet a combined PDP and virtual
symbol-processing system would again exhibit all the advantages outlined.
It would think up possible moves fluidly and intuitively, but it could
subject these ideas to very high-level scrutiny, identify pieces by hard,
functional individuation and be absolutely precise in its adherence to the
explicit rules of the game.
As a second example, consider the problem of understanding metaphor
raised earlier. And now imagine a combined PDP and virtual symbol-
processing (VSP) system that operates in the following way. The VSP
system inspects the microfeature representation of the metaphor and the
target. On the basis of high-level knowledge of the target domain it
chooses a salient set of metaphor microfeatures. It then activates that set
and allows the characteristic PDP shading process to amend the representation
of the target domain as required.
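Schematically (my own toy rendering; the feature names and activation numbers are invented), the two stages might look like this:

# VSP stage: high-level knowledge of the target domain selects
# which metaphor microfeatures are salient.
metaphor_mfs = {"black": 0.9, "ominous": 0.8, "feathered": 0.7}

def choose_salient(mfs, relevant_to_target):
    return {f: a for f, a in mfs.items() if f in relevant_to_target}

# PDP-style stage: the chosen features are activated and shade
# (amend) the target representation.
def shade(target, salient):
    shaded = dict(target)
    for f, a in salient.items():
        shaded[f] = shaded.get(f, 0.0) + a
    return shaded

target = {"meeting": 1.0, "long": 0.4}
print(shade(target, choose_salient(metaphor_mfs, {"ominous"})))
# {"meeting": 1.0, "long": 0.4, "ominous": 0.8}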
Finally, consider the three-stage developmental case itself, and imagine
that there is, as classical models suggest, a genuine distinction between
lexical and nonlexical processing strategies. But suppose, in addition, that
the nonlexical process is learned by the child and that the learning process
itself is to be given a PDP model. This yields the following picture:
Stage 1. Correct use, unsystematic. This stage is explained by a pure
PDP mechanism of storage and recall.
Transition. A PDP model involving endogenous (and perhaps innate)
structuring, which forces the child to generate a nonlexical processing
strategy to explain to itself the regularities in its own language
production.
Stage 2. Overregularization due to sudden reliance on a newly formed
nonlexical strategy.
Transition. A PDP model of tuning by correction.
Stage 3. Normal use. The coexistence of a pure PDP mechanism of
lexical access and a nonlexical mechanism implemented with PDP.
If some such model were accurate (and something like this model is in fact
contemplated in Karmiloff-Smith 1987), we would not have a classical picture
of development, although we might have a classical picture of adult use.6
To sum up, the mere fact that a system exhibits a degree of classical
structuring into various components (one of which might be a rule interpreter)
does not force the conclusion that it is a mere implementation of a
classical theory. This is so because (a) the classical components may call
and access powerful PDP operations of matching, search, blending, and
generalization and (b) the developmental process by which the system
achieves such structure may itself require a PDP explanation. Claim (1) thus
fails. It may be, however, that to understand why the final system must
have the structure it does, we will need to think in classical, symbol-
manipulating terms. This second claim (claim 2, p. 171) is considered in the
next section.

6 The Theoretical Analysis of Mixed Models
What are the theoretical implications of mixed PDP and VSP models?
One thought, which we have already rejected, is that any such model must
be a mere implementation of classical information processing in the domain.
An equally radical and equally misguided thought is that classical models in
such cases are at best approximations to the true story told at the level of
a PDP analysis of units and passing values. The mistake in both cases is to
think in terms of one model for each task. For as we saw in chapter 7,
section 6, we individuate tasks according to our particular interests. But
our performance of any top-level task may need to be computationally explained
in terms of a number of interacting virtual machines. And some of
these virtual machines may need to be understood in classical terms, e.g., in
terms of their performing classical operations on items such as verb stems,
suffixes, numbers, phonemes, morphemes, English words, and so on. And
others may require understanding in connectionist terms, e.g., in terms of
operations on microfeatures that are not semantically transparent. Thus,
recall Fodor's argument in chapter 8. It is debatable, but perhaps these
arguments succeed in demonstrating the need for some computational
operations defined over sentential items (and hence semantically transparent
items). This seems plausible in the case of what he calls "causal trains
of thought," conscious sequences of sententially formulated mental states
(see Fodor 1987, 147). But it would hardly follow from this that all or even
most of our cognitive activity is best explained in such terms.
When a single task is carried out by a complex of virtual machines all
implemented in a PDP architecture but some simulating classical operations
of hard matching and serial processing, a good psychological model of the
information processing involved will have to be multiplex. Recall Smolensky's
idea of a mathematical proof generator that has an intuitive component and
a classical component. In that case, the process of finding candidate rules to
apply involves best-match operations requiring a PDP explanation. But the
final selection and deployment of the candidate rules require explanation in
terms of classical operations. Moreover, as claim 2 implies, this overall setup
itself needs to be understood as a response to the genuine, hard, rule-based
nature of mathematical proof.
Or again, recall the mixed model of metaphor understanding. A theoretical
analysis of the system requires a classical account of the choice of salient
microfeatures (including errors in choice), a PDP account of shading the
target domain, and a PDP account of the distributed microfeature representation
of target and metaphor. Moreover, mixed systems will be susceptible
to various kinds of breakdown. Some breakdown patterns may be
explicable only by adverting to the underlying PDP substrate used to
implement a symbol processor; others may make sense at the level of
virtual symbol-processing operations; and still others may affect the pure
PDP component itself. We might speculate that breakdowns of this last
kind (the loss of some pure PDP processing power that leaves virtual
symbol-processing capacities intact) are behind some of the fascinating
cases reported in Sacks (1986).7
Mixed models thus require multiplex forms of psychological or computational
explanation. Not just different cognitive tasks, but different aspects
of the same task now seem to need different kinds of algorithmic explanation.
Since humans must frequently negotiate some truly rule-governed
problem domains (e.g., chess, language, mathematics), some form of mixed
model may well be the most effective explanation. The apparent success of
thoroughly soft PDP systems in negotiating some such domains (e.g., the
model of past-tense acquisition) may be due to a concealed bolt-on symbol-
processing unit, us. In the model of past-tense acquisition the system
received stems and then inflected versions because we chose to divide
the verbs up like that. Pinker and Prince describe this choice as relying
on "intuitive protolinguistics." So in that sense, even the Rumelhart and
McClelland system has a bolt-on symbolic component. At any rate, if
mixed models are required (for whatever reason), the consequences must
include the general failure of the uniformity principle (see chapter 7). More
specifically, they must include:
. The rejection of the claim that any model exhibiting classical componential
structure is a mere implementation of a classical theory,
. The rejection of the claim that any classical account is at best an
approximation to a correct PDP-based account.
Instead, correct explanations must be geared to the virtual machine responsible
for particular aspects of performing the task. All of this is nicely
ecumenical, I'm sure.
It would be boring, however, to close without making at least one inflammatory
claim. The power behind our gross symbol-processing capacities,
a factor that makes us thinkers and, e.g., BACON not, may well be the
subsymbolic, pattern-matching power of something like a PDP mechanism
operating within us. There is a strong intuition that manipulating gross
symbolic structures models the form of some of our thought but somehow
leaves out the content. The intuition is often put by saying that such
programs have no understanding of what the symbol manipulations mean.
Perhaps, then, understanding involves spontaneously seeing patterns, spotting
similarities, shading meanings, and so on. (This position is most strongly
advanced in Hofstadter 1985.) Of the two modes of thought treated in this
book, it would seem the PDP mode is in some sense primary. This certainly
fits our normal usage. Many of us allow that lower animals have thoughts
of some kind. They are plausibly seen as advanced, complex PDP machines
that have not yet developed our capacities for symbolic representation. Yet
we deny thoughts to BACON and SHRDLU, programs that certainly manip-
ulate gross symbolic representations but lack any rich pattern-matching
substructure.
If this picture is correct, we should maintain a dual thesis concerning
explanation and instantiation. We should hold that good psychological
explanations will often involve mixed models and hence will require analysis
in both PDP and classical symbol-manipulating terms. But we may also hold
that instantiating any contentful psychological state requires not just the
manipulation of gross symbolic structures but also access to the output of
a powerful subsymbolic processor. A virtual symbol processor provides
guidance and rigor; the PDP substrate provides the fluidity and inspiration
without which symbol processing is but an empty shell. In words that Kant
never used: subsymbolic processing without symbolic guidance is blind;
symbolic processing without subsymbolic support is empty.
Chapter 10
Reassembling the Jigsaw

1 The Pieces
All the pieces of the jigsaw are now before us, and their subgroupings
are largely complete. Semantically transparent AI models have been described
and compared with highly distributed connectionist systems. Various
worries about the power and methodology of both kinds of work have
been presented. The possibility of mixed models of cognitive processing
has been raised, and the nature of folk-psychological talk and its role in a
science of cognitive processing has been discussed. Along the way I have
criticized the arguments in favor of Fodor's radical cognitivism, and I was
forced to distinguish two projects within cognitive science: one descriptive
and involving the essential use of classical representations; the other concerned
with modeling the computational causes of intelligent behavior and
typically not dependent on such representations. At the end of the previous
chapter I also drew a distinction internal to causal cognitive science: the
distinction between the project of psychological explanation (laying out
the computational causes of intelligent behavior) and that of instantiation
(making a machine that actually has thoughts). These two projects, I suggested,
may come apart. This final chapter (which also functions as a kind
of selective summary and conclusion) expands on this last piece of the
jigsaw and tries to display as clearly as possible the overall structure of
what I have assembled. In effect, it displays a picture of the relations of
various parts of an intellectual map of the mind.
One word of warning. Since I should be as precise as possible about
what each part of this intellectual map is doing, for the duration of the
chapter I shall largely do away with shorthand talk of representations,
beliefs, and so on to describe contemporary computer models (recall chapter
6, section 2, and chapter 5, footnote 4). At times this will result in
language that is somewhat cumbersome and drawn out.
2 Building a Thinker
What does it take to build a thinker? Some philosophers are sceptical that a
sufficient condition of being a thinker is satisfying a certain kind of formal
description (see chapter 2). Such worries have typically focused on the
kinds of formal descriptions appropriate to semantically transparent AI. In
one sense we have seen virtue in such worries.1 It has indeed begun to
seem that satisfying certain formal descriptions is vastly inadequate to
ensure that the creature satisfying the description has a cognitive apparatus
organized in a way capable of supporting the rich, flexible actual and
counterfactual behavior that warrants an ascription of mental states to it.
(Apologies for the lengthy formulation; you were warned!) Some reasons
for thinking this were developed in chapter 6, where I discussed the holism
and flexibility achieved by systems that use distributed representations and
superpositional storage.
In short, many worries can usefully be targeted in what I am calling the
project of instantiation. They can be recast as worries to the effect that
satisfying the kind of formal description that specifies a conventional,
semantically transparent program will never isolate a class of physical
mechanisms capable of supporting the rich, flexible actual and counterfactual
behavior that warrants ascribing mental states to the system instantiating
such mechanisms. The first stage in an account of instantiation thus involves
the description of the general structure of a mechanism capable of supporting
such rich and flexible behavior at the greatest possible level of abstraction
from particular physical devices. Searle seems to believe that we reach
this level of abstraction before we leave the realms of biological description
(see chapter 2). I see no reason to believe this, although it could conceivably
turn out to be true. Instead, my belief is that some nonbiological,
microfunctional description, such as that offered by a value-passing PDP
approach, will turn out to specify at least one class of physical mechanisms
capable of supporting just the kind of rich and flexible behavior that
warrants ascribing mental states.
This is not to say, however, that its merely satisfying some appropriate
formal description will warrant calling something a thinker. Instead, we need
to imagine a set of conditions jointly sufficient for instantiating mental states,
one of which will involve satisfying some microfunctional description like
those offered by PDP. I spoke of systems that could be properly credited
with mental states if they instantiated such descriptions. And I spoke also
of a mechanism that, suitably embodied, connected, and located in a system,
would allow us to properly describe it in mentalistic terms. These provisos
indicate the second and final stage of an account of instantiation.
Instantiating a mental state may not be a matter of possessing a certain
internal structure alone. In previous chapters we discovered two reasons to
believe that the configuration of the external world might figure among
the conditions of occupying a mental state. The first reason was that
the ascription of mental states may involve the world (chapter 3). The
content of a belief may vary according to the configuration of the world
(recall the twin earth cases [chapter 3, section 4]). And some beliefs (e.g.,
those involving demonstratives) may be simply unavailable in the absence
of their objects. What this suggests is that instantiating certain mental states
may involve being suitably located and connected to the world. (It does
not follow, as far as I can see, that a brain in a vat can have no thoughts at
all.)
We also noted in chapters 4 and 7 a second way in which external facts
may affect the capacity of a system to instantiate mental states. This is
the much more practical dimension of exploitation. A system (i.e., a brain
or a PDP machine) may need to use external structures and bodily operations
on such structures to augment and even qualitatively alter its own
processing powers. Thus, suppose that we accept that stage one of an
instantiation account (an account of brain structures) involves a microfunctional
specification of something like a PDP system. We might also hold
that instantiating some mental states (for example, all those involving conscious,
symbolic, and logical reasoning) requires that such systems emulate
a different architecture. And we might believe that such emulation is made
possible only by the capacity of an embodied system located in a suitable
environment to exploit real-world structures to reduce complex, serial
processing tasks to an iterated series of PDP operations. PDP systems are
essentially learning devices, and learning devices (e.g., babies) come to
occupy mental states by interacting with a rich and varied environment.
For these very practical reasons the project of full instantiation may be
as dependent on embodiment and environmental structure as on internal
structure.
Most important of all, I suspect, is the holistic nature of thought ascription.
Thoughts, we may say, just are what gets ascribed using sentences
expressing propositional attitudes of belief, desire, and the like. Such ascriptions
are made on the basis of whole nexuses of actual behavior. If this is
the case, to have a certain thought is to engage in a whole range of
behaviors, a range that, for daily purposes, is usefully codified and explained
by a holistically intertwined set of ascribed beliefs and desires.
Since there will be no neat one-to-one mapping of thoughts so ascribed to
computational brain states (see chapters 3 and 8 especially), it follows a
fortiori that there will be no computational brain state that is a sufficient condition
of having that thought. The project I have called descriptive cognitive
science in effect gives a formal model of the internal relations of sentences
used to ascribe such thoughts. This is a useful project, but instantiating that
kind of formal description certainly won't give you a thinker. For the
sentences merely describe regularities in the behavior and are not geared to pick out the syntactic entities that are computationally manipulated to produce the behavior.
To sum up, the project of instantiation is just the project of creating a system properly described as occupying mental states. And it involves two stages (to be pursued cooperatively, not in series). Stage 1 is the description, at the highest possible level of abstraction, of a class of mechanisms capable of supporting the kind of rich, flexible, actual and counterfactual behavior needed to warrant the use of a mentalistic vocabulary. That level of description will turn out, I believe, to be a microfunctional one. It may well turn out to involve in part a microfunctional specification of PDP systems in terms of value passing, thresholds, and connectivity strengths. I also urged that no highly semantically transparent model can fulfil the requirements of stage 1 of an instantiation, despite the claim made by Newell and Simon that such approaches capture the necessary and sufficient conditions of intelligent action (see chapter 1, sections 2, 3, and 4). Stage 2 of the project of instantiation will involve the embodying and environmental embedding of the mechanisms picked out in stage 1. Only once these systems are embodied, up, and running in a suitably rich environment will we be properly warranted in our ascriptions of mental states.

3 Explaining a Thinker
The project of instantiation and the project of psychological modeling and explanation are different. This may seem obvious, but I suspect a great deal of confusion within cognitive science is a direct result of not attending to this distinction.
First and most obviously, the project of instantiation requires only that we delimit a class of mechanisms capable of providing the causal substructure to ground rich and varied behavior of the kind warranting the ascription of mental states. There may be many such classes of mechanisms, and an instantiation project may thus succeed without first delimiting the class of mechanisms that human brains belong to. But we may put this notion aside at least as far as our interest in PDP is concerned. PDP is certainly neurally inspired and aims to increase our knowledge of the class of mechanisms to which we ourselves are in some significant way related.
Second and more importantly, even if the microfunctional description that (for the instantiation project) delimits the class of mechanisms to which we belong is entirely specified by a PDP-style account, correct psychological models and explanations of our thought may also require accounts couched at many different levels. To bring this out, recall my account of Marr's picture of the levels of understanding of an information-processing task (chapter 1, section 5). Psychological explanation, according to Rumelhart and McClelland (1986, 122-124), is committed to an elucidation of the "algorithmic" level, i.e., Marr's level 2. For the story at this level (the level that specifies the mode of representation and actual processing steps) provides the explanation of such phenomena as speed, efficiency, relative ease in solving various problems, and graceful degradation (performance with noise, inadequate data, or damaged hardware). That is, the story at the algorithmic level provides the explanation of the performance data with which real psychology is typically interested.
Suppose we accept this broad characterization of the level of psychological interest. It will not follow that in general a single computational model serves to explain all such data relating to the performance of a given task. One reason for this has directly to do with the notion of virtual machines. Thus, imagine a PDP system engaged in the full or partial simulation of a more conventional processor (e.g., the environment manipulator for full simulation and the mathematical prover for partial simulation). In such cases we will need to advert to at least two algorithmic descriptions of the system to explain various kinds of data. The relative ease with which the system solves various problems and the nature of the transformations of representations involved will often require an account couched in terms of the top-level virtual machine, e.g., a production system or a list processor. But speed and graceful degradation will need to be explained by adverting to an algorithmic description of the PDP implementation of some of the functions found in the top-level virtual machine. The thought is that not only may different tasks require different forms of computational explanation, but different kinds of data pertaining to a single task may likewise require several types of computational models.
At this point someone might object as follows. It may be convenient to use a classical serial model at times. But because of the underlying PDP implementation, a full and correct psychological explanation always can in principle be given in PDP algorithms alone.
This is a general reductionist argument that in the extreme is sometimes thought to threaten the integrity of the entire project of psychological explanation. But the specter of reduction need not be feared. For explanation is not just a matter of showing a structure that is sufficient to induce or constitute a certain higher-level state or process. It is also a matter of depicting the structure at the right level. And the right level here is determined by the need to capture generalizations about the phenomena picked out by the science in question. The general point has been made often enough (see, e.g., Pylyshyn 1986, chapter 1), and I shall not labor it here.
Instead, I shall merely sketch the relevant instances. Consider the cases of full simulation from chapter 7. Here we have (by hypothesis) a PDP substructure supporting the mode of representation of the input and output and the processing steps and pattern-matching characteristics of a regular von Neumann processor running a semantically transparent program. Other computational substructures (e.g., a real von Neumann machine) can certainly support the same features. And it is precisely these features that determine some of the psychologically relevant performance data, such as the relative ease of solving various problems (which is keyed to the mode of representing the problem).
Suppose we tried to give a psychological explanation of this aspect of the performance of the system, using only the formal apparatus of a PDP specification. We would merely succeed in obscuring the fundamental psychological similarity of the systems built out of the various substructures adverted to above. Of course, where the performance differs (e.g., in style of degradation and speed), we will need a psychological explanation to explain the difference also. So for those other aspects we may indeed need an algorithmic specification at the PDP level.
The basic point is well expressed by Hilary Putnam when he writes, "Explanation is not transitive" (1981, 207). What explains the simulation in any given case need not itself be a good explanation of what the simulation explains if we are to retain the requisite degree of generality.
We can give essentially the same treatment to approximations by conventional accounts (see chapter 6, section 3). Insofar as a particular system nonaccidentally approximates the behavior of some other (e.g., a PDP system that behaves largely as if it had the syntactic substructures described in a conventional model), it may be said to partially simulate the other system. Now imagine a range of systems all with different formal substructures but all of which nonaccidentally partially simulate the behavior predicted by a conventional model. And suppose, moreover, that the range of cases for which the conventional model is accurate is the same in all of these. In that case, I am inclined to say, there is a genuine psychological generalization that is in need of explanation and that would evade our grasp unless we avail ourselves of the model provided in the conventional account.
In general, then, my claim is that the notion of a single formal algorithmic level appropriate to psychological explanation is misplaced. For different tasks and different aspects of the same task may positively require a variety of algorithmic models detailing the processing undertaken by various virtual machines. If so, then some psychological explanation is properly given at the level of conventional, semantically transparent serial programs. But other phenomena (e.g., creative leaps, flashes of insight, jokes coming to us, analogical understanding, perception, fast expert problem solving, and so forth) seem to require psychological models at the level of PDP (or microfunctional) accounts. Various neuropathological data may also require explanation at this level. Thus, the project of psychological explanation may involve the construction both of microfunctional, PDP accounts and in some cases the construction of serial, symbol-processing accounts.
4 Some Caveats
Part of my project is to assess the importance and role of PDP models in understanding the human mind. And the conclusion seems to be that such models have a major part to play in each of the two projects (instantiation and explanation) just distinguished. Such a conclusion, however, needs to be qualified in at least two regards.
First, PDP mechanisms may turn out to be just one among many kinds of mechanisms capable of supporting the rich, flexible actual and counterfactual behavior demanded of a genuine cognizer. Thus, even if semantically transparent approaches lack the capacity to ground such behavior (as I suspect), it does not follow that all thought requires a PDP substrate.
Second, the particular algorithms currently being explored by PDP theorists are almost certainly still inadequate to the task. The brain seems to employ many kinds of parallel cooperative networks, using different kinds of units and connectivity patterns. And this variety may be essential to its power but is not yet present in PDP work, which uses a simple, idealized neuronlike unit. In a recent article on the computation of motion, two leading theorists comment, "Nerve cells exhibit a variety of information-processing mechanisms; the nerve cell membrane produces and propagates many different types of electrical signals.... One can think of the McCulloch and Pitts model [see chapter 5 above] as equating a neuron with a single transistor, whereas our model suggests that neurons are more like computer chips with hundreds of transistors each" (Poggio and Koch 1987, 42 and 48). The idealized neurons of current PDP models, it is fair to say, are only a little finer and exhibit only a little more variety than the original McCulloch and Pitts versions. Current models, then, may suffer many severe limitations until workers in the field are in a position to introduce greater detail and variety.
In a similar vein, the learning algorithms currently in favor (the generalized delta rule and the Boltzmann machine learning procedure) are most probably inadequate in various ways. For example, as Rudi Lutz has pointed out, the generalized delta rule requires us in effect to tell the machine when it is to learn and when it is simply to behave on the basis of what it already knows. Yet this explicit switching of modes is quite counterintuitive as part of any psychological model of human learning.
In short, the very broad brushstrokes of PDP, the general idea of a parallel, value-passing architecture encoding information in distributed patterns of activity and connectivity, will probably constitute its positive contribution to understanding the mind, not the particular algorithms and idealized neurons currently under study. (This is not, of course, any criticism of the current work; creating such models and then trying to understand their limits is the best way of improving upon them.)
What the broad brushstrokes of PDP give us is an indication of one way in which a physical, computational mechanism may satisfy the range of constraints on biological cognition developed in previous chapters. The potential capacity of PDP to satisfy these constraints ensures its continued philosophical and psychological interest. Such constraints were seen to include:
include:
. Robustness(a toleranceof local hardwaredamage),
. Fastsensoryprocessing,
. Sensibleaction when given partial or inconsistentdata,
. Economy of storageand retrieval,
. A capacity to deal with unanticipatedsituations (e.g ., to generalize
along unexpecteddimensions),
. Generalflexibility in the useand recovery of stored data,
. A powerful learning capacity,
. Rule-describablebehavior without explicit, fixed rules,
. Continuity with the kind of architecturesdictated in evolutionarily
basiccases(i.e., the constraintsimposedby the gradualisticholism of
evolutionary change),
. The capacity to shade meanings according to context, to create
schemataon demand, and so on.
These constraints and capacities form an interlinked, often overlapping set. Taken together, they amount to a demand for a computational substructure that supports maximally plastic and adaptable behavior while simultaneously sustaining the extraction and storage of regularities and similarities in its input. The capacity of a PDP architecture to satisfy both needs in a fast, natural, and economical way is a stunning achievement. In doing so, it suggests for the first time just how a physical, computational mechanism might support the sensible, flexible, open-ended behavior that philosophers have rightly demanded of any system that warrants a description in a mentalistic vocabulary.
Epilogue
The Parable of the High-Level Architect

This little story won't make much sense unless it is read in the context provided by chapter 4, section 5, and with an eye to the general distinction between semantically transparent and semantically opaque systems.1
One fine day, a high-level architect was idly musing (reciting Wordsworth) in the cloistered confines of King's College Chapel. Eyes raised to that magnificent ceiling, she recited its well-publicized virtues ("that branching roof, self-poised and scooped into ten thousand cells, where light and shade repose..."). But her musings were rudely interrupted.
From a far corner, wherein the fabric of reality was oh so gently parting, a hypnotic voice commanded: "High-Level Architect, look you well upon the splendours of this chapel roof. Mark well its regular pattern. Marvel at the star shapes decorated with rose and portcullis. And marvel all the more as I tell you, there is no magic here. All you see is complex, physical architecture such as you yourself might re-create. Make this your project: go and build for me a roof as splendid as the one you see before you."
The high-level architect obeyed the call. Alone in her fine glass-and-steel office, she reflected on the qualities of the roof she was to re-create. Above all, she recalled those star shapes, so geometric, so perfect, the vehicle of the rose and portcullis design itself. "Those shapes," she concluded, "merit detailed attention. Further observation is called for. I shall return to the chapel."
There ensued some days of patient observation and measurement. At the end of this time the architect had at her command a set of rules to locate and structure the shapes in just the way observed. These rules, she felt sure, must have been followed by the original designer. Here is a small extract from the high-level architect's notebook:
To create ceiling shapes instruct the builder (Christopher Paul Ewe?) as follows:
if (build-shapes) then
[(space-shapes (3-foot intervals)),
(align-shapes (horizontal)),
(arrange-shapes (point-to-point)),
(locate-shapes (intersection-of-pillar-diagonals))].
Later she would turn her attention to the pillars, but that could wait. When the time came, she felt, some more rules would do the trick. She had an idea of one already. It went "If (locate-pillar) then (make-pillar (45, star-shape))." It was a bit rough, but it could no doubt be refined. And of course, there'd be lots more rules to discover. "I do hope," she laughed, "that Christopher Paul Ewe is able to follow all this. He'll need to be a fine logical thinker to do so." One thought, however, kept on returning like a bad subroutine. "Why are things arranged in just that way? Why not have some star shapes spaced further apart? Why not have some in a circle instead of in line? Just think of all the counterfactual possibilities. What an unimaginative soul the original architect must have been after all."
Fortunately for our heroine's career, this heresy was kept largely to herself. The Society for the Examination and Reconstitution of Chapels gave her large research grants, and the project of building a duplicate ceiling went on. At last a prototype was ready. It was not perfect, and the light and shadow had a subtly different feel to it. But perhaps that was mere superstition. C. P. Ewe had worked well and followed instructions to the letter. The fruits of their labors were truly impressive.
One day, however, a strange and terrible thing happened. An earthquake (unusual for the locale) devastated the original chapel. Amateur video, miraculously preserved, records the event. The high-level architect, upon viewing the horror, was surprised to notice that the star shapes fell and smashed in perfect coincidence with the sway and fall of neighbouring pillars. "How strange," she thought; "I have obviously been missing a certain underlying unity of structure here." The next day she added a new rule to her already massive notebooks: "if (pillar-falls) then (make-fall (neighboring-star-shape))." "Of course," the architect admitted, "such a rule is not easy for the builder to follow. But it's nothing a motion sensor and some dynamite can't handle."
lack of interest in actual brain processing mechanisms by adverting to grounds of broad theoretic simplicity (see Stich 1972, 211).
5. An exception may be the case of conscious thought. We may consciously deploy, say, a mental model of fluid flow to understand electricity, or consciously apply grammatical rules in the early stages of speaking a second language, or consciously reason by adducing a series of sententially formulated facts and rules. In such cases, it seems natural to assume that at least some level of computational organization in the brain does involve operations on formal tokens having a projectible semantics that can be given using the concepts and relations appearing in sentential formulations. As for the other cases, which form the bulk of daily life, we can only wait and see what develops.

Chapter 9
1. The actual structure of the model is complicated in various ways not germane to present concerns. See Rumelhart and McClelland 1986 for a full account.
2. A trivial model would be one that merely used a PDP substrate to implement a conventional theory. But there are complications here; see section 5.
3. This example is mentioned in Davies, forthcoming, 19.
4. Thanks to Martin Davies for suggestive conversations concerning these issues.
5. I owe this suggestion to Jim Hunter.
6. This point was made in conversation by C. Peacocke.
7. For example, Sacks reports the case of Dr. P., a music teacher who, having lost the holistic ability to recognize faces, makes do by recognizing distinctive facial features and using these to identify individuals. Sacks comments that the processing these patients have intact is machinelike, by which he means like a conventional computer model. As he puts it:
classical neurology ... has always been mechanical.... Of course, the brain is a machine and a computer ..., but our mental processes, which constitute our being and our life, are not just abstract and mechanical; [they] involve not just classifying and categorising, but continual judging and feeling also. If this is missing, we become computer-like, as Dr. P. was.... By a sort of comic and awful analogy, our current cognitive neurology and psychology resembles nothing so much as poor Dr. P.! (Sacks 1986, 18-19)
Sacks admonishes cognitive science for being "too abstract and computational." But he might as well have said "too rigid, rule-bound, coarse-grained, and serial."

Chapter 10
1. This is not to say that the philosophers who raised the worries will agree that they are best localized in the way I go on to suggest. They won't.

Epilogue
1. The story is inspired by two sources: the Gould and Lewontin critique of adaptationist thinking, reported in chapter 4, and Douglas Hofstadter's brief comments on operating systems (1985, 641-642).
Appendix
Beyond Eliminativism

1 A Distributed Argument

The main body of the text contains, in a somewhat distributed array, an argument against the use of connectionist models to support the position known as eliminative materialism. In this appendix, I gather those distributed threads and weave them into an explicit rejection of eliminativism. The appendix expands on various hints given in chapters 3 and 10 and connects these, in a somewhat unexpected way, with the idea of mixed symbolic and connectionist models, introduced in chapter 7. As a bonus, it introduces a new and interesting way of describing connectionist systems with a statistical technique known as cluster analysis.
The appendix begins by laying out various types of descriptions of connectionist systems of the kind we have been considering (section 2). In section 3 it goes on to expand on the idea (chapter 10) of explanations that seek to group systems into equivalence classes defined for various purposes. Each such grouping requires a special vocabulary, and the constructs of any given vocabulary are legitimate just insofar as the grouping is interesting and useful. Section 4 then shows that, relative to such a model of explanation, the constructs of both symbolic AI and commonsense psychology may have a legitimate role to play in giving psychological explanations. This role is not just that of a useful approximation. Section 5 is a speculative section in which the argument for the theoretical usefulness of such symbolic constructs is extended to individual processing in a very natural way. Here the cognizer, in the process of regulating, debugging, and understanding her own representations, creates symbols to stand for sets of distributed activity patterns. The section points out the difficulties for a pure distributed approach that may be eased by the addition of such symbolic constructs, and it relates my speculations to the continuing debate over the correct "architecture of cognition."
2 Levels of Description of Connectionist Systems

Connectionist systems, like everything else, can be described at a variety of levels, each with its own characteristic vocabulary. The central concern of this appendix lies with the status of various high-level descriptions of such systems. Low-level descriptions include

(1) The numerical specification of weights and activation-passing rules and
(2) Subsymbolic interpretations of the activity of processing units.

High-level descriptions include

(3) The partitioning trees created by performing a cluster analysis on a network,
(4) Descriptions that use the constructs of classical AI (e.g., "schema," "production," and so on), and
(5) The ordinary conceptual-level descriptions of commonsense belief and desire psychology.
Paul Smolensky (1988) has recently attempted the essential work of detailing the status and interrelations of these levels of description. Smolensky's picture (which only applies to what I shall call pure distributed connectionism) is, I believe, technically accurate and represents a major achievement in sketching the theoretical foundations of PDP. But it invites a certain distortion of the role and status of the high-level descriptions, an invitation that the eliminativists cannot refuse. In the remainder of this section I sketch the received attitude to each of the five levels of description.

Level 1, the numerical level
The most precise characterization of the actual processing of a particular connectionist network is mathematical in nature. Such networks, we saw, consist of interconnected units. The connections are weighted, and the units are miniprocessors that receive and pass on activation according to mathematical specifications. Thus, the theorist can give a precise characterization of the state of such a system at a particular time by stating a vector of numerical values. Each element in the vector will correspond to the activation value of a single unit. Likewise, it is possible to specify the evolving behavior of a system by an "activation-evolution equation." This is a differential equation that fixes the dynamics of the network. If, as is generally the case, the network is set up to learn, then it will be necessary to specify the dynamics of its learning behavior. This is done by means of another differential equation, the "connection-evolution equation." Such specifications give a complete mathematical picture of the activation and processing
profile of any given network. (For a more detailed account see Smolensky 1988, sections 1 and 2.)
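To fix ideas, here is a toy version of such a specification (in Python; the network size and numbers are arbitrary inventions, and simple discrete-time updates stand in for the differential activation-evolution and connection-evolution equations):

import numpy as np

rng = np.random.default_rng(0)
n = 4                                    # a tiny network of four units
W = rng.normal(scale=0.5, size=(n, n))   # numerical weights on connections
a = rng.random(n)                        # state vector: one activation per unit

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Discrete stand-in for the activation-evolution equation: each unit's
# next activation is a squashed sum of its weighted inputs.
for _ in range(10):
    a = sigmoid(W @ a)

# Discrete stand-in for the connection-evolution equation (here a simple
# Hebbian adjustment; the learning rules discussed in the text are more
# sophisticated).
W += 0.01 * np.outer(a, a)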
These mathematical specifications play a large and important role in connectionist cognitive science. They are often the only way to understand the distinctive wrinkles in the learning behavior of different kinds of connectionist systems (for example, Boltzmann-machine learning versus various forms of supervised learning). They also figure in explanations of specific behaviors and pathologies. In this sense, as Smolensky observes, "the explanations of behavior provided are like those traditional in the physical sciences, unlike the explanations provided by symbolic models" (Smolensky 1988, 1).

Level 2, the subsymbolic level
For all that, however, Smolensky seems especially fond of a slightly higher level of analysis that he calls the subconceptual (or subsymbolic) level. It is at that level, and not at the numerical (or mathematical) one, that we find the "complete formal account of cognition." He writes, "Complete, formal and precise descriptions of the intuitive (i.e. connectionist) processor are generally tractable not at the conceptual level, but only at the subconceptual level" (Smolensky 1988, 6-7). But there is, in fact, no inconsistency here. For Smolensky views the subsymbolic as just the semantic (microsemantic) description of the syntactic (units and activation) profile of level 1. He is thus committed to the semantic interpretability of the numerical variables specifying unit activations. This interpretation takes the form of specifying the subsymbolic (or microfeatural) content to which the unit activation corresponds in the context of a particular activation vector. Hence, "the name 'subsymbolic paradigm' is intended to suggest cognitive descriptions built up of constituents of the symbols used in the symbolic paradigm; these fine-grained constituents could be called subsymbols, and they are the activities of individual processing units in connectionist networks" (Smolensky 1988, 3).
The semantic shift from symbolic to subsymbolic specification is one of the most important and distinctive features of the connectionist approach to cognitive modeling. It is also one of the most problematic. One immediate question concerns the nature of a subsymbol. The level of description at issue is clearly meant to be a level that ascribes content, a level that rather precisely interprets the numerical specification of an activation vector by associating the activation of each unit with a content. In an activation vector that amounts to a distributed representation of coffee, we saw how the activation of a single unit may represent such features as hot liquid, burnt odor, and so on. Such examples make it seem as if a subsymbolic feature (or microfeature) is just a partial description, in ordinary-language terms, of the top-level entity in question (coffee). This is certainly the case
in most (or perhaps even all) of the toy examples found in the literature. Nonetheless, there is clearly a theoretical commitment to something more radical. Thus, concerning the coffee example, Smolensky adds, "we should really use subconceptual features, but even these features (e.g., 'hot liquid') are sufficiently low level to make the point" (1988, 16).
The official line on the semantic shift (or dimension shift) in connectionist representation is that dimension-shifted representations must be of features that are more subtle than those in an ordinary task analysis of the problem. The claim is that "the elements of a subsymbolic program do not refer to the same concepts as are used to consciously conceptualise the task domain" (Smolensky 1988, 5). Or again, "the units do not have the same semantics as words of natural language" (Smolensky 1988, 6). We can now see that these claims can be taken in two ways. The stronger way is to take the claims to mean that the content to be associated with the activation of a given unit in context cannot be captured by any formulation in natural language, however long and hyphenated. The weaker way is to take the claims to mean that individual unit activations don't have the semantics of the single words that occur in a conscious task analysis of the domain. The latter is clearly the safer claim, at least as long as we avoid being too imaginative concerning the nature of the task analysis. But it seems that Smolensky believes that the former, more radical reading will ultimately prove correct. He thus notes, "Semantically, the subconceptual level seems at present rather close to the conceptual level" (1988, 8). But this, he conjectures, is probably because the choice of input and output representations, a crucial factor in determining what a system will learn, is based heavily on existing theoretical analyses of the domain. It may well be that truly subsymbolic models (i.e., in the strong sense) will not become available unless input and output representations can be divorced from our existing analyses of the domain. Whether this is possible is a question that would take us too far afield.
It is likely, however, that the real importance of subsymbolic representation lies not just in what gets represented but in the special properties of the representational medium that connectionists employ. Much work in ordinary AI (vision, natural-language processing) depends, after all, on the representation and manipulation of features quite invisible to daily, conscious reflection on the task at hand. Where the two paradigms differ most radically is surely in the general mode of representation and its associated properties. In particular, subsymbolic (i.e., connectionist) representation naturally embodies a kind of semantic metric (I owe this term to Andler 1988), which powers the distinctive features of generalization, graceful degradation, and so on. The semantic metric is best pictured as a spatial arrangement of units in a multidimensional space arranged so that semantically related items are coded for by spatially related feature units. This
fact renders each individual unit pretty well expendable, since its near neighbors will do almost the same job in generating patterns of activation. And it is this same fact that allows such systems to generalize (by grouping the semantically common parts of various items of knowledge), to extract prototypes, and so on. Classical representation does not involve any such built-in notion of semantic metric.
Distributed (i.e., microfeatural) representations with a built-in semantic metric are also responsible for the context dependence of connectionist representations of concepts. Recall that in what I am calling pure distributed connectionism there are no units that represent classical conceptual-level features, such as coffee. Instead, coffee is represented as a set of active microfeatures. The point about context dependence is that this set will vary according to the surrounding context. For example, "coffee in cup" may involve a distributed representation of coffee that includes contacting porcelain as a microfeature. But "coffee in jar" would not. Conceptual-level entities (or "symbols," to fall in with a misleading terminology) thus have no stable and recurrent analogue as a set of unit activations. Instead, the unit-activation vector will vary according to the context in which the symbol occurred. This, we saw, is an important feature (though at times it may be a positive defect). It is directly responsible for the oft-cited fluidity of connectionist representation and reasoning.
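The point can be made concrete with a toy illustration (in Python; the microfeature inventory is my own stand-in, not Smolensky's actual feature set):

# Illustrative microfeature inventory; the feature names are invented
# for the example, not drawn from any actual model.
FEATURES = ["hot-liquid", "burnt-odor", "brown", "contacting-porcelain",
            "granular-solid", "contacting-glass"]

def vec(active):
    """Activation vector over the microfeatures: 1.0 if active, else 0.0."""
    return [1.0 if f in active else 0.0 for f in FEATURES]

# 'Coffee' has no single stable vector; its representation shifts with
# context, as the text describes.
coffee_in_cup = vec({"hot-liquid", "burnt-odor", "brown",
                     "contacting-porcelain"})
coffee_in_jar = vec({"granular-solid", "burnt-odor", "brown",
                     "contacting-glass"})

assert coffee_in_cup != coffee_in_jar   # no recurrent 'coffee' analogue

The two vectors overlap (the semantically common parts of the two contexts) without coinciding, which is just the combination of similarity and fluidity described above.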
If it is not the dimension shift in itself so much as the dimension shift in conjunction with a built-in semantic metric that is the crucial fact in connectionist processing, then a question arises about the status of the subsymbolic level of description. For such descriptions seemed to involve just listing a set of microfeatures corresponding to an activation vector. But such a listing leaves out all the facts of the place of each feature in the general metric embodied by the network. And these facts seem to be of great semantic significance. What a microfeature means is not separable from its place in relation to all the other representations the system embodies. For this reason, I would dispute the claim that subsymbolic description (at least, if it is just a listing of microfeatures) affords an accurate interpretation of the full numerical specifications available in level 1. Perhaps the resources of natural language (however cannily deployed) are in principle incapable of yielding an accurate interpretation of an activation vector. At first sight, such a concession may seem to give the eliminativist an easy victory. Fortunately, this impression is wrong, as we shall see in due course.

Level 3, cluster analysis
This level of analysis does not appear in Smolensky's treatment. Rather, it occurs as part of the methodology developed by Rosenberg and Sejnowski for the analysis of NETtalk. (For an account of NETtalk, though not, alas, of cluster analysis, see Sejnowski and Rosenberg 1986.) I include it here for
two reasons. The first is that it represents an interesting midway analysis falling between subsymbolic and straightforwardly classical levels of description. The second is that (unlike a mere listing of microfeatures) it is intended to reveal the outlines of the semantic metric embodied in a given network.
NETtalk is a large distributed connectionist model for investigating part of the process of turning written input (words) into phonemic output (sounds or speech). The network architecture consists of a set of input units that are stimulated by seven letters of text at a time, a set of hidden units, and a set of output units that code for phonemes. The output is fed into a voice synthesizer, which produces the actual speech sounds.
The network began with a random distribution of hidden-unit weights and connections (within chosen parameters); that is, it had no idea of any rules for converting text to phonemes. Its task was to learn, by repeated exposure to training instances, to negotiate its way around this particularly tricky cognitive domain (tricky because of irregularities, subregularities, and the sensitivity to context of converting text to phonemes). Learning proceeded in the standard way, i.e., by a back-propagation learning rule. This works by giving the system an input, checking its output (this is done automatically by a computerized "supervisor"), and telling it what output (i.e., what phonemic code) it should have produced. The learning rule then causes the system to minutely adjust the weights on the hidden units in a way that would tend toward the correct output. This procedure is repeated many thousands of times. Uncannily, the system slowly and audibly learns to pronounce English text, moving from babble to half-recognizable words and on to a highly creditable final performance.
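A schematic rendering of such a supervised regime may help (in Python; the layer sizes and random data are arbitrary stand-ins, not the real NETtalk with its seven-letter window and phonemic output codes):

import numpy as np

rng = np.random.default_rng(1)

# Toy one-hidden-layer network trained by back-propagation.
n_in, n_hid, n_out = 7, 5, 3
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))   # random initial weights:
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))  # no rules built in

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.random((20, n_in))    # made-up training inputs
T = rng.random((20, n_out))   # made-up target outputs
rate = 0.5

for _ in range(5000):         # repeated exposure to the training corpus
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x)   # hidden-unit activations
        y = sigmoid(W2 @ h)   # the system's actual output
        # The "supervisor" compares output with target; the error signal
        # is propagated back, and the weights are minutely adjusted.
        d_out = (t - y) * y * (1 - y)
        d_hid = (W2.T @ d_out) * h * (1 - h)
        W2 += rate * np.outer(d_out, h)
        W1 += rate * np.outer(d_hid, x)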
Cluster analysis is an attempt to display the shape of the representational space the system has created by the carefully regulated weightings on the hidden-unit connections. To see how it works, consider the task of the network to be that of setting hidden-unit weights in a way that will enable it to perform a kind of set partitioning. The goal is for the hidden units to respond in distinctive ways when, and only when, the input corresponds to a distinctive output. Thus, in converting text to phonemes, we want the hidden units to perform very differently when given "the" as input than they would if given "sail" as input. But we want them to perform identically if given "sail" and "sale" as inputs. So the task of the hidden units is to partition a space (defined by the number of such units and their possible levels of activation) in a way geared to the job at hand. A very simple system, such as the rock/mine network described in Churchland (forthcoming 1989), may need to partition the space defined by its hidden units into only two major subvolumes, one distinctive pattern for inputs signifying mines and one for those signifying rocks. The complexities of text-to-phoneme conversion being what they are, NETtalk must
partition its hidden-unit space more subtly (in fact, into a distinctive pattern for each of 79 possible letter-to-phoneme pairings). Cluster analysis as carried out by Rosenberg and Sejnowski in effect constructs a hierarchy of partitions on top of this base level of 79 distinctive stable patterns of hidden-unit activation. The hierarchy is constructed by taking each of the 79 patterns and pairing it with its closest neighbor, i.e., with the pattern that has most in common with it. These pairings act as the building blocks for the next stage of analysis. In this stage an average activation profile between the members of the original pair is calculated and paired with its nearest neighbor drawn from the pool of secondary figures generated by averaging each of the original pairs. The process is repeated until the final pair is generated. This represents the grossest division of the hidden-unit space that the network learned, a division that, in the case of NETtalk, turned out to correspond to the division between vowels and consonants. Cluster analysis thus provides a picture of the shape of the space of the possible hidden-unit activations that power the network's performance.
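The pairing-and-averaging procedure is easy to state algorithmically. Here is a minimal rendering (in Python; random vectors stand in for NETtalk's 79 stable hidden-unit patterns, and Euclidean distance stands in for whatever similarity measure the analysts used):

import numpy as np

def cluster_analysis(patterns):
    """Bottom-up clustering in the spirit of the procedure described
    above: repeatedly merge the two nearest activation profiles,
    replacing them with their average, until one pair remains.
    Returns the merge history (the partitioning tree, leaf-up)."""
    clusters = [(p, (i,)) for i, p in enumerate(patterns)]
    history = []
    while len(clusters) > 1:
        best = None                      # find the closest pair of profiles
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(clusters[i][0] - clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (pi, mi), (pj, mj) = clusters[i], clusters[j]
        merged = ((pi + pj) / 2.0, mi + mj)   # average activation profile
        history.append((mi, mj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

# e.g., ten random stand-ins for the 79 stable hidden-unit patterns
patterns = np.random.default_rng(2).random((10, 5))
print(cluster_analysis(patterns))

The final merge in the history corresponds to the grossest division of the space; in NETtalk's case that top-level split fell between vowels and consonants.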
A few comments. First, it is clear that the clusterings learned by NETtalk (e.g., the vowel and consonant clusterings at the top level) do not involve novel, unheard-of subsymbolic features. This may be due in part to the system's reliance on input and output representations that reflect the classical theory. Even so, the metric of similarity built into the final set of weights still offers some clear advantages over a classical implementation. Such advantages will include generalization, various forms of robustness, and graceful degradation.
For our purposes, the most interesting questions concern the status of the cluster-theoretic description. Is it an accurate description of the system's processing? One prominent eliminativist, Churchland (1989), answers firmly in the negative. Cluster analysis, he argues, is just another approximate, high-level description of the system's gross behavior (see my comments on levels 4 and 5) and does not yield an accurate description of its processing. The reason for this is illuminating. It is that the system itself knows nothing about its own clustering profile, and that profile does not figure in the statement of the formal laws that govern its behavior (the activation-evolution and connection-evolution equations of level 1). Thus, Churchland notes, "the learning algorithm that drives the system to new points in weight space does not care about the relatively global partitions that have been made in activation space. All it cares about are the individual weights and how they relate to apprehended error. The laws of cognitive evolution, therefore, do not operate primarily at the level of the partitions.... The level of the partitions certainly corresponds more closely to the 'conceptual' level ..., but the point is that this seems not to be the most important dynamical level" (1989, 25).
Churchland's point is that although a theorist could use her knowledge of a system's clustering profile to predict some of its short-term behavior (e.g., that it will classify inputs a and b together and a and c separately), she could not use that knowledge to predict its cognitive development (if it is a learning system) or the precise shape of its possible breakdowns. To get such fine-grained predictive power, Churchland argues, we need to see the precise nature and interrelations of the subconceptual elements responsible for the gross partitioning. By this he does not mean that we need the kind of subsymbolic descriptions considered in our discussion of level 2. Rather, he opts for the numerical, connection-weight specification as the appropriate level of analysis.

Level 4, the symbolic-AI level

The term "conceptual level" as used by Smolensky seems to be ambiguous between any construct of classical AI, e.g., a schema, production, prototype, and so on, and the terms of ordinary language, e.g., "table," "office," "coffee," and so on. The two meanings are certainly linked, since some classical AI (the kind I have been calling classical cognitivism) essentially involves the manipulation of entities that have the semantics of words of natural language. However, they seem sufficiently different to merit some teasing apart. Hence, level 4, as I understand it, will be the level at which a connectionist system is described as if it were a classical one, e.g., described as firing a production, accessing a schema, extracting a prototype, and so on. (Level 5 will be the level of good, old-fashioned folk-psychological description.)
The general connectionist claim, as we have seen, is that all these conceptual-level constructs of both types offer at best approximately accurate descriptions of the system's behavior. Such constructs (or rather, the explanations in which they figure) give a good indication of what the system will do in a central range of cases, but they are unable to predict or explain various other aspects of the system's capabilities. Recall the connectionist model for solving simple circuit problems detailed in Smolensky 1986 and in chapter 6, section 4 above. The model solves problems by the standard connectionist method of massive parallel satisfaction of a large number of soft constraints, i.e., relations of excitation and inhibition between units that represent subsymbolic features. Nonetheless, it often looks from the outside as if the system must work by satisfying hard, symbolically couched constraints in serial order. If you give it a well-posed problem and unlimited processing time, it will converge on a solution by making a set of microdecisions (recomputations of unit values). And these will in turn contribute to various macrodecisions as sections of the network settle into a solution to their part of the problem. These macrodecisions appear much like the serial firing of production rules.
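A toy relaxation network shows the flavor of this (in Python; the symmetric random weights are an invented stand-in for Smolensky's circuit model, not that model itself). Unit-level microdecisions accumulate, and whole groups of units settle in an order that can look, from outside, like rules firing serially:

import numpy as np

rng = np.random.default_rng(3)

# Soft constraints: symmetric weights encode excitation and inhibition
# among binary units (a Hopfield-style relaxation network).
n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0                      # constraints hold symmetrically
np.fill_diagonal(W, 0.0)
state = rng.choice([0, 1], size=n)

for sweep in range(20):
    changed = False
    for i in rng.permutation(n):         # microdecisions: unit-level updates
        new = 1 if W[i] @ state > 0 else 0
        if new != state[i]:
            state[i] = new
            changed = True
    print(f"after sweep {sweep}: {state}")  # macrodecisions emerge over sweeps
    if not changed:
        break                            # the network has settled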
But if you give the system an ill-posed problem or artificially curtail its processing time, it still gives what Smolensky calls "sensible performance." This is explained by the underlying subsymbolic nature of its processing, which will always satisfy as many soft constraints as it can, even if given limited time and degraded input. The moral of all this, as Smolensky sees it, is that the theorist may analyze the system at the higher level of, e.g., a set of production rules. This level will capture some facts about its behavior. But in less ideal circumstances the system will also exhibit other behavior that is explicable only by describing it at a lower level. Thus, the unified account of cognition lies at one of the lower levels (level 2 or level 1, according to your preference). Hence the famous analogy with Newtonian mechanics. Symbolic AI describes cognitive behavior, much as Newtonian mechanics describes physical behavior. They each offer a useful and accurate account in a circumscribed domain. But the unified account lies elsewhere (in quantum theory in physics, and in connectionism in cognitive science). Thus, commenting on the model for solving circuitry problems, Smolensky notes, "A system that has, at the micro-level, soft constraints satisfied in parallel, appears at the macro-level, under the right circumstances, to have hard constraints, satisfied serially. But it doesn't really, and if you go outside the 'Newtonian' domain you see that it's really been a 'quantum' system all along" (1988, 20). Such analyses are conceded to be useful in that they may help describe interrelations between complex patterns of activity that approximate various conceptual constructs in which the theorist is interested. As Smolensky points out (1988, 6), such interactions will not be "directly described by the formal definition of a subsymbolic model"; instead, they must be "computed by the analyst."

Level 5, the folk-psychological level

The folk-psychological level is the level in which we use the words and concepts of ordinary language in the ordinary way to describe the cognitive states of a system. Thus, just as we may say, "John believes that Mary will get the chair," so we may say, "The network believes that bedrooms contain dressing tables." When we are dealing with a toy network, it is clear enough that the system is in some way not a proper object of full-blooded belief ascription. But in considering the status of folk-psychological description, we must bracket this fact and ask instead the following question. If, as Smolensky says (1988, 7), connectionism does indeed afford the "complete formal account of cognition," what follows on the status of folk-psychological descriptions of human mental states?
Drawing on our previous discussion, we can immediately observe that if the human mind is a pure distributed connectionist system, then the individual
words used in a belief ascription will not have discrete, recurrent analogues in the actual processing of the system. Thus, the word "chair" will not have a discrete analogue, since "chair" will be represented as an activation vector across a set of units that stand for subsymbolic microfeatures, and it will not have a single recurrent analogue (not even as an activation vector), since the units that participate and the degree to which they participate will vary from context to context.
The radical eliminativist takes these facts and conjoins them with a condition of causal efficacy, which states: a psychological ascription is only warranted if the items it posits have direct analogues in the production (or possible production) of behavior. Thus, ascribing the belief that cows can't fly to John is justified only if there is some state in John in which we can in principle identify a discrete, interpretable substate with the meaning of "cow," "fly," and so on. Since, according to connectionism, there are no such discrete, recurrent substates, the radical eliminativist concludes that commonsense psychology is mistaken and does not afford an accurate higher-level description of the system in question (John). This is not to say that such descriptions are dispensable in practice; it is to say only that they are mistaken in principle.
In the next section I shall sketch an account of explanation that dissociates the power and accuracy of higher-level descriptions from the condition of causal efficacy, thereby giving a more liberal, more plausible, and more useful picture of explanation in cognitive science and daily life.

3 Explanation Revisited
The eliminativist argues her case as follows.
Step 1. Suppose that pure distributed connectionism offers a correct account of cognition.
Step 2. It follows that there will be no discrete, recurrent, in-the-head analogues to the conceptual-level terms that figure in folk-psychological belief ascription.
Step 3. Hence, by the condition of causal efficacy, such ascriptions are not warranted, since they have no in-the-head counterpart in the causal chains leading to action.
Step 4. Hence, the causal explanations given in ordinary terms of beliefs and desires (e.g., "She went out because she believed it was snowing") are technically mistaken.
My claim will be that even if pure distributed connectionism offers a correct and (in a way) complete account of cognition, the eliminativist conclusion (step 4) doesn't follow. It doesn't follow for the simple reason that good causal explanation in psychology is not subject to the condition of causal efficacy. Likewise, even if pure distributed connectionism is true, it does not follow that the stories told by symbolic AI are mere approximations. Instead, I shall argue, these various vocabularies (e.g., of folk psychology and of symbolic AI) are geared accurately to capture legitimate and psychologically interesting equivalence classes, which would be invisible if we restricted ourselves to subsymbolic levels of description. In a
'
sense , then, I shall be offering a version of Dennett s well-known position
on folk-psychological explanation but extending it , in what seemsto me
to be a very natural way, to include'the constructs of symbolic AI (e.g .,
schemata , productions.) If I am right , it will follow that many defenders
of symbolic AI and folk psychology (especiallyFodor and Pylyshyn) are
effectively shooting themselvesin the feet. For the defencesthey attempt
make the condition of causalefficacy pivotal, and they try to argue for
neat, in-the-headcorrelatesto symbolic descriptions(see, e.g ., Fodor 1987;
Fodor and Pylyshyn 1988). This is accepting terms of engagementthat
surely favor the eliminativist and that, as we shall see, makenonsenseof a
vast number of perfectly legitimate explanatory constructs.
What we need, then, is a notion of causal explanation without causal efficacy. I tried for such a notion in Clark, forthcoming. But a superior case has since been made by Frank Jackson and Philip Pettit, so I begin by drawing on their account. Jackson and Pettit ask the reader to consider the following case: "Electrons A and B are acted on by independent forces FA and FB respectively, and electron A then accelerates at the same rate as electron B. The explanation of this fact is that the magnitude of the two forces is the same.... But this sameness in magnitude is quite invisible to A.... This sameness does not make A move off more or less briskly" (1988, 392-393). Or again, "We may explain the conductor's annoyance at a concert by the fact that someone coughed. What will have actually caused the conductor's annoyance will be the coughing of some particular person, Fred, say" (Jackson and Pettit 1988, 394). This is a nice case. For suppose
" "
someone , in the interestsof accuracy, insistedthat the proper (fully causal)
'
explanationof the conductor's"annoyancewas" in fact Freds coughing. There
is a good sensein which their more accurate explanationwould in fact be
" "
lesspowerful. For the explanationwhich uses someone has the advantage
"
of making it clear that any of a whole range of membersof the audience
coughing would have causedannoyancein the conductor" (Jacksonand
Pettit 1988, 395). This increasein generality, bought at the cost of sacri6cing
the citation of the actual entity implicated in the particular causalchain in
question, constitutes (I want to say) an explanatory virtue, and it legitimizes
a whole range of causalexplanationsthat fail to meet the condition
of causal efficacy. Likewise in the electron case, there is no analogue of "sameness" which propels electron A. But citing sameness in our causal explanation highlights the fact that the same result (identical acceleration) would be obtained by an infinite number of values of FA and FB, provided just that FA is equal to FB.
A final example, especially relevant for our subsequent discussion, is Hilary Putnam's peg and hole explanation. We explain the fact that a one-inch square peg won't pass through a one-inch round hole by citing the squareness of the peg and the general fact that squareness will not pass through an equivalent area of roundness. Yet suppose we ask for the "real" causal story. In any given case this will involve a mass of subatomic facts about clouds of particles. But exclusively to focus on this is to obscure the whole range of situations that we might well be interested in grouping together as cases in which squareness won't pass through roundness. (Imagine, for example, a universe with a different microstructure, but one which still sustains higher-level descriptions in terms of roundness and squareness.) The moral is that "in no particular case will the squareness and the roundness as such figure in the full story of the multitude of interactions which stop the peg from fitting into the hole, but the fact of squareness and roundness ensures, though not causally, that there is some very complex set of interactions which stops the peg from fitting into the hole" (Jackson and Pettit 1988, 395).
I hope that this talk of multitudes of smaller interactions and causally inactive higher-level descriptions puts the reader in mind of the subsymbolic and higher-level descriptions of connectionist systems laid out in section 2. Such, at any rate, will be the line I play out in the next section. First, though, a little borrowed terminology. After Jackson and Pettit, let us call an explanation that highlights a common feature of a range of cases (e.g., the explanations that cite roundness and sameness) but abstracts away from the causally active features of a particular case a program explanation. In these styles of explanation the common feature or property will be said to causally program the result without actually figuring in the causal chain leading to an individual action or instance. And let us contrast such explanations with process explanations, which cite the very features that are efficacious in a particular case or range of cases. My claim, then, is that explanations using the various higher-level constructs of symbolic AI and folk psychology may be necessary and fully accurate program explanations, while failing (as the eliminativist insists) to constitute good process explanations. They will do this just in case they offer a terminology that groups various systems into psychologically interesting equivalence classes that are unmotivatable if we restrict ourselves to, say, a pure subsymbolic account of processing.
4 The Value of High-Level Descriptions
Consider once again the various higher-level analyses of pure distributed connectionist systems. The first of these was cluster analysis. Cluster analysis, recall, involved charting the hierarchy of divisions (or partitions) that the network had learned to make using its hidden units. Recall also the attitude of at least one leading eliminativist, Paul Churchland, to such a level of analysis. It was that when undergoing conceptual change (e.g., learning) the system would behave in ways not responsive to the various partitionings (which the system does not really know about), but it would behave in ways responsive to the actual connection weights (which it does really know about).
This point is well taken as far as it goes. (It is like saying, "if you want to predict the actual acceleration of electron A, you'd better know the values of the forces acting on it and not just that they are the same as those acting on electron B.") But it would be a grave mistake to assume that this point
shows that the level of analysisadopted by the cluster analyst is inferior,
approximate, unnecessary , or downright mistaken. For an analysis that
cites partitionings, like one that cites the samenessof the forces acting
on the electrons, may likewise have virtues that other analysescannot
reach. For example, it is an important fact about cluster analysis (a fact
recognizedby Churchland [1989, 24]) that networks that have come to
embody different connection weights may have identical cluster analyses.
Thus, Sejnowskinotes that versions of NETtalk that begin with different
random distributions of weightings on the hidden units will , after training,
makethe samepartitions but by meansof different arrangementsof weights
on the individual connections. Now considera particularcognitive domain,
'
say, converting text to phonemes. Isn t it a legitimate psychological fact
that only certain systems can success fully negotiate that domain? And
don't we want some level of properly psychological, or cognitive, explanation
with the meansto group such systemstogether and to make some
generalizationsabout them (e.g., that suchsystemswill be prone to certain
illusions)? Cluster analysisis the very tool we need, it seems.Anyone of a
whole range of networks, we can say, will be able to negotiate that
cognitive domain. And we can give an account that specifieswhat networks
belong in that range (or in the equivalenceclass in question) by
requiring that they have a certain cluster analysis. In the terminology '
introduced in section3, the clusteranalysiscausallyprogramsthe systems
successfulperformance,but it is not part of any processexplanation.
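The cluster analyst's tool is easy to picture in code. What follows is a minimal Python sketch, not anyone's actual analysis: the hidden-unit activation vectors are randomly generated stand-ins for those of a trained network, and the two-way vowel/consonant split is modeled on the one Sejnowski found in NETtalk. The point to notice is that two networks with quite different weights could yield the same partition structure; the cluster analysis picks out the equivalence class, not the weights.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical hidden-unit activations for twelve inputs: six fall near a
# "vowel" region of activation space, six near a "consonant" region.
vowel_like = rng.normal(loc=0.8, scale=0.05, size=(6, 10))
consonant_like = rng.normal(loc=0.2, scale=0.05, size=(6, 10))
activations = np.vstack([vowel_like, consonant_like])

# Agglomerative clustering charts the hierarchy of partitions; cutting the
# tree at two clusters recovers the vowel/consonant division.
tree = linkage(activations, method="average")
partition = fcluster(tree, t=2, criterion="maxclust")
print(partition)  # e.g., [1 1 1 1 1 1 2 2 2 2 2 2]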
Let us now move up another level to the descriptions offered by symbolic AI. Suppose, for the sake of argument, that we describe NETtalk at this level as a discrimination tree, or better, as a production system with one production for each conversion of text to phoneme that it has
learned. We have clearly lost explanatory power for explaining the performance of an individual network. For as we saw, the network will perform well with degraded information in a way that cannot be explained by casting it as a standard symbolic AI system. But as with the cluster analysis, we gain something else. For we can now define an even wider equivalence class that is still, I suggest, of genuine psychological interest. Membership of this new, wider class requires only that the system behave in the ways in which the pure production system would behave in some central class of cases. The production-system model would thus act as an anchor, dictating membership of an equivalence class, just as the cluster analysis did in the previous example. And the benefits would be the same too. Suppose there turns out to be a lot of systems (some connectionist, some classical, some of kinds still undreamed of) all of which nonaccidentally approximate the behavior of the pure production system in a given range of cases. They are all united by being able to convert text to phonemes. If we seek some principled and informative way of grouping them together (i.e., not a bare disjunction of systems capable of doing such and such), we may have no choice but to appeal to their shared capacity to approximate the behavior of such and such a paradigmatic system. We can then plot how each system manages, in its different way, to approximate each separate production. Likewise, there may be a variety of systems (some connectionist, some not) capable of supporting knowledge of prototypical situations. The symbolic AI construct of a schema or frame may help us understand in detail, beyond the gross behavior, what all these systems have in common (e.g., some kind of content addressability, default assignment, override capacity, and so forth). In short, we may view the constructs of symbolic AI, not as mere approximations to the connectionist cognitive truth, but as a means of highlighting a higher level of unity between otherwise disparate groups of cognitive systems. Thus, the fact that a connectionist system a and some architecturally novel system of the future b are both able to do commonsense reasoning may be explained by saying that the fact that a and b each approximate a classical script or frame-based system causally programs their capacity to do commonsense reasoning. And this means that a legitimate higher-level shared property of a and b is invisible at the level of a subsymbolic analysis of a. This is not to say, of course, that the subsymbolic analysis is misguided. Rather, it is to claim that that analysis, though necessary for many purposes, does not render higher levels of analysis defunct or of only heuristic value.
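To fix ideas, here is one shape the anchoring description might take: a toy Python production system for text-to-phoneme conversion. The rules are invented for illustration (NETtalk's real mapping is far richer); what matters is only the format of the higher-level description, since any system, connectionist or otherwise, whose behavior matches this rule set over the central cases thereby belongs to the equivalence class the description defines.

# One invented condition-action rule per conversion; the format, not the
# coverage, is the point.
RULES = [
    (lambda l, r: l == "c" and r in "eiy", "/s/"),  # soft c, as in "city"
    (lambda l, r: l == "c",                "/k/"),  # hard c, as in "cat"
    (lambda l, r: l == "a",                "/ae/"),
    (lambda l, r: l == "t",                "/t/"),
]

def to_phonemes(word):
    """Fire the first matching production for each letter in context."""
    phonemes = []
    for i, letter in enumerate(word):
        right = word[i + 1] if i + 1 < len(word) else ""
        for condition, phoneme in RULES:
            if condition(letter, right):
                phonemes.append(phoneme)
                break
    return phonemes

print(to_phonemes("cat"))   # ['/k/', '/ae/', '/t/']
print(to_phonemes("city"))  # ['/s/', '/t/']: 'i' and 'y' have no rule here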
Finally, let us move on to full -Aedgedfolk-psychological talk. On the
presentanalysis, such talk emergesas just one more layer in rings of evermore
explanatory virtue. The position is beautifully "illustrated by Daniel
"
Dennett' s left-handersthought experiment. " Suppose, Dennett says, that
the sub-personalcognitive psychology of somepeople turns out to be dra-
"
matically different horn that of others. For example, two peoplemay have
very different sets of connection weights mediating their conversionsof
text to phonemes. More radically still, it could be that left-handedpeople
have one kind of cognitive architectureand right -handed people another.
For all that, Dennett points out, we would never concludeon thosegrounds
alone that left-handers, say, are incapableof believing.
Let left- and right -handersbe as internally different as you like, we
already know that there are reliable, robust patterns in which all
behaviourallynonnal peopleparticipate- the patternswe traditionally
describe in terms of belief and desire and the other terms of folk
psychology. What spreadaround the world on July 20th, 19691The
belief that a man had steppedon the moon. In no two people was the
effect of the receipt of that information the same .. . , but the claim
that therefore they all had nothing in common . . . is false, and obviously
so. There are indefinitely many ways one could reliably distinguish
thosewith the belief Horn thosewithout it. (Dennett 1987, 235)
In other words, even if there is no single internal state (say, a sentence in the language of thought) common to all those who are said to believe that so and so, it does not follow that belief is an explanatorily empty construct. The sameness of the forces acting on the two electrons is itself causally inefficacious but nonetheless figures in a useful and irreducible mode of explanation (program explanation), which highlights facts about the range of actual forces that can produce a certain result (identical acceleration). Just so, the posit of the shared belief highlights facts about a range of internal cognitive constitutions that have some common implications at the level of gross behavior. This grouping of apparently disparate physical mechanisms into classes that reflect our particular interests is at the very heart of the scientific endeavor. To suppose that the terms and constructs proper to such program explanations are somehow inferior or dispensable is to embrace a picture of science as an endless and disjoint investigation of individual causal mechanisms.

There is, of course, a genuine question about what constructs best serve our needs. The belief construct must earn its keep by grouping together creatures whose gross behaviors really do have something important in common (e.g., all those likely to harm me because they believe I am a predator). In recognizing the value and status of program explanations I am emphatically not allowing that anything goes. My goal is simply to counter the unrealistic and counterproductive austerity of a model of explanation that limits "real" explanations to those that cite causally efficacious features. The eliminativist argument, it seems, depends crucially on a kind of austerity that the explanatory economy can ill afford.
5 Self-Monitoring Connectionist Systems
The previous four sections have, I hope, established that even if pure distributed connectionism constitutes a complete and accurate formal model of cognition (as Smolensky claims), it does not follow that higher-level analyses (like cluster analysis, symbolic AI, and folk psychology) are misguided, mistaken, or even mere approximations. Instead, they may be accurate and powerful grouping explanations of the kind examined in sections 3 and 4.

In this more speculative section I want to use the same kind of observations to cast some doubt on the idea that pure distributed connectionism constitutes a complete and accurate formal model of cognition. There is a very natural development here, since the virtues of grouping explanations as third-person, theorist's constructs have analogues in first-person processing. In short, there may be pressure on individual cognizers to monitor and group their own internal states in the same general way as there is pressure for program explanations that group other people's internal states. If this is the case, then parts of our internal cognitive economy may begin to look distinctly classical and symbolic. Without presuming to decide what is clearly an empirical issue about cognitive architecture, this section aims to depict the kinds of pressure that might make such a mixed cognitive economy attractive.

Pure distributed connectionism insists that discrete symbolic constructs (e.g., "dog," "office") exist only as instruments of interpersonal communication in a public language and as constructs of higher-level, third-person analyses of shifting, fluid activation vectors. Let us bracket for now the question of public language. All theorists agree that we use and process at some level the symbolic entities of public discourse (words). My question is whether such symbolic entities (discrete recurrent items and conceptual semantics) have any role to play in individual cognition beyond whatever is necessary to produce and interpret language. The pure distributed connectionist thinks not. She agrees, with Smolensky, that such entities at best emerge at a higher level of analysis of what are through and through subsymbolic systems (see, e.g., Smolensky 1988, 17). That is to say, such entities are visible only to the external theorist and do not figure in the system's own inner workings. All the system knows about, on this account, are its manifold activation vectors. The rest are theorist's fictions, a useful and (according to our earlier arguments) even indispensable aid to grouping systems into equivalence classes, but not a feature of individual processing (recall Churchland's comment about cluster analysis revealing partitions that the system itself knows nothing about).

Pure distributed connectionism, we saw in chapter 9, is heir to some interesting problems. For it seems that systems lacking the distinctive
resources of classical symbolic AI (hard pattern matching, variable binding, easy recombination of atomic elements, etc.) may be inadequate for some tasks. In their less excessive moments, such recent criticisms as Fodor and Pylyshyn 1988 and Pinker and Prince 1988 may be touching on just such inadequacies. In particular, there may be a set of problems associated with the apparent lack, in pure distributed connectionism, of anything corresponding to higher-level labels for sets of distributed activity, labels that are capable of acting either as cues for the full distributed representation or as stand-ins in operations (e.g., deductive inference) in which the full distributed representation is either unnecessarily complex or even distorting because of its extreme context sensitivity (see criticisms in Fodor and Pylyshyn 1988). I propose to sketch three problem areas that are representative of this kind of pressure for actual in-the-head symbolic structures.
The first is what I shall call the problem of relying on cues. This problem (which is the subject of detailed investigation in Robbins, unpublished) can be introduced with a simple example. Consider a pure distributed connectionist model of prototype extraction (e.g., McClelland and Rumelhart 1986). Such a network will be given many examples of a certain kind of item. For instance, it may be given many examples of dogs, each particular dog being specified as a set of microfeatures. Since connectionist models must store data superpositionally by the subtle orchestration of a whole set of units, each of which participates in the encoding of many patterns, the network is able to generate a prototypical dog representation, as we saw in chapter 5. This representation is just the center of the state space defined by the set of dog inputs. Such inputs will have various features in common, and these will become strongly represented in the network's pattern-completion dispositions. The more variable features will tend to cancel out, unless recalled by a unique cue. A further benefit is that one set of weights can encode multiple prototypes (e.g., of a dog, a cat, and a bagel) if they are sufficiently dissimilar to avoid confusion (see McClelland and Rumelhart 1986, 185).
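The superpositional story can be made concrete with a small sketch: a Hopfield-style Hebbian auto-associator standing in for McClelland and Rumelhart's model, with invented microfeature vectors (Python; every detail simplified). No exemplar is stored as a separate item, yet a partial cue recovers the prototype, the statistical center of the training set.

import numpy as np

rng = np.random.default_rng(1)
n = 16                                   # number of microfeature units
prototype = rng.choice([-1.0, 1.0], n)   # the (never-presented) dog prototype

W = np.zeros((n, n))
for _ in range(40):                      # forty exemplar "dogs"
    exemplar = prototype.copy()
    flip = rng.random(n) < 0.15          # each exemplar distorts a few features
    exemplar[flip] *= -1
    W += np.outer(exemplar, exemplar)    # Hebbian superposition in one matrix
np.fill_diagonal(W, 0)

# Cue with half the prototype: the net completes the rest, even though
# no single exemplar (let alone the prototype itself) is stored anywhere.
cue = prototype.copy()
cue[n // 2:] = 0
completed = np.sign(W @ cue)
print(np.mean(completed == prototype))   # typically 1.0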
The trouble with all this, as Robbins points out, is that as things stand the system "knows" the prototype in only a very weak sense. It knows it in that given, say, half the prototypical dog as a cue, it could reliably complete the rest. But it has no way of representing the prototype to itself as a prototype. In a sense, the center of its own state space is as invisible to the system itself as are the higher-level partitionings mentioned earlier. The system "knows about" prototypical dogs in the way a musical novice with a good ear may know about E-flat. The novice can differentially respond to E-flat inputs, but she has not marked the sound out to herself.

One response to this, suggested by McClelland and Rumelhart, is to associate a name (e.g., "dog") with all the exemplar inputs. The result would be that "dog" would then act as a partial cue capable of generating
a full prototypical representation. This move, however, simply defeats the purpose of PDP, for the network now does not perform anything like automatic categorization. Instead the theorist, in training the network, decides what categories it will really know about by feeding it the labels.

What is actually required, I suggest, is some kind of internal pressure built into the system so that it is driven to seek particularly salient peaks of activation and to label them itself as part of a process of monitoring and organizing itself. Salience, as Robbins also points out, is not the same as any old signal average or any old state space center. Instead, we want the system itself to recognize the center of some state space as a particularly important activation vector and then label it for future use.

There is an analogy here with some of Pinker and Prince's (1988) criticisms of the past-tense-learning network reviewed in chapter 9. That network was forced into a higher rule-based level of organization by external pressure (in fact, by the sudden influx of a large body of regular past tenses). In contrast, Pinker and Prince argue, the child is internally driven to seek principles by which to organize and regiment the examples she already knows. This posit of internal pressure for a real higher-level understanding fits very well with some speculations of Annette Karmiloff-Smith, a leading developmental psychologist. The child, according to Karmiloff-Smith (1985, 1986, 1987), is her own metatheorist. The child tries to bring her own processing strategies under higher-level descriptions in order to facilitate problem solving. This effect amounts to doing for one's own processing what program explanations were seen to do for the processing of groups of systems; it amounts to seeking important common effects of various strategies and then explicitly grouping such strategies into labeled equivalence classes. One area in which the availability to oneself of such higher-level analyses of one's own processing is especially useful is in debugging one's own performances.
This leads very nicely to the second kind of problem I want to mention: what Smolensky calls the "assignment-of-blame" problem. The assignment-of-blame problem is that of deciding, when a system fails to perform in some desired way, what features of the system are at fault. As Smolensky notes, "In subsymbolic systems, this assignment of blame problem is a difficult one, and it makes programming subsymbolic models by hand very tricky" (1988, 15). Automatic learning procedures (like back-propagation) constitute a partial solution to the problem. But if, after a period of training, something is still going wrong, it is extremely hard to put it right except by more training on better chosen examples. In short, you can't really debug a pure distributed connectionist network.
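Back-propagation's partial solution can be seen in miniature below (a two-weight toy network with invented numbers). The algorithm computes, for each weight, a share of the blame for the output error and nudges the weight accordingly. This is blame assignment by gradient: it works only through further training and yields nothing like a debugger's articulated diagnosis.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 1.0, 1.0
w1, w2 = 0.1, -0.3            # weights: input->hidden, hidden->output

for step in range(2000):
    h = sigmoid(w1 * x)       # forward pass
    y = sigmoid(w2 * h)
    error = 0.5 * (y - target) ** 2

    # Backward pass: each weight's share of the blame for the error.
    dy = (y - target) * y * (1 - y)
    blame_w2 = dy * h
    blame_w1 = dy * w2 * h * (1 - h) * x
    w1 -= 5.0 * blame_w1      # adjust each weight by its blame
    w2 -= 5.0 * blame_w2

print(round(float(error), 4))  # the error shrinks, but only via more training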
Compare this state of affairs to the position of the human expert: a golfer, say, who has a problem with her swing. The golfer will naturally wish to debug her swing in the most efficient way possible. And this will
involve not just practice but very carefully chosen practice, practice aimed at some aspect of her swing that she feels is the root of the trouble (say, wrist control). For the expert golfer, having a higher-level articulation of the swing into wrist, arm, and leg components is an essential aid to improving and debugging performance.
What's true of golf is often true of life in general, and the present case is no exception. The really expert cognizer, I suggest, builds for itself various higher-level representations of its own lower-level (pure distributed connectionist) reasoning. These representations (which must group activation vectors into rough classes according to their various roles in producing behavior) provide the key to efficient improvement and debugging. The studies by Karmiloff-Smith referred to above provide evidence that just such a process of higher-level representation occurs in children. Thus, she shows how various aspects of linguistic competence seem to arise in three phases. The first phase involves the child's attainment of basic behavioral success. The child learns to produce the required linguistic forms. Internally driven by a goal of control over the organization of internal representations, the child then goes on to establish a structured description of her own basic processing (phase two). "The initial operation of phase two is to re-describe the phase one representations in a form which allows for (albeit totally unconscious) access" (Karmiloff-Smith 1986, 107). This added cognitive load, however, may cause new errors, which are only corrected once a balance of phase 1 (procedural competence) and phase 2 (redescriptive competence) is achieved. This balancing act constitutes phase 3. The general message (backed by a wealth of data in Karmiloff-Smith 1985, 1986, 1987) is that the child (and also the adult, for this is a general learning pattern) first learns to produce correct outputs, and then yields to endogenous pressure to form representations of the form of the processing that yields the outputs. These higher-level representations are much closer to classical discrete symbol structures than to highly distributed connectionist representations.

Perhaps, then, the pure distributed connectionist underrates the value of the discrete symbol structures used in natural language and classical AI. Such structures, according to the radical connectionist, serve at best two roles: to mediate interpersonal communication and to help the novice acquire rudimentary skills in a domain. I have argued, in contrast, that such symbol structures may also be a vital aid to the expert by providing an articulated model of her own problem-solving strategies. This model, though lacking the fluidity and power provided by the connectionist substratum, may be an indispensable aid to debugging one's own performance.

Finally, I wish to consider an internal correlate to the role of interpersonal communication. Just as the gross symbol structures of public language facilitate communication between whole agents who may have very
different internal representations of the states of affairs being discussed, so internal symbol structures may facilitate communication between different connectionist subsystems that have very different representations of some domain of mutual interest.
It seems plausible to suppose that the mind is not one big undifferentiated network. Instead we should expect some articulation into special-purpose networks. This immediately brings a problem of communication. It would be intolerably wasteful, for example, for a network that needs to coordinate its activity with some other network to have to reproduce in full the activation vectors characteristic of the other network's processing. It is likely to need only a sketch of the overall shape of the other's activity. This could easily constitute pressure for a kind of higher-level representational formalism geared to pass messages between the various subsystems of a single cognizer. Thus, Churchland imagines that two distinct networks "whose principal concerns and activities are non-linguistic learn from scratch some systematic means of manipulating, through a proprietory dimension of input, the cognitive abilities of the other network." He then asks, "What system of mutual manipulation - what language - might they develop?" (Churchland 1989, 42).

In sum, then, there seems to be significant internal pressure to develop various higher-level articulations of subsymbolic activation vectors. Instead of being just an external theorist's description of emergent features of subsymbolic processing, many symbolic constructs may be incarnate within the system as its own description of emergent features of its subsymbolic processing. Such a conjecture seems to me to go some way toward fleshing out some of Daniel Dennett's recent claims about the evolution of consciousness. This is obviously a massive topic, which I can barely touch on here. But part of Dennett's picture is of a form of consciousness in which some animals simulate a Von Neumann, discrete, serial processor using their natural subsymbolic connectionist cognitive architecture. The usefulness of such a development, Dennett thinks, may be to create a system that can benefit from its own self-monitoring activity. Dennett's story involves some elaborate, though to my mind rather plausible, conjectures on the role of self-stimulation in the creation of such an architecture. I shall not attempt to detail his argument here. The upshot of it is an architecture "that is incessantly re-organising itself, trying out novel combinations, sometimes idly, sometimes with great purpose.... And what is all this good for? It seems to be good for the sorts of self-monitoring that can protect a flawed system from being victimised by its own failures" (Dennett 1988, 27). This insistence on the importance of self-monitoring and on its association with a somewhat more classical style of architecture and representation chimes very nicely with my picture of self-improving, debugging
systems that utilize condensed or higher-level representations to engage in metareasoning about their own basic processing.
Of course, many connectionists have already felt the need to introduce coarser representations than those characteristic of a pure distributed approach. We saw that Geoffrey Hinton, for example, has suggested that it may be necessary to equip systems with two representations of everything. One would be the full, distributed representation spread across a whole set of feature units. The other would be a condensed version capable of standing in for and recalling, if necessary, the full (or expanded) representation. This, like Minsky's k-line theory (1980) and various other partially local, partially distributed approaches, can be seen as a way of responding to the pressures sketched above. If these pressures are genuine, and if the answer is some kind of mixed system (part distributed, part symbolic), then it seems that the raging debate over the "correct" cognitive architecture is badly posed. For if human beings are indeed mixed systems employing various kinds of representations and operations for various purposes, they will be properly described in both classical and connectionist terms. The serious questions would then arise case by case for every aspect of cognition, and the absolutely global question (PDP or not PDP?) would wither and die, a vestigial organ proper to a previous generation of debate.
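A crude sketch of the two-representation idea just mentioned, with all details invented rather than drawn from Hinton's own proposal, might look like this: each concept gets a full 64-unit distributed pattern plus a condensed 8-unit code produced by a fixed random projection. The condensed code is cheap to pass around or bind in symbolic operations, yet it suffices, via a nearest-neighbor cleanup memory, to recall the full pattern on demand.

import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(8, 64))                 # fixed condensing projection

full = {name: rng.normal(size=64) for name in ["dog", "cat", "bagel"]}
condensed = {name: P @ vec for name, vec in full.items()}

def expand(code):
    """Cleanup memory: recover the full pattern whose condensed code
    best matches the (possibly noisy) code we were handed."""
    best = max(full, key=lambda name: float(code @ condensed[name]))
    return best, full[best]

noisy_code = condensed["dog"] + 0.1 * rng.normal(size=8)
name, pattern = expand(noisy_code)
print(name)  # 'dog': the condensed code stood in for, then recalled,
             # the full distributed representation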

6 A Double Error
The eliminativist, I have argued, makes a double error. First, her conditional argument is flawed. Even if pure distributed connectionism were a complete, formal account of individual processing, it would not follow that the higher-level constructs of symbolic AI and folk psychology were inaccurate, misguided, or dispensable. Instead, such constructs may be the essential and accurate grouping principles for explanations which prescind from causal process to causal program. The eliminativist's conditional argument was seen to rest on an insupportable condition of causal efficacy, a condition that, if applied, would rob us of a whole range of perfectly intelligible and legitimate explanations both in cognitive science and in daily life.

Second, the antecedent of the eliminativist's conditional is itself called into doubt by the power and usefulness of higher-level articulations of processing. Such articulations, I argued, may do in the first person what program explanations do in the third. That is, they may enable the system to observe and mark the common contributions of a variety of activation vectors. Such marking would provide the system with the resources to reflect on (and hence improve and debug) its own basic processing strategies. In this way, entities that the pure distributed connectionist views as emergent at a higher level of external analysis (high-level clusterings, types,
and categories) may in fact be incarnate in virtue of the system's own self-analytic activity.
If this is correct, a notable implication is that the whole "What architecture?" debate turns out to have been seriously ill posed. Where the question once seemed to be PDP or not PDP? it now becomes merely, Where PDP, and where not PDP? Thus may the ungrammatical triumph over the unanswerable.
Notes

Chapter 2
1. Thanks to Lesley Benjamin for spotting the shampoo example and for some stimulating conversation about the problems involved in parsing it.
Chapter 3
1. At a minimum, the eliminative materialist must believe that this is the primary point of the practice. In conversation Paul Churchland has accepted that folk-psychological talk serves a variety of other purposes also, e.g., to praise, to blame, to encourage, and so on. But, he rightly says, so did witch talk. That in itself is not sufficient to save the
Chapter 6
1. A related issue here concerns our capacity to change verbs into nouns and vice versa. Thus newly coined uses like "She wants to thatcher the organization" or "Don't crack wise with me" are easily understood. This might be partially explained by supposing that the verb/noun distinction is (at best) one microfeature among many and that other factors, like position in the sentence, can force a change in this feature assignment while leaving much of the semantics intact. (This phenomenon is dealt with at length in Benjamin, forthcoming.)
2. Many of the ideas in this section (including the locution "an equivalence-class of algorithms") are developed out of conversations with Barry Smith. He is not to blame, of course, for the particular views I advance.

Chapter 7
1. See, e.g., Hinton 1984.
2. Lecture given to the British Psychological Society, April 1987.
3. This chapter owes much to Hofstadter (1985), whose suggestive comments have doubtless shaped my thought in more ways than I am aware. The main difference, I suspect, is that I am kinder to the classical symbolic accounts.

Chapter 8
1. As ever, this kind of claim must be read carefully to avoid Church-Turing objections. A task will count as beyond the explanatory reach of a PDP model just in case its performance requires that to carry it out, the PDP system must simulate a different kind of processing (e.g., that of a Von Neumann machine).
2. He allows that intentional realism can be upheld without accepting the LOT story (see Fodor 1987, 137). But as I suggest in section 3, he does seem to believe that some form of physical causation is necessary for the truth of intentional realism. This, I argue, is a very dangerous assumption to make.
3. A parsing tree is a data structure that splits a sentence up into its parts and associates those parts with grammatical categories. For example, a parsing tree would separate the sentence "Fodor exists" into two components, "Fodor" and "exists," and associate a grammatical label with each. These labels can become highly complex. But the standard simple illustration is

         S
        /  \
      NP    VP
      |      |
    Fodor  exists
lack of interest in actual brain processing mechanisms by adverting to grounds of broad theoretic simplicity (see Stich 1972, 211).
5. An exception may be the case of conscious thought. We may consciously deploy, say, a mental model of fluid flow to understand electricity or consciously apply grammatical rules in the early stages of speaking a second language or consciously reason by adducing a series of sententially formulated facts and rules. In such cases, it seems natural to assume that at least some level of computational organization in the brain does involve operations on formal tokens having a projectible semantics that can be given using the concepts and relations appearing in sentential formulations. As for the other cases, which form the bulk of daily life, we can only wait and see what develops.

Chapter 9
1. The actual structure of the model is complicated in various ways not germane to present concerns. See Rumelhart and McClelland 1986 for a full account.
2. A trivial model would be one that merely used a PDP substrate to implement a conventional theory. But there are complications here; see section 5.
3. This example is mentioned in Davies, forthcoming, 19.
4. Thanks to Martin Davies for suggestive conversations concerning these issues.
5. I owe this suggestion to Jim Hunter.
6. This point was made in conversation by C. Peacocke.
7. For example, Sacks reports the case of Dr. P., a music teacher who, having lost the holistic ability to recognize faces, makes do by recognizing distinctive facial features and using these to identify individuals. Sacks comments that the processing these patients have intact is machinelike, by which he means like a conventional computer model. As he puts it:

classical neurology ... has always been mechanical.... Of course, the brain is a machine and a computer ..., but our mental processes, which constitute our being and our life, are not just abstract and mechanical; [they] involve not just classifying and categorising, but continual judging and feeling also. If this is missing, we become computer-like, as Dr. P. was.... By a sort of comic and awful analogy, our current cognitive neurology and psychology resembles nothing so much as poor Dr. P.! (Sacks 1986, 18-19)

Sacks admonishes cognitive science for being "too abstract and computational." But he might as well have said "too rigid, rule-bound, coarse-grained, and serial."

Chapter 10
1. This is not to say that the philosophers who raised the worries will agree that they are best localized in the way I go on to suggest. They won't.

Epilogue
1. The story is inspired by two sources: the Gould and Lewontin critique of adaptationist thinking, reported in chapter 4, and Douglas Hofstadter's brief comments on operating systems (1985, 641-642).
Bibliography

Adams, D. 1985. So Long and Thanks for All the Fish. London: Pan.
Andler, D. 1988. Representations in cognitive science: Beyond the pro and the con. CREA research paper, Paris.
Armstrong, D. 1970. The nature of mind. Reprinted in N. Block, ed., Readings in Philosophy of Psychology, vol. 1, pp. 191-199. London: Methuen and Co., 1980.
Baddeley, R. 1987. Connectionism and gestalt theory. Unpublished manuscript. University of Sussex.
Baron-Cohen, S., Leslie, A., and Frith, U. 1985. Does the autistic child have a "theory of mind"? Cognition 21: 37-46.
Benjamin, L. Unpublished. How nouns can verb. Draft research paper. University of Sussex.
Block, N. 1980. Troubles with functionalism. In N. Block, ed., Readings in Philosophy of Psychology, vol. 1, pp. 268-305. London: Methuen and Co.
Bobrow, D., and Winograd, T. 1977. An overview of KRL, a knowledge representation language. Cognitive Science 1: 3-46.
Boden, M. 1984a. Animal perception from an AI viewpoint. In C. Hookway, ed., Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Boden, M. 1984b. What is computational psychology? Proceedings of the Aristotelian Society, suppl. 58: 17-35.
Brady, M., Hollenbach, J., Johnson, T., Lozano-Perez, T., and Mason, M., eds. 1983. Robot Motion: Planning and Control. Cambridge: MIT Press.
Broadbent, D. 1985. A question of levels: Comment on McClelland and Rumelhart. Journal of Experimental Psychology: General 114: 189-192.
Charniak, E., and McDermott, D. 1985. Introduction to Artificial Intelligence. Reading, Mass.: Addison-Wesley.
Churchland, P. 1979. Scientific Realism and the Plasticity of Mind. Cambridge: Cambridge University Press.
Churchland, P. 1981. Eliminative materialism and the propositional attitudes. Journal of Philosophy 78, no. 2: 67-90.
Churchland, P. 1986. Neurophilosophy: Towards a Unified Theory of the Mind-Brain. Cambridge: MIT Press.
Churchland, P. 1989. On the nature of theories: A neurocomputational perspective. In P. M. Churchland, The Neurocomputational Perspective. Cambridge: MIT Press.
Churchland, P., and Churchland, P. 1978. Commentary on "cognition and consciousness in non-human species." Behavioural and Brain Sciences 4: 565-566.
Churchland, P., and Churchland, P. 1981. Functionalism, qualia, and intentionality. In J. Biro and R. Shahan, eds., Mind, Brain, and Function. Oklahoma: University of Oklahoma Press, 1982.
Chomsky, N., and Katz, J. 1974. What the linguist is talking about. In N. Block, ed., Readings in Philosophy of Psychology, vol. 2, pp. 223-237. London: Methuen and Co., 1980.
Clark, A. 1986. A biological metaphor. Mind and Language 1, no. 1: 45-64.
Clark, A. 1987a. From folk-psychology to naive psychology. Cognitive Science 11, no. 2: 139-154.
Clark, A. 1987b. Connectionism and cognitive science. In J. Hallam and C. Mellish, eds., Advances in Artificial Intelligence, pp. 3-15. Chichester: Wiley.
Clark, A. 1987c. The kludge in the machine. Mind and Language 2, no. 4: 277-300.
Clark, A. Forthcoming. Thoughts, sentences, and cognitive science. Philosophical Psychology.
Cole, M., Hood, L., and McDermott, R. 1978. Ecological niche picking. In U. Neisser, ed., Memory Observed: Remembering in Natural Contexts. San Francisco: Freeman, 1982.
Cosmides, L. 1985. Deduction or Darwinian algorithms? An explanation of the "elusive" content effect on the Wason Selection Test. Doctoral thesis. Harvard University.
Davidson, D. 1973. The material mind. In J. Haugeland, ed., Mind Design, pp. 339-354. Cambridge: MIT Press, 1981.
Davidson, D. 1984. Inquiries into Truth and Interpretation. Oxford: Oxford University Press.
Davies, M. 1986. Individualism and supervenience. Proceedings of the Aristotelian Society, suppl. 60: 263-283.
Davies, M. Forthcoming. Modularity: Levels of explanation, neuropsychology and connectionism. Paper presented to the Working Party on Mental Representation, Manchester University, 1987. Forthcoming in British Journal for the Philosophy of Science.
Dawkins, R. 1986. The Blind Watchmaker. England: Longman.
De Kleer, J., and Brown, J. 1985. A qualitative physics based on confluences. In J. Hobbs and R. Moore, eds., Formal Theories of the Commonsense World, pp. 109-183. Norwood, N.J.: Ablex.
Dennett, D. 1981. Brainstorms. Sussex: Harvester Press.
Dennett, D. 1984a. Elbow Room: The Varieties of Free Will Worth Wanting. Oxford: Oxford University Press.
Dennett, D. 1984b. Cognitive wheels: The frame problem of AI. In C. Hookway, ed., Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Dennett, D. 1987. The Intentional Stance. Cambridge: MIT Press.
Dennett, D. 1988. The evolution of consciousness. Tufts University, Center for Cognitive Studies. Circulating manuscript, CCM 88-1.
Devitt, M., and Sterelny, K. 1987. Language and Reality: An Introduction to the Philosophy of Language. Oxford: Blackwell.
Draper, S. 1986. Machine learning and cognitive development. To appear in J. Rutkowska and C. Crook, eds., The Computer and Human Development: Psychological Issues.
Dreyfus, H. 1972. What Computers Can't Do. New York: Harper and Row.
Dreyfus, H. 1981. From micro-worlds to knowledge representation: AI at an impasse. In J. Haugeland, ed., Mind Design, pp. 161-205. Cambridge: MIT Press.
Dreyfus, H., and Dreyfus, S. 1986. Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. New York: Free Press, MacMillan.
Durham, T. 1987. Neural brainwaves break new ground. Computing, 9 April 1987.
Evans, G. 1982. The Varieties of Reference. Oxford: Oxford University Press.
Feigenbaum, E. 1977. The art of artificial intelligence: 1. Themes and case studies of knowledge engineering. Proceedings of the Fifth International Joint Conference on Artificial Intelligence 2: 1014-1029.
Fodor, J. 1968. The appeal to tacit knowledge in psychological explanation. Journal of Philosophy 65: 627-640.
Fodor, J. 1980a. Methodological solipsism considered as a research strategy in cognitive psychology. Reprinted in J. Haugeland, ed., Mind Design, pp. 307-339. Cambridge: MIT Press, 1981.
Fodor, J. 1980b. Some notes on what linguistics is about. In N. Block, ed., Readings in Philosophy of Psychology, vol. 2, pp. 191-201. London: Methuen and Co.
Fodor, J. 1985. Fodor's guide to mental representations: The intelligent Auntie's vade-mecum. Mind 94: 76-100.
Fodor, J. 1986. Individualism and supervenience. Proceedings of the Aristotelian Society, suppl. 60: 235-263.
Fodor, J. 1987. Psychosemantics: The Problem of Meaning in the Philosophy of Mind. Cambridge: MIT Press.
Fodor, J., and Pylyshyn, Z. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition 28: 3-71.
Gould, S., and Lewontin, R. 1979. The Spandrels of San Marco and the Panglossian Paradigm: A critique of the adaptationist programme. Reprinted in E. Sober, ed., Conceptual Issues in Evolutionary Biology. Cambridge: MIT Press, 1984.
Hallam, J., and Mellish, C., eds. 1987. Advances in Artificial Intelligence. Chichester: Wiley and Sons.
Harcourt, A. 1985. All's fair in play and politics. New Scientist, no. 1486, December.
Harris, M., and Coltheart, M. 1986. Language Processing in Children and Adults. London: Routledge and Kegan Paul.
Haugeland, J. 1981. The nature and plausibility of cognitivism. In J. Haugeland, ed., Mind Design, pp. 243-281. Cambridge: MIT Press.
Haugeland, J. 1985. Artificial Intelligence: The Very Idea. Cambridge: MIT Press.
Hayes, P. 1979. The naive physics manifesto. In D. Michie, ed., Expert Systems in the Micro-Electronic Age. Edinburgh: Edinburgh University Press.
Hayes, P. 1985a. The second naive physics manifesto. In J. Hobbs and R. Moore, eds., Formal Theories of the Commonsense World, pp. 1-36. Norwood, N.J.: Ablex.
Hayes, P. 1985b. Naive physics I: Ontology for liquids. In J. Hobbs and R. Moore, eds., Formal Theories of the Commonsense World. Norwood, N.J.: Ablex.
Hebb, D. 1949. The Organization of Behavior. New York: Wiley and Sons.
Hinton, G. 1984. Parallel computations for controlling an arm. Journal of Motor Behavior 16: 171-194.
Hinton, G., and Anderson, J., eds. 1981. Parallel Models of Associative Memory. Hillsdale, N.J.: Erlbaum.
Hobbs, J. 1985. Introduction to J. Hobbs and R. Moore, eds., Formal Theories of the Commonsense World, pp. xi-xxii. Norwood, N.J.: Ablex.
Hobbs, J., and Moore, R., eds. 1985. Formal Theories of the Commonsense World. Norwood, N.J.: Ablex.
Hodges, A. 1983. Alan Turing: The Enigma. New York: Simon and Schuster.
Hofstadter, D. 1985. Waking up from the Boolean dream, or, Subcognition as computation. In his Metamagical Themas: Questing for the Essence of Mind and Pattern, pp. 631-665. Harmondsworth: Penguin.
Hornsby, J. 1986. Physicalist thinking and behaviour. In P. Pettit and J. McDowell, eds., Subject, Thought, and Context. Oxford: Oxford University Press.
Hull, D. 1984. Historical entities and historical narratives. In C. Hookway, ed., Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Humphreys, N. 1983. Nature's psychologists. In Consciousness Regained. New York: Oxford University Press.
Israel, D. 1985. A short companion to the naive physics manifesto. In J. Hobbs and R. Moore, eds., Formal Theories of the Commonsense World, pp. 427-447. Norwood, N.J.: Ablex.
Jacob, F. 1977. Evolution and tinkering. Science 196, no. 4295: 1161-1166.
Jackson, F., and Pettit, P. 1988. Functionalism and broad content. Mind 97, no. 387: 381-400.
Kahneman, D., Slovic, P., and Tversky, A., eds. 1982. Judgement under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
Karmiloff-Smith, A. 1984. Children's problem solving. In M. E. Lamb, A. L. Brown, and B. Rogoff, eds., Advances in Developmental Psychology, vol. 3, pp. 39-90. Hillsdale, N.J.: Erlbaum.
Karmiloff-Smith, A. 1985. Language and cognitive processes from a developmental perspective. Language and Cognitive Processes 1, no. 1: 61-85.
Karmiloff-Smith, A. 1986. From metaprocesses to conscious access: Evidence from children's metalinguistic and repair data. Cognition 23: 95-147.
Karmiloff-Smith, A. 1987. Beyond modularity: A developmental perspective on human consciousness. Draft manuscript of a talk given at the annual meeting of the British Psychological Society, Sussex, April.
Katz, J. 1964. Mentalism in linguistics. Language 40: 124-137.
Krellenstein, M. 1987. A reply to parallel computation and the mind-body problem. Cognitive Science 11: 155-157.
Knuth, D. 1973. Sorting and Searching. Reading, Mass.: Addison-Wesley.
Kohler, W. 1929. Gestalt Psychology. New York: Liveright.
Kuczaj, S. A. 1977. The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behaviour 16: 589-600.
Lakatos, I. 1974. Falsification and the methodology of scientific research programmes. In I. Lakatos and A. Musgrave, eds., Criticism and the Growth of Knowledge. Cambridge: Cambridge University Press.
Langley, P. 1979. Rediscovering physics with BACON 3. Proceedings of the Sixth International Joint Conference on Artificial Intelligence 1: 505-508.
Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. 1987. Scientific Discovery: Computational Explorations of the Creative Process. Cambridge: MIT Press.
Lenat, D. 1977. The ubiquity of discovery. Proceedings of the Fifth International Joint Conference on Artificial Intelligence 2: 1093-1105.
Lenat, D. 1983a. Theory formation by heuristic search. Artificial Intelligence 21: 31-59.
Lenat, D. 1983b. EURISKO: A program that learns new heuristics and domain concepts. Artificial Intelligence 21: 61-98.
Levi-Strauss, C. 1962. The Savage Mind. London: Weidenfeld and Nicolson.
Lieberman, P. 1984. The Biology and Evolution of Language. Cambridge: Harvard University Press.
Lycan, W. 1981. Form, function, and feel. Journal of Philosophy 78, no. 1: 24-50.
Maloney, J. 1987. The right stuff. Synthese 70: 349-372.
McClelland, J. 1981. Retrieving general and specific knowledge from stored knowledge of specifics. Proceedings of the Third Annual Conference of the Cognitive Science Society (Berkeley): 170-172.
McClelland, J. 1986. The programmable blackboard model of reading. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 122-169. Cambridge: MIT Press.
McClelland, J., and Kawamoto, A. 1986. Mechanisms of sentence processing: Assigning roles to constituents of sentences. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 272-325. Cambridge: MIT Press.
McClelland, J., and Rumelhart, D. 1985a. Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General 114, no. 2: 159-188.
McClelland, J., and Rumelhart, D. 1985b. Levels indeed! A response to Broadbent. Journal of Experimental Psychology: General 114, no. 2: 193-197.
McClelland, J., and Rumelhart, D. 1986. Amnesia and distributed memory. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 503-529. Cambridge: MIT Press.
McClelland, J., Rumelhart, D., and Hinton, G. 1986. The appeal of PDP. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 3-44. Cambridge: MIT Press.
McClelland, J., Rumelhart, D., and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge: MIT Press.
McCulloch, G. 1986. Scientism, mind, and meaning. In P. Pettit and J. McDowell, eds., Subject, Thought, and Context. Oxford: Oxford University Press, 1986.
McCulloch, W., and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5: 115-133.
McDermott, D. 1976. Artificial intelligence meets natural stupidity. In J. Haugeland, ed., Mind Design. Cambridge: MIT Press, 1981.
McGinn, C. 1982. The structure of content. In A. Woodfield, ed., Thought and Object, pp. 207-259. Oxford: Oxford University Press.
Marr, D. 1977. Artificial intelligence: A personal view. In J. Haugeland, ed., Mind Design, pp. 129-142. Cambridge: MIT Press, 1981.
Marr, D. 1982. Vision. New York: W. H. Freeman and Co.
Marr, D., and Poggio, T. 1976. Cooperative computation of stereo disparity. Science 194: 283-287.
Michaels, C., and Carello, C. 1981. Direct Perception. Englewood Cliffs, N.J.: Prentice-Hall.
Michie, D., and Johnston, R. 1984. The Creative Computer. Harmondsworth: Penguin.
Millikan, R. 1986. Thoughts without laws; cognitive science with content. Philosophical Review 95: 47-80.
Minsky, M. 1974. A framework for representing knowledge. MIT lab memo 306. Cambridge, Mass. Excerpts in J. Haugeland, ed., Mind Design (Cambridge: MIT Press, 1981).
Minsky, M. 1980. K-lines: A theory of memory. Cognitive Science 4: 117-133.
Minsky, M., and Papert, S. 1969. Perceptrons. Cambridge: MIT Press.
Newell, A. 1980. Physical symbol systems. Cognitive Science 4: 135-183.
Newell, A., and Simon, H. 1976. Computer science as empirical inquiry. In J. Haugeland, ed., Mind Design. Cambridge: MIT Press.
Norman, D. 1986. Reflections on cognition and parallel distributed processing. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 110-146. Cambridge: MIT Press.
Pettit, P., and McDowell, J., eds. 1986. Subject, Thought and Context. Oxford: Oxford University Press.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge: Harvard University Press.
Pinker, S., and Prince, A. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28: 73-193.
Poggio, T., and Koch, C. 1987. Synapses that compute motion. Scientific American, May, pp. 42-48.
Premack, D., and Woodruff, G. 1978. Does the chimpanzee have a theory of mind? Behavioural and Brain Science 4: 515-526.
Putnam, H. 1960. Minds and machines. In S. Hook, ed., Dimensions of Mind. New York: New York University Press.
Putnam, H. 1967. Psychological predicates. In W. Capitan and D. Merrill, eds., Art, Mind, and Religion, pp. 37-48. University of Pittsburgh Press.
Putnam, H. 1975a. The meaning of "meaning." In H. Putnam, Mind, Language, and Reality, pp. 215-271. Cambridge: Cambridge University Press.
Putnam, H. 1975b. Philosophy and our mental life. In H. Putnam, Mind, Language, and Reality, pp. 291-303. Cambridge: Cambridge University Press.
Putnam, H. 1981. Reductionism and the nature of psychology. In J. Haugeland, ed., Mind Design, pp. 205-219. Cambridge: MIT Press.
Pylyshyn, Z. 1986. Computation and Cognition. Cambridge: MIT Press.
Ridley, M. 1985. The Problems of Evolution. Oxford: Oxford University Press.
Ritchie, G., and Hanna, F. 1984. AM: A case study in AI methodology. Artificial Intelligence 23: 249-268.
Robbins, A. Unpublished. Representing type and category in PDP. Draft doctoral dissertation. University of Sussex.
Rosenblatt, F. 1962. Principles of Neurodynamics. New York: Spartan Books.
Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning internal representations by error propagation. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362. Cambridge: MIT Press.
Rumelhart, D., and McClelland, J. 1986. On learning the past tenses of English verbs. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 216-271. Cambridge: MIT Press.
Rumelhart, D., and McClelland, J. 1986. PDP models and general issues in cognitive science. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 110-146. Cambridge: MIT Press.
Rumelhart, D., McClelland, J., and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge: MIT Press.
Rumelhart, D., and Norman, D. 1982. Simulating a skilled typist: A study in skilled motor performance. Cognitive Science 6: 1-36.
Rumelhart, D., Smolensky, P., McClelland, J., and Hinton, G. 1986. Schemata and sequential thought processes in PDP models. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 7-58. Cambridge: MIT Press.
Rutkowska, J. 1984. Explaining infant perception: Insights from artificial intelligence. Cognitive studies research paper 005. University of Sussex.
Rutkowska, J. 1986. Developmental psychology's contribution to cognitive science. In K. S. Gill, ed., Artificial Intelligence for Society, pp. 79-97. Chichester, Sussex: John Wiley.
Ryle, G. 1949. The Concept of Mind. London: Hutchinson.
Sacks, O. 1986. The Man Who Mistook His Wife for a Hat. London: Picador.
Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Schilcher, C., and Tennant, N. 1984. Philosophy, Evolution, and Human Nature. London: Routledge and Kegan Paul.
Schreter, Z., and Maurer, R. 1986. Sensorimotor spatial learning in connectionist artificial organisms. Research abstract FPSE. University of Geneva.
Searle, J. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge University Press.
Searle, J. 1980. Minds, brains, and programs. Reprinted in J. Haugeland, ed., Mind Design, pp. 282-307. Cambridge: MIT Press, 1981.
Searle, J. 1983. Intentionality. Cambridge: Cambridge University Press.
Searle, J. 1984. Intentionality and its place in nature. Synthese 61: 3-16.
Sejnowski, T., and Rosenberg, C. 1986. NETtalk: A parallel network that learns to read aloud. Johns Hopkins University Technical Report JHU/EEC-86/01.
Shortliffe, E. 1976. Computer Based Medical Consultations: MYCIN. New York: Elsevier.
Simon, H. 1962. The architecture of complexity. Reprinted in H. Simon, The Sciences of the Artificial. Cambridge: MIT Press, 1969.
Simon, H. 1979. Artificial intelligence research strategies in the light of AI models of scientific discovery. Proceedings of the Sixth International Joint Conference on Artificial Intelligence 2: 1086-1094.
Simon, H. 1980. Cognitive science: The newest science of the artificial. Cognitive Science 4, no. 2: 33-46.
Simon, H. 1987. A psychological theory of scientific discovery. Paper presented at the annual conference of the British Psychological Society. University of Sussex.
Sloman, A. 1984. The structure of the space of possible minds. In S. Torrance, ed., The Mind and the Machine. Sussex: Ellis Horwood.
Smart, J. 1959. Sensations and brain processes. Philosophical Review 68: 141-156.
Smith, M. 1984. The evolution of animal intelligence. In C. Hookway, ed., Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony theory. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 194-281. Cambridge: MIT Press.
Smolensky, P. 1987. Connectionist AI, and the brain. Artificial Intelligence Review 1: 95-109.
Smolensky, P. 1988. On the proper treatment of connectionism. Behavioural and Brain Sciences 11: 1-74.
Sterelny, K. 1985. Review of Stich, From Folk Psychology to Cognitive Science. Australasian Journal of Philosophy 63, no. 4: 510-520.
Stich, S. 1971. What every speaker knows. Philosophical Review 80: 476-496.
Stich, S. 1972. Grammar, psychology, and indeterminacy. Reprinted in N. Block, ed., Readings in Philosophy of Psychology, vol. 2, pp. 208-222. London: Methuen and Co., 1980.
Stich, S. 1983. From Folk Psychology to Cognitive Science. Cambridge: MIT Press.
Tannenbaum, A. 1976. Structured Computer Organization. Englewood Cliffs, N.J.: Prentice-Hall.
Tennant, N. 1984a. Intentionality, syntactic structure, and the evolution of language. In C. Hookway, ed., Minds, Machines, and Evolution. Cambridge: Cambridge University Press.
Tennant, N., and Schilcher, C. 1984. Philosophy, Evolution, and Human Nature. London: Routledge and Kegan Paul.
Tennant, N. 1987. Philosophy and biology: Mutual enrichment or one-sided encroachment. La nuova critica 1-2: 39-55.
Thagard, P. 1986. Parallel computation and the mind-body problem. Cognitive Science 10: 301-318.
Torrance, S. 1984. Philosophy and AI: Some issues. Introduction to S. Torrance, ed., The Mind and the Machine, pp. 11-28. Sussex: Ellis Horwood.
Turing, A. 1937. On computable numbers with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42: 230-265.
Turing, A. 1950. Computing machinery and intelligence. Mind 59: 433-460.
Van Fraassen, B. 1980. The Scientific Image. Oxford: Oxford University Press.
Vogel, S. 1981. Behaviour and the physical world of an animal. In P. Bateson and P. Klopfer, eds., Perspectives in Ethology, vol. 4. New York: Plenum Press.
Walker, S. 1983. Animal Thought. London: Routledge and Kegan Paul.
Warrington, C., and McCarthy, R. 1987. Categories of knowledge: Further fractionations and an attempted integration. Brain 110: 1273-1296.
Winograd, T. 1972. Understanding natural language. Cognitive Psychology 3: 1-191.
Winston, P. 1975. The Psychology of Computer Vision. New York: McGraw-Hill.
Wittgenstein, L. 1969. On Certainty. Oxford: Blackwell.
Woodfield, A., ed. 1982. Thought and Object. Oxford: Oxford University Press.
Index

Abelson, R., 25, 30-31, 92, 93
Absent-qualia argument, 23-24, 34, 35
Action, 38
Activation-evolution equation, 188
Adams, D., 7
Amnesia, 101
Andler, D., 190
Analogical reasoning, 123, 125
Approximation relation, 115, 117, 130, 131-139
Architecture
  cognitive, 16, 128-130, 150-152
  defined by primitive operations, 16, 150 (see also Primitive operations)
  of PDP systems, 112, 115, 118-119
  of perceptrons, 85
  virtual, 11 (see also Virtual machines)
  Von Neumann, 16
Artifacts, as aids to computation, 132-135
Artificial Intelligence. See Classical AI; Cognitivism; Connectionism; Representations
Ascriptive-meaning holism, 48-50
Assignment-of-blame problem, 204
Autonomous guided vehicle, 15
Backpropagation, 192
BACON, 13-17, 139-141, 174-175
Basic operations. See Primitive operations
Behavior
  ambiguity of, 53
  internal causes of, 38 (see also Cognitive science)
  and mental states, 44, 47-51, 57-58
  as structured, 146-150
  as world involving, 55-56 (see also Broad content)
Behaviorism, 22
Belief. See Folk psychology
Biology, 61-80, 104-105, 184
Blending errors, 166-167
Block, N., 23, 149-150
Bobrow, D., and Winograd, T., 7
Boden, M., 52
Body problem, 27-28
Boltzmann machine, 183
Bradshaw, G., 16
Brady, M., 75
Brain's-eye view, 4-5, 84
Breakdown patterns, 168-169, 173. See also Connectionism, and pathological data
Broadbent, D., 129
Broad content, 42-46, 179. See also Folk psychology
Brown, J., 158
Case-role assignment. See Sentence-processing models
Causal cognitive science. See Cognitive science
Causal efficacy, condition of, 196, 201
Causal powers, 32-33
Chinese room, 30-34
Chomsky, N., 134, 156
Churchland, Paul, 1, 38-42, 50-54, 192-194, 199, 202, 206
Clark, A., 17, 131
Classical AI, 19-21. See also Representations; Cognitivism
Cluster analysis, 191-194, 199
Coadaptation, 68
Cognitive architecture. See Architecture
Cognitive ethology, 65
Cognitive science, 9, 37, 72-80, 152-154, 156-160, 178-182
Cognitive wheel, 65
Cognitivism
  doubts concerning, 25-36
  introduced, 9, 11-21
  methodology of, 74-80
Commonsense, 5, 25-30
Competence models, 115-116
Computational architecture. See Architecture
Computational linguistics, 12, 162-166. See also Grammar
Conceptual level, 18, 148-150. See also Subsymbolic paradigm
Connection-evolution equation, 188
Connectionism. See also Subsymbolic paradigm; Representation; Crosstalk
  analogy with quantum theory, 115, 130
  context sensitivity of, 109-111, 117
  contrasted with classicism, 19-21, 111-114
  and debugging, 204
  defined, 118-119
  and developmental data, 161-167
  emergent properties of, 86-100
  and the essence of thought, 124-126
  as implementation theory, 144, 150-152
  levels of description of, 188-196
  microfeatural representation in, 108-114, 167-168
  as neurally inspired theory, 83, 84
  and pathological data, 100-102, 151-152, 168-169, 194
  as the primary mode of cognition, 174-175
  problems for, 127, 143-175, 203-207
  reconciled with classicism, 152, 202-207
  and search, 122
  and self-monitoring, 202-207
  and a sentence-processing model, 107-111
  serial control as a problem for, 103
Consciousness, 132, 173, 205
Conscious reasoning, 14, 127, 131-132, 140, 173, 206
Conscious rule interpreter, 3, 114, 137-139, 151
Constituent structure, 145-150
Constitutive claims, 43, 54-57
Content, 42-43. See also Broad content
Crosstalk, 101, 122-124
Cue-reliance problem, 203
Cultural knowledge, 114, 202
Data structures, 20, 21. See also Representations, structured
Davidson, D., 48, 145
Davies, M., 11, 21, 169
Dawkins, R., 68, 71-72
Deductive inference, 19
Default assignment, 90-91, 93, 96
De Kleer, J., 158
Delta rule, 98-99, 102, 183
Dennett, D., 65, 111, 141, 191, 200-201, 206
Descriptive cognitive science. See Cognitive science
Descriptivism, 154-157. See also Grammar; Cognitive science
Developmental data, 162-163, 172
Developmental dysphasia, 168
Developmental psychology, 64-65
Devitt, M., 155-156
Dimension shift, 20, 114, 117-118, 189-190. See also Connectionism; Subsymbolic paradigm; Distributed representations
Disambiguation, 110
Distributed representations, 94-99, 110-114, 116. See also Connectionism; Subsymbolic paradigm; Representations
Doppelgangers, 42-43
Dreyfus, H., 25-30, 65, 93
Dreyfus, H., and Dreyfus, S., 21, 29, 125
Ecological psychology, 66
Electrical-circuit model, 115-117
Eliminative materialism
  associated with connectionism, 196-197
  association with connectionism rejected, 207-208
  assumptions of, 47-48
  evidence for, 39-42
  and mental causation, 160, 196-201
  rejected, 50-54, 58-59, 207-208
Environment as an aid to cognition, 63-66, 132-136
Epistemic liaisons, 49. See also Holism
Evolution, 61, 64
  and adaptationism, 76-80
  of complex wholes, 66-73
  and psychological realism, 156
Index 223

Evolutionarilybasictasks , 72- 73, Hofstadter, D., 13, IS, 123, 174


104- 105, 114 Holigm
Excitatorylinks, 86. 5tealsoConnectionism of beliefanddesire , 5, 48- 50, 57- 58,
Exclusiveor, 102- 103 145, 147
Expertskill, 204- 205 of evolutionaryprocess es, 66- 71.
Explanation in pop storage , 107- 111, 11.0- 11.1
andgeneralizations , 196- 202 Horizontallylimitedmicroworlds . See
Marr' s levelsof, 18 Microworlds
varietiesof, 15, 17, 128, 171, 173, Hornsby,I ., 44, 53
196- 202 Humphreys , N., 51
Explicitrules. 5teRules
Identitytheory, 22- 1,3
Feedback loops, 66 Implementation theory, 124, 129, 144,
, 71- 71.
Flatfish 170- 172
Fodor, J., 1, 17, 19- 1.0, 48, 143- 151
., 156, Infonnationalholism , 107- 121
160, 197, 203 Infonnationprocessing , 29, 64
Folk psychology. SeeRisoEliminative Infonnation -processing warfare, 62- 63
materialism Inhibitorylinks, 86
and broad content, 42- 46, 179 Insight, flashof, 15, 139- 141
criticisms of, 39- 42 Intentionalrealism , 160
and folk physics, 52 Intuitiveprocessor , 14, 137- 139. Seealso
holism of, 48- 50 (seeRisoHolism) Connectionism ; Subsymbolic paradigm
innatenessof, 51- 52 israeLD., 158
as a level of description of connectionist
systems, 195- 196 Jackson, P., 197- 198
as a theory, 38- 39 Jacob, ., 70
P
Fonnal properties, 10, 31- 34 JetsandSharksmodeL86- 92
Frame-basedreasoning, 26- 30, 92- 95, 200
Fundion, samenessof , 129 Kanniloff-Smith , A , 165, 171., 1.04- 1.05
Fundionalism, 21- 24, 36, 58- 59. SteRIso .
Katz, J, 155 - 156
Miaofundionalism Kawamoto , A , 108- 111
KingsCollegechapel , 77
Generalizations. 92. 181- 182. 196- 202 K-line theory, 1.07
G~ ~ li~ dela nile. Su; Delta nile Kludge, 69- 71., 134
Gestalttheory, 84 Knowledge -representation language , 1.9
Gould, S., 76- 80 Kodt, C., 183
Gracefuldegradation , 62, 89- 90, 95 Krellenstein , M., 11.1, 11.9
Gradualisticholism, 68- 70 Kuczaj , 5., 161 .
Grammar , 21, 154- 157, 162- 166
Guesses, 62

Hallam. D., 158


Harcourt , A , 51
Hayes , p., 52, 158- 159 Cognitivism
Heuristicsearch , 13 , 165- 166. Se, alsoConnectionism
Learning
Hiddenunits, 102- 103. Seealso Lewontin, R. , 76- 80
Connectionism , 162
Lexicon
High-leveldesaiptions , 111- 112, Uebennan , P., 69
198- 202 Linguistictheories , 160- 163. Seealso
Hinton, G., 132, 152, 207 Grammar ; Computational linguistics
Hobbs,j ., 158 USP, 10, 13
224 Index

Logical inference, 127 Partialsimulation . 182


Look-up tree, S6 Past-tensenetwork, 160- 166
Lycan, W ., 3S in, 163
data. SeeConnectionism , and
McCarthy,J., 9- 10 data
McCarthy, R. , 151 Patterncompletion , 92
McClelland , J., 84, 88, 92- 104, 108- 111, Perceptrons , 85
115, 130, 160- 163, 167, 203- 204 Pettit, P., 37, 43, 46, 55, 197, 198
McCulloch , G., 45 Physical -symbol-systemhypothesis ,
McDoweU, J., 37, 43, 46, 55 11- 17, 135. SeealsoSPSS hypothesis
Marr, D., 17- 18, 159 Pinker,S., 160- 172, 174
Marr' s threelevels, 18 Poggio, T., 183
Mellish, C., 158 Primitiveoperations , 12, 16, 128- 130. See
Memory. SeealsoConnectionism alsoArchitecture
andconfabulation . 105 Prince , A., 160- 172, 174
contentaddressable , 88- 89 Process explanations , 198
distributed , 96- 105 Programexplanations , 198
Mentalstates . Seefolk psychology ; Projectibleinterpretations , 18, 46, 113
Representations ; Mind Propositional attitudes . Se, Folk
Metaphoricalunderstanding , 111, 167- 168 psychology
Microfeatures . SeeConnectionism ; Prototypes , 96- 100
Representations Psychological generalizations . See
Microfundionalism , 34- 36, 118, 180 Generalizations
Microworlds,26- 27, 63, 75 Psychological instantiation , 178- 180
Mind. SeealsoContent; folk psychology ; Psychological modeling , , 129, 130,
61
Representations 180- 183. SeealsoRepresentations ;
andAI models , 152- 154 Cognitivism ; Connectionism
cognitive science and, 37- 59 Psychological realism . 154- 160, 162
functionalphylogenyof, 72 Psychopathology , Connectionism
. Se ; and
mapof, 177 pathologicaldata
asmessy , 76 Putnam , H., 22- 23, 36, 42, 182, 198
assystematic , 144- 150 Pytyshyn , Z., 17, 19- 20, 143- 152, 203
'
Mind s-eye view , 3- 4
Minsky, M., 9, 207 . Se, Naivephysics
Qualitativereasoning
Morris, M., 43
Real-timeprocessing , 62- 63
Naivephysics , 52, 157- 160 Reduction . 41
Narrowcontent,42, 45- 46. SetalsoBroad Relaxationalgorithms , 4. Seealso
content; Folkpsychology Connectionism
NETtalk, 191- 194, 199 Representational primitives, 102
Neutralnetworks . SeeConnectionism Representations . SeealsoConnectionism ;
Nevins, J., 66 Symbolprocessing ; Distributed
Newell, A., 9- 17 representations
Norman,Do, 140, 170 context-sensitive , 109- 111, 117
Numericallevel 188- 189 high -level, 205
involving theenvironment , 63- 66
Optimizingprocess, 76 microfeaturaL 108- 114, 116- 118,
. 163
Overregularization 167- 170
in a semanticmetric, 190, 191
. See
Paralleldistributedprocessing structured , 20, 143- 152
Connectionism Restaurant script, 30- 31
Index 225

, M., 68- 69
Ridley Spreading activation , 86- 92
Robbins, A , 203 SPSShypothesis , 12- 13, 32, 135. Setalso
Rooms example , 94- 96 Physical -symbol-systemhypothesis
Rosenberg , c., 191- 193 Sterelny , K., 155- 156
Rosenblatt , F., 85 , 5., 37, 39- 41, 155- 156
Stich
Rules STStheory. SeeSemantic transparency
context-free,27 Subconceptuaileve L 189- 191. Setalso
explicitversus implicit, 20- 21, 127 Connectionism
input-output , 33- 34 Subpenonal cognitivepsychology , 147
in PDPsystems , 99- 100,112,115- 118, Subsymbolic paradigm . 111- 114, 175, 188.
120- 121,132,162- 165 SeealsoConnectionism
Rumelhart , D., 84, 88, 92- 104, 115, Surfacedyslexia , 168
130- 131,160- 163,167203 - 204 Swimbladder , 69
Rutkowska , J., 64- 65 Symbiosis , 69- 70
SymbolicAI, 194- 195
Schank . R., 25, 30- 31, 92- 93 Symbolicatoms, 12
Schemata , 92- 96. SeealsoFrame -based Symbolicparadigm , 112- 113, 175
reasoning Symbolicreasoning , 131- 136
Scientificcreativity, 13- 17. Seealso Symbolprocessing , 2, 9, 11- 13, 17- 21,
BACON 112, 131- 137. SeealsoCognitivism ;
Scientificessence , 11 Representations
Scripts . SeeFrame -basedreasoning Syntax.SeeRepresentations
Searle, J., 30- 34, 55, 135, 154, 178 Systematicity argument , 144- 150
Sejnowski , T., 191- 193, 199
SeIf-monitoring,202- 207 Tacitrules, 20- 21. SeeRisoRules
Semantically specificdeficits , 151 Taskanalysis , 18, 159
Semantic metric, 190- 191 Tasks, 15, 17
Semantic transparency , 2, 17- 21, 35, 105, Technological AI, 153
107, 111- 120. SeealsoRepresentations ; Tennant , N., 51
Cognitivism ; Connectionism Thagrad , P., 129
Sentence -processing models , 107- 111 Thought. SeeRisoFolkpsychology ;
Sententialism , 80. SeealsoCognitivism ; Holism ; Representations
Folkpsychology ascriptionof, 48- 50
Sequential thought, 132- 136 structurescapableof supporting ,
Seriality , 14, 15 124- 126
SHRDLU , 25- 27 two kindsof studiesof, 152- 160
Simon , H., 9- 17, 66, 135, 139- 141 Truth conditions , 42- 43
Simulation . SeeVirtualmachines Turing, A , 9 - 10
Situatedintelligence , 22, 28, 63- 66 Turingmachine , 10, 12, 23
Sloman , A., 12, 74 Twin earth , 42- 44. SeeRisoBroadcontent
Slugs, 85
Smart , J., 22 . 2, 128- 130,
Uniformityasswnption
Smith, Maynard,51 137- 139
Smolensky , P., 3, 17- 21, 81, 111- 118,
131- 132, 137- 139, 151- 152, 173, Variablebinding , 170
188- 195, 202 Verticallylimitedmicroworlds. See
Snowballeffect, 71 Microworlds
Solipsism , 45, 54, 64 Virtualmachines
Spandrels , 77- 80, 94 connectionist, 121- 122
Speedof processing , 119, 121- 122 explanations gearedto, 174- 175, 181
Sponges , 63 - 66 symbolic - 136, 140- 141
, 131
226 Index

Visualcliffis2
VogeLSo , 63- 64
Walker
, S., 73
Warrington , C., 151
Winograd , T., 25- 26
Winston, P., 26
Wood6eld , A , 42, 47

Zytkow,J., 16
