Microcognition.
Andy Clark.
1991 The MIT Press.
Special thanks are due to the following people (in no particular order): Neil Tennant and Donald Campbell for showing me what a biological perspective was all about; Martin Davies, Barry Smith, and Michael Morris for reminding me what a philosophical issue might look like; Aaron Sloman for encouraging an ecumenical approach to conventional symbol-processing AI; Margaret Boden for making the School of Cognitive and Computing Sciences possible in the first place; my father, James Henderson Clark, and my mother, Christine Clark, for making me possible in the first place; my father and Karin Merrick for their saintly patience with the typing; H. Stanton and A. Thwaits for help with the many mysterious details of publishing a book; the Cognetics Society at Sussex (especially Adam Sharp and Paul Booth) for some of the graphics; and Lesley Benjamin for invaluable help with the problem of meaning in all its many forms.
I would like to thank the copyright owners and authors for permission to use the following figures. Figures 4.3 and 4.4 are from S. J. Gould and R. Lewontin, "The Spandrels of San Marco and the Panglossian Paradigm," Proceedings of the Royal Society of London, Series B 205 (1979), no. 1161: 582-583. Reproduced by permission of the authors and the Royal Society. Table 5.1 and figures 5.1, 5.3, 5.4, 5.5, and 5.7 are from J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2 (Cambridge: MIT Press, 1986). Reproduced by permission of the authors and the MIT Press.
Parts of chapters 3 and 4 are based on the following articles of mine that have appeared in scholarly journals. Thanks to the editors for permission to use this material.

"From Folk-Psychology to Cognitive Science," Cognitive Science 11, no. 2 (1987): 139-154.

"A Biological Metaphor," Mind and Language 1, no. 1 (1986): 45-64.

"The Kludge in the Machine," Mind and Language 2, no. 4 (1987): 277-300.
2 Parallel Distributed Processing and Conventional AI
Parallel distributed processing names a broad class of AI models. These models depend on networks of richly interconnected processing units that are individually very simple. The network stores data in the subtly orchestrated morass of connectivity. Some units are connected to others by excitatory links, so that the activation of one will increase the likelihood that the other is activated. Some are inhibitorily linked. Some may be neutral. The overall system turns out to be an impressive pattern completer that is capable of being tuned by powerful learning algorithms. Many useful properties seem to come easily with such a setup. Taken together, these allow such systems to represent data in an economical yet highly flexible way.

A certain class of work in conventional AI I shall call semantically transparent. A model will count as semantically transparent if and only if it involves computational operations on syntactically specified internal states that (1) can be interpreted as standing for the concepts and relations spoken of in natural language (such items as "ball," "cat," "loves," "equals," and so on) and (2) these internal tokens recur whenever the system is in a state properly described by content ascriptions employing those words: the token is, as we shall say, projectible to future cases. (Note that such states need not be localizable within the machine. The point is rather that we can make sense of the system as operating according to computational rules on entities of that grain.) In short, a system is semantically transparent if there is a neat mapping between states that are computationally transformed and semantically interpretable bits of sentences. A great deal of work in conventional AI (but not all) is semantically transparent. Work in a highly distributed, connectionist paradigm is not. And therein, I shall argue, lies a philosophically interesting difference.
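To make the opening description concrete, here is a minimal sketch of such a network (a toy of my own, not a model from the PDP literature): simple units joined by excitatory (positive) and inhibitory (negative) links, storing two patterns in the morass of connectivity and completing a degraded probe.

```python
import numpy as np

# Two stored patterns of +1/-1 unit activations.
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, -1, -1, 1, 1]])

# Hebbian storage: units co-active in a pattern get an excitatory link;
# anti-correlated units get an inhibitory one. No unit excites itself.
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)

# Probe: the first pattern with one unit flipped.
state = np.array([1, -1, 1, -1, -1, -1])

# Repeated local updates let the network relax into a stable configuration.
for _ in range(5):
    for i in range(len(state)):
        net_input = W[i] @ state
        if net_input != 0:
            state[i] = 1 if net_input > 0 else -1

print(state)  # -> [ 1 -1  1 -1  1 -1]: the stored pattern, completed
```

Nothing in the weight matrix is a symbol for "pattern one"; the stored knowledge just is the orchestrated mass of positive and negative connections, which is why such systems resist semantically transparent description.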
3 The Multiplicity of Mind

Some PDP theorists argue that conventional AI models are at best good approximations to the deep truth revealed by connectionism. Some conventional theorists argue that connectionism displays at best a new way of implementing the insights contained in more traditional models. Both camps are thus endorsing what I shall call the uniformity assumption. This states that a single relation will obtain between connectionist and conventional models for every class and aspect of mentality studied by cognitive psychology.

The uniformity assumption is, I believe, distortive and unhelpful along a number of dimensions. Most straightforwardly, it is distortive if (as seems likely) the mind is best understood in terms of a multiplicity of virtual machines, some of which are adapted to symbol-processing tasks and some of which are adapted for subsymbolic processing. For many tasks, our everyday performance may involve the cooperative activity of a variety of such machines. Many connectionists are now sympathetic to such a vision. Thus, Smolensky (1988) introduces a virtual machine that he calls the conscious rule interpreter. This is, in effect, a PDP system arranged to simulate the computational activity of a computer running a conventional (and semantically transparent) program.
Less straightforward but perhaps equally important is what might be termed the multiplicity of explanation. This will be a little harder to tease out in summary form. The general idea is that even in cases where the underlying computational form is genuinely connectionist, there will remain a need for higher levels of analysis of such systems. We will need to display, for example, what it is that a number of different connectionist networks, all of which have learned to solve a certain class of problems, have in common. Finding and exhibiting the commonalities that underpin important psychological generalizations is, in a sense, the whole point of doing cognitive science. And it may be that in order to exhibit such commonalities we shall need to advert to the kinds of analysis found in symbolic (nonconnectionist) AI.
4 The Mind's-Eye View and the Brain's-Eye View
AI that involves conventional, semantically transparent symbol processing is to be identified with what I am calling the mind's-eye view. The mind's-eye view generates models based on our intuitive ideas about the kind of semantic object over which computational operations need to be defined. Such models survey the nature of human thought from within normal human experience and set out to model its striking features. The models produced depend on encoding and manipulating translations of the symbol strings of ordinary language. For some explanatory projects, I shall argue, such an approach may indeed be both correct and necessary. But for others it looks severely limited.

The mind's-eye approach was prominent in the late sixties and throughout the seventies. It is characterized by the tasks it selects for study and the forms of the computational approach it favors. The tasks are what I shall term "recent achievements." "Recent" here has both an evolutionary and a developmental sense. In essence, the tasks focused on are those we intuitively consider to be striking and interesting cognitive achievements. These include chess playing (and game playing in general), story understanding, conscious planning and problem solving, cryptarithmetic puzzles, and scientific creativity. Striking achievements indeed. And programs were devised that did quite well at such individual tasks. Chess-playing programs performed at close to the world-class level; a scientific-creativity program was able to rediscover one of Kepler's laws and Ohm's law; planning programs learned to mix and match old successful strategies to meet new demands; new data structures enabled a computer to answer questions about the unstated implications of stories; cryptarithmetic programs could far outpace you or me. But something seemed to be missing. The programmed computers lacked the smell of anything like real intelligence. They were rigid and brittle, capable of doing only a few tasks interestingly well.
This approach (which was not universal) may have erred by placing too much faith in the mind's own view of the mind. The entities that found their way into such models would be a fairly direct translation of our standard expressions of belief and desire, ultimately into some appropriate machine code. But why suppose that the mind's natural way of understanding its own and others' mental states, by using sentential attributions of beliefs, desires, fears, and so on, should provide a powerful model on which to base a scientific theory of mind? Why suppose, that is, that the computational substrate of most thought bears some formal resemblance to ordinary talk of the mind? Such talk is an evolutionarily recent development, geared no doubt to smoothing our daily social interactions. There is no obvious pressure here for an accurate account of the computational structure underlying the behavior that such talk so adequately describes and (in a very real sense) explains.
Part 1 of the book examines the mind's-eye approach, associating it with a commitment to semantically transparent programming. It looks at some standard philosophical criticisms of the approach (mistakenly identified by many philosophers with an AI approach in general) and also raises some further worries of an evolutionary and biological nature. In part 2 our attention is focused on the PDP alternative, which I call "the brain's-eye view." The label refers to the brainlike structure of connectionist architectures. Such architectures are neurally inspired. The neural networks found in slugs, hamsters, monkeys, and humans are likewise vast parallel networks of richly interconnected, but relatively slow and simple, processors. The relative slowness of the individual processors is offset by having them work in a cooperative parallelism on the task at hand. A standard analogy is with the way a film of soap settles when stretched across a loop (like the well-known children's toy). Each soap molecule is affected only by its immediate neighbors. At the edges their position is determined by the loop (the system input). The effect of this input is propagated by a spreading series of local interactions until a global order is achieved. The soap film settles into a stable configuration across the loop. In a PDP system at this point we say the network has relaxed into a solution to the global problem. In computing, this kind of cooperative parallelism proves most useful when the task involves the simultaneous satisfaction of a large number of small, or "soft," constraints. In such cases (biological vision and sensorimotor
5 The Fate of the Folk

And where does all that leave folk psychology? The position I adopt explicitly rejects what I call the syntactic challenge. The syntactic challenge demands that if beliefs and desires are real and cause behavior, there must be neat, in-the-head syntactic analogues to the semantic expressions in sentences ascribing them. I in general deny this to be the case. Instead, I see belief and desire talk to be a holistic net thrown across a body of the behavior of an embodied being acting in the world. The net makes sense of the behavior by giving beliefs and desires as causes of actions. But this in no way depends on there being computational brain operations targeted on syntactic items having the semantics of the words used in the sentences ascribing the beliefs. Semantically transparent AI posits a neat reductionist mapping between thoughts so ascribed and computational brain states. That is, thoughts (as ascribed using propositional-attitude talk) map onto computational operations on syntactic strings whose parts have the semantics of the parts of the sentences used to express the attitudes. The picture I propose looks like this: thoughts (as ascribed using propositional-attitude talk) are holistically ascribed on the basis of bodies of behavior. Individual items of behavior are caused by computational brain operations on syntactic items, which may not (and typically will not) be semantically transparent. In my model a thought is typically not identical with any computational brain operation on syntactically identified entities, although,
6 Threads to Follow

Here are some threads to follow if your interest lies in a particular topic among those just mentioned. Conventional, semantically transparent AI is treated in chapter 1, sections 2 to 5; chapter 2, section 4; chapter 4, section 5; chapter 7, section 6; chapter 8, sections 2 and 4 to 8; chapter 9, sections 3, 5, and 6; chapter 10, section 4; and the epilogue. Parallel distributed processing is covered in chapter 5, sections 1 to 7; chapter 6, sections 1 to 8; chapter 7, sections 1 to 7; chapter 9, sections 1 to 7; and chapter 10, sections 2 to 5. Mixed models (PDP and simulated conventional systems) are taken up in chapter 7, sections 1 to 7; chapter 9, sections 1 to 7 (especially 9.6); and chapter 10, section 4. Folk psychology and thought are discussed in chapter 3, sections 1 to 9; chapter 4, section 5; chapter 7, section 6; chapter 8, sections 1 to 9; and chapter 10, section 4. Biology, evolutionary theory, and computational models are discussed in chapter 3, section 6; chapter 4, sections 1 to 6; and chapter 5, section 6.

The main PDP models used for discussion and criticism are the Jets and the Sharks (chap. 5, sec. 3), emergent schemata (chap. 5, sec. 4), memory (chap. 5, sec. 5), sentence processing (chap. 6, secs. 2 and 3), and past-tense acquisition (chap. 9, secs. 2 and 3).
Chapter 1
Classical Cognitivism

2 Turing, Newell, McCarthy, and Simon

The bigger the names, the harder they drop. These would dent the kinds of floors that supported ancient mainframes. It would be fair to say that Turing made AI conceivable, and McCarthy (along with Minsky, Newell,
3 The Physical-Symbol-System Hypothesis

A physical symbol system, according to Newell and Simon (1976, 40-42), is any member of a general class of physically realizable systems meeting the following conditions:

(1) It contains a set of symbols, which are physical patterns that can be strung together to yield a structure (or expression).

(2) It contains a multitude of such symbol structures and a set of processes that operate on them (creating, modifying, reproducing, and destroying them according to instructions, themselves coded as symbol structures).

(3) It is located in a wider world of real objects and may be related to that world by designation (in which the behavior of the system affects or is otherwise consistently related to the behavior or state of the object) or interpretation (in which expressions in the system designate a process, and when the expression occurs, the system is able to carry out the process).
In effect, a physical symbol system is any system in which suitably manipulable tokens can be assigned arbitrary meanings and, by means of careful programming, can be relied on to behave in ways consistent (to some specified degree) with this projected semantic content. Any general-purpose computer constitutes such a system. What, though, is the relation between such systems and the phenomena of mind (hoping, fearing, knowing, believing, planning, seeing, recognising, and so on)? Newell and Simon are commendably explicit once again. Such an ability to manipulate symbols, they suggest, is the scientific essence of thought and intelligence, much as H2O is the scientific essence of water. According to the physical-symbol-system hypothesis, "the necessary and sufficient condition for a physical system to exhibit general intelligent action is that it be a physical symbol system." Newell and Simon thus claim that any generally intelligent physical system will be a physical symbol system (the necessity claim) and that any physical symbol system "can be organised further to exhibit general intelligent action" (the sufficiency claim). And "general intelligent action," on Newell and Simon's gloss, implies the same scope of intelligence seen in human action.
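The letter of conditions (1) and (2) is easy to satisfy in miniature. The following toy sketch (my own, not Newell and Simon's) treats expressions as tuples of symbols and implements one process that creates a new expression according to an instruction itself coded as a symbol structure:

```python
# Symbol structures: expressions are tuples of symbols (condition 1).
memory = {("on", "A", "B"), ("on", "B", "table")}

# An instruction, itself coded as a symbol structure (condition 2).
rule = ("if", ("on", "A", "B"), "then-add", ("above", "A", "B"))

def step(memory, rule):
    """One process: test a rule's condition and create a new expression."""
    _, condition, _, conclusion = rule
    return memory | {conclusion} if condition in memory else memory

print(step(memory, rule))
# The tokens mean nothing intrinsically: "on" could as well read "XQ7".
```

Nothing this small exhibits general intelligent action, of course; the hypothesis concerns what such systems can in principle be organised to do.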
It is important to be as clear as possible about the precise nature of Newell and Simon's claim. As they themselves point out (1976, 42), there is a weak (and incorrect) reading of their ideas that asserts simply that a physical symbol system is (or can be) a universal machine capable of any well-specified computation, that the essence of intelligence lies in computation, and that intelligence could therefore be realized by a universal machine (and hence by a physical symbol system). The trouble with this reading is
4 Bringing Home the BACON

Still, a little evidence never goes amiss. For a start, we find the following comment sandwiched between Newell and Simon's outline of the nature of a physical symbol system and their explicit statement of the hypothesis: "The type of system we have just defined . . . bears a strong family resemblance to all general purpose computers. If a symbol-manipulation language, such as LISP, is taken as defining a machine, then the kinship becomes truly brotherly" (Newell and Simon 1976, 41). Douglas Hofstadter (1985, 646, 664), who takes issue with the idea that baroque manipulations of standard LISP atoms could constitute the essence of intelligence and thought, is happy to ascribe just that view to Newell and Simon.

Moreover, Newell and Simon's own practice does seem to bear such an ascription out. Thus, all their work, from the early General Problem Solver (1963) to their more recent work on production systems and on automating scientific creativity, has been guided by the notion of serial heuristic search based on protocols, notebook records, and observation of human subjects. (Heuristic search is a means of avoiding the expensive and often practically impossible systematic search of an entire problem space by using rules of thumb to lead you quickly to the area in which with a little luck the solution is to be found.) For our purposes, the things that most significantly characterize this work (and much other work in contemporary AI besides; see, for example, the AM program mentioned below) are its reliance on a serial application of rules or heuristics, the rather high-level, consciously introspectible grain of most of the heuristics involved, and the nature of the chosen task domains. I shall try to make these points clearer by looking at the example of BACON and some of its successors.
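The flavor of heuristic search can be captured in a few lines. In the sketch below (a toy of my own; the domain and operators are invented and have nothing to do with BACON), a single rule of thumb, prefer states closer to the target, steers a best-first search in place of exhaustive enumeration of the problem space:

```python
import heapq

def heuristic_search(start, goal):
    """Greedy best-first search: reach goal from start using the
    operators double and add-3, guided by distance to the goal."""
    frontier = [(abs(goal - start), start, [start])]  # (score, state, path)
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt in (state * 2, state + 3):
            if nxt not in seen and nxt <= goal * 2:  # prune runaway states
                seen.add(nxt)
                # rule of thumb: expand states closest to the goal first
                heapq.heappush(frontier, (abs(goal - nxt), nxt, path + [nxt]))
    return None

print(heuristic_search(1, 20))  # -> [1, 4, 8, 11, 14, 17, 20]
```

The search is greedy and may miss shorter routes; that is the price the rule of thumb pays for avoiding a systematic sweep of the space.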
"
expert systemsand qualitative
" reasoning(see, e.g ., the section Reasoning
about the PhysicalWorld in Hallam and Mellish 1987). Thus the MYCIN
rule (Shortliffe 1976) for blood injections reads: If (1) the site of the culture
is blood, (2) the gram stain of the organismis gramneg, (3) the morphology
of the organismrod, and (4) the patient is a compromisedhost, then there
is suggestiveevidencethat the identity of the organism is pseudomonas -
aeruginosa (&om Feigenbaum 1977 , 1014).
Likewise, BACON's representation of data was at the level of attribute-value pairs, with numerical values for the attributes. The general character of the modeling is even more apparent in programs for qualitative-law discovery: GLAUBER and STAHL. GLAUBER applies heuristic rules to data expressed at the level of predicate-argument notation, e.g., "reacts [inputs (HCl, NH3), outputs (NH4Cl)]," and STAHL deploys such heuristics as identify-components: "If a is composed of b and c, and a is composed of b and d, and neither c contains d nor d contains c, then identify c with d."
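Heuristics of this grain are easy to render as programs over predicate-argument structures. The sketch below is my own reconstruction for illustration only (the substance names are invented, and STAHL's actual representation is richer):

```python
# Facts: (substance, frozenset of its known components).
facts = [("vitriol", frozenset({"acid", "metal-x"})),
         ("vitriol", frozenset({"acid", "metal-y"}))]

def identify_components(facts):
    """If a is composed of b and c, and a is composed of b and d,
    identify c with d (here neither c nor d contains the other)."""
    identities = set()
    for a1, parts1 in facts:
        for a2, parts2 in facts:
            diff1, diff2 = parts1 - parts2, parts2 - parts1
            if a1 == a2 and len(diff1) == len(diff2) == 1:
                identities.add(diff1 | diff2)  # c and d name one substance
    return identities

print(identify_components(facts))  # -> {frozenset({'metal-x', 'metal-y'})}
```

The point to notice is the grain: both the data and the rule live at the level of whole, verbally statable propositions.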
Third, BACON uses fairly slow serial search, applying its heuristics one at a time and assessing the results. Insofar as BACON relies on productions, there is an element of parallelism in the search for the currently applicable rule. But only one production fires at a time, and this is the seriality I have in mind. Serial behavior of this kind is characteristic of slow, conscious thought. And Hofstadter (1985, 632) reports Simon as asserting, "Everything of interest in cognition happens above the 100-millisecond level, the time it takes you to recognise your mother." Hofstadter disagrees vehemently, asserting that everything of interest in cognition takes place below the 100-millisecond level. My position, outlined in detail in chapters 5 to 9 below, is sympathetic to Hofstadter's (and indeed owes a great deal to it). But I will show that the notion of a dispute over the correct level of interest here is misplaced. There are various explanatory projects here, all legitimate. Some require us to go below the 100-millisecond level (or whatever) while others do not. This relates, in a way I expand on later, to a problem area cited by Simon in a recent lecture (1987; see also Langley et al. 1987, 14-16). Simon notes that programs like BACON are not good at very ill-structured tasks, tasks demanding a great deal of general knowledge and expectations. Thus, though BACON neatly arrives at Kepler's third law when given the well-structured task of finding the invariants in the data, it could not come up with the flash of insight by which Fleming could both see that the mould on his petri dish was killing surrounding bacteria and recognize this as an unusual and potentially interesting event.

Listening to Simon, one gets the impression that he believes the way to solve these ill-structured problems is to expand the set of high-level data and heuristics that a system manipulates in the normal, slow, serial way (i.e., by creating, modifying, and comparing high-level symbol strings according to stored rules). Thus, in a recent coauthored book he dismisses the idea that the processes involved in the flash-of-insight type of discovery might be radically different in computational kind, saying that "the speed and subconscious nature of such events does not in any way imply that the process is fundamentally different from other processes of discovery, only that we must seek for other sources of evidence about its nature" (i.e., subjects' introspections can no longer help) (Langley et al. 1987, 329).
The position I develop holds rather that the folk-psychological term "scientific discovery" encompasses at least two quite different kinds of processes. One is a steady, von Neumann-style manipulation of standard symbolic atoms in a search for patterns of regularity. And this is well modeled in Simon and Langley's work. The other is the flash-of-insight type of recognition of something unusual and interesting. And this, I shall suggest, may require modeling by a method quite different (though still computational).
In effect, theorists such as Langley, Simon, Bradshaw, and Zytkow are betting that all the aspects of human thought will turn out to be dependent on a single kind of computational architecture. That is an architecture in which data is manipulated by the copying, reorganizing, and pattern-matching capabilities deployed on list structures by a von Neumann (serial) processor. The basic operations made available in such a setup define the computational architecture it is. Thus, the pattern-matching operations which such theorists are betting on are the relatively basic ones available in such cases (i.e., test for complete syntactic identity, test for syntactic identity following variable substitution, and so on). Other architectures (for example, the PDP architecture discussed in part 2 of this book) provide different basic operations. In the case of parallel distributed processing these include a much more liberal and flexible pattern-matching capacity able to find a best match in cases where the standard SPSS approach would find no match at all (see especially chapters 6 and 7 below).
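The contrast can be put in miniature (a toy of my own; neither matcher is anyone's published algorithm). Classical matching over symbol structures is all-or-nothing, while a graded matcher can always return the nearest stored item:

```python
def classical_match(pattern, datum):
    return pattern == datum  # all-or-nothing syntactic identity

def best_match(probe, stored):
    # graded similarity: return the stored item agreeing with the
    # probe in the most positions
    return max(stored, key=lambda s: sum(a == b for a, b in zip(probe, s)))

stored = [(1, 0, 1, 1), (0, 0, 1, 0)]
probe = (1, 1, 1, 1)  # a novel input matching no stored item exactly

print(any(classical_match(s, probe) for s in stored))  # False
print(best_match(probe, stored))                       # (1, 0, 1, 1)
```

In a genuine PDP system the best match emerges from relaxation in a network rather than from an explicit similarity count, but the behavioral difference is the one shown: degraded or novel input still finds a home.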
Langley, Simon, et al. are explicit about their belief that the symbol-processing architecture they investigate has the resources to model and explain all the aspects of human thought. Faced with the worry that the approach taken by BACON, DALTON, GLAUBER, and STAHL won't suffice to explain all the psychological processes that make up scientific discovery, they write, "Our hypothesis is that the other processes of scientific discovery, taken one by one, have [the] same character, so that programs for discovering research problems, for designing experiments, for designing instruments and for representing problems will be describable by means of the same kinds of elementary information processes that are used in BACON" (1987, 114). They make similar comments concerning the question of mental imagery (p. 336). This insistence on a single architecture of thought may turn out to be misplaced. The alternative is to view mind
5 Semantically Transparent Systems

It is time to expand on the notion of a standard symbolic atom, introduced in section 3 above. One of the most theoretically interesting points of contrast between classical systems (as understood by philosophers like Fodor and Pylyshyn) and connectionist systems (as understood by theorists like Smolensky) concerns the precise sense in which the former rely on, and the latter eschew, the use of such symbolic atoms. To bring out what is at issue here, I shall speak of the classicist as (by definition) making a methodological commitment to the construction of semantically transparent systems. Credit for the general idea of semantic transparency belongs elsewhere. The analysis I offer is heavily influenced by ideas in Smolensky 1988 and Davies, forthcoming.7
Level 1, computational theory. This level describes the goal of the computation, the general strategies for achieving it, and the constraints on such strategies.

Level 2, representation and algorithm. This describes an algorithm, i.e., a series of computational steps that does the job. It also includes details of the way the inputs and outputs are to be represented to enable the algorithm to perform the transformation.

Level 3, implementation. This shows how the computation may be given flesh (or silicon) in a real machine.

In short, level 1 considers what function is being computed (at a high level of abstraction), level 2 finds a way to compute it, and level 3 shows how that way can be realized in the physical universe.
Suppose that at level 1 you describe a task by using the conceptual apparatus of public language. (This is not compulsory at level 1 but is often the case.) You might use such words as "liquid," "flow," "edge," and so on. You thus describe the function to be computed in terms proper to what Paul Smolensky calls the conceptual level, the level of public language. Very roughly, a system will count as an STS if the computational objects of its algorithmic description (level 2) are isomorphic to its task-analytic description couched in conceptual-level terms (level 1). What this means is that the computational operations specified by the algorithm are applied to internal representations that are projectibly interpretable as standing for conceptual-level entities. (Again, clarification of the notion of projectibility will have to wait until chapter 6.)
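As a rough illustration of the distinction (an example of mine, not Marr's or Smolensky's): fix the level-1 function as summing a list of numbers, and note that quite different level-2 algorithms compute it, each realizable at level 3 by any machine that runs the code.

```python
def sum_serial(numbers):
    """Level 2, option one: a serial left-to-right accumulation."""
    total = 0
    for n in numbers:
        total += n
    return total

def sum_divide(numbers):
    """Level 2, option two: divide and conquer over the same level-1 task."""
    if len(numbers) <= 1:
        return numbers[0] if numbers else 0
    mid = len(numbers) // 2
    return sum_divide(numbers[:mid]) + sum_divide(numbers[mid:])

# Same level-1 function, two level-2 stories.
assert sum_serial([3, 1, 4]) == sum_divide([3, 1, 4]) == 8
```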
Some examples may help to sharpen these levels. Consider the following specifications of functions to be computed.
objects to which the rule is defined to apply. The derivation rules may be tacit, so long as the data structures they apply to are explicit. On this Fodor and Pylyshyn rightly insist: "Classical machines can be rule implicit with respect to their programs. . . . What does need to be explicit in a classical machine is not its program but the symbols that it writes on its tapes (or stores in its registers). These, however, correspond not to the machine's rules of state transition but to its data structures" (1988, 61). As an example they point out that the grammar posited by a linguistic theory need not be explicitly represented in a classical machine. But the structural descriptions of sentences over which the grammar is defined (e.g., in terms of verb stems, subordinate clauses, etc.) must be. Attempts to characterize the classical/connectionist divide by reference to explicit or nonexplicit rules are thus shown to be in error (see also Davies, forthcoming).
6 Functionalism

While I am setting the stage, let me bring on a major and more straightforwardly philosophical protagonist, the functionalist. The functionalist is in many ways the natural bedfellow of the proponent of the physical-symbol-system hypothesis. For either version of the physical-symbol-system hypothesis claims that what is essential to intelligence and thought is a certain capacity to manipulate symbols. This puts the essence of thought at a level independent of the physical stuff out of which the thinking system is constructed. Get the symbol-manipulating capacities right and the stuff does not matter. As the well-known blues number has it, "It ain't the meat, it's the motion." The philosophical doctrine of functionalism echoes this sentiment, asserting (in a variety of forms) that mental states are to be identified not with, say, physicochemical states of a being but with more abstract organizational, structural, or informational properties. In Putnam's rousing words, "We could be made of swiss cheese and it wouldn't matter" (1975, 291). Aristotle, some would have it, may have been the first philosophical functionalist. Though there seems to be a backlash now underway (see, e.g., Churchland 1981 and Churchland 1986), the recent popularity of the doctrine can be traced to the efforts of Hilary Putnam (1960, 1967), Jerry Fodor (1968), David Armstrong (1970) and, in a slightly different vein, Daniel Dennett (1981) and William Lycan (1981). I shall not attempt to do justice to the nuances of these positions here. Instead, I shall simply characterize the most basic and still influential form of the doctrine, leaving the search for refinements to the next chapter. First, though, a comment on the question to which functionalism is the putative answer.

In dealing with the issues raised in this book, it seemed to me to be essential to distinguish the various explanatory projects for which ideas about the mind are put forward. This should become especially clear in the
open to a variety of criticisms. Especially relevant here is Putnam's (1960, 1967) criticism that identity theory makes far too tight the tie between being in a certain mental state (e.g., feeling pain) and being in a certain physicochemical or neural state. For on an extreme, type-type identity reading, the identity of some mental state with, say, some neural state would seem to imply that a being incapable of being in that neural state could not, in principle, be in the mental state in question. But for rather obvious reasons this was deemed unacceptable. A creature lacking neurons would be unable to occupy any neural state. But couldn't there be exotic beings made of other stuff who were nonetheless capable of sharing our beliefs, desires, and feelings? If we allow this seemingly sensible possibility, then we, as philosophers, need some account of what physically variously constituted feelers and believers have in common that makes them feelers and believers. Behaviorism would have done the trick, but its denial of the importance of inner states had been perceived as a fault. Identity theory, it seemed, had gone too far in the other direction.
Between the Scylla and the Charybdis sailed the good ship functionalism. What is essential to being in a certain mental state, according to the functionalist schema, is being in a certain abstract functional state. And that functional state is defined over two components: (1) the role of some internal states in mediating system input and system output (the behavior element) and (2) the role of the states or processes in activating or otherwise affecting other internal states of the system (the inner element). If we also presume that cognition is a computational phenomenon, then we can link this characterization (as Putnam [1960] did) to the notion of a Turing machine, which is defined by its input and output and its internal state-transition profile. What Turing machine you are instantiating, not what substance you are made of, characterizes your mental states. As I said, it ain't the meat, it's the motions.
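A Turing machine's identity really is exhausted by such a profile. The toy below (my own illustration) is nothing but a transition table plus a tape; any substrate realizing the same table, meat or cheese, instantiates the same machine:

```python
# (state, symbol read) -> (symbol to write, head move, next state)
table = {("s0", "0"): ("1", 1, "s0"),   # overwrite 0 with 1, move right
         ("s0", "_"): ("_", 0, "halt")} # blank square: stop

def run(tape, state="s0", head=0):
    while state != "halt":
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
    return tape

print(run(list("000_")))  # -> ['1', '1', '1', '_']
```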
Now the bad news. Functionalism has had its problems. Of special interest to us will be the problem of excessive liberalism (see Block 1980). The charge is that Turing-machine functionalism allows too many kinds of things to be possible believers and thinkers. For example, it might in principle be possible to get the population of China to pass messages (letters, values, whatever) between themselves so as briefly to realize the functional specification of some mental state (Block 1980, 276-278). (Recall, it is only a matter of correctly organizing inputs, outputs, and internal state transitions, and these, however they are specified, won't be tied to any particular kind of organism.) As Block (1980, 277) puts it, "In describing the Chinese system as a Turing machine I have drawn the line [i.e., specified what counts as inputs and outputs] in such a way that it satisfies a certain type of functional description, one that you also satisfy, and one that, according to functionalism, justifies attributions of mentality." But, says Block, there is at
2 The Dreyfus Case

Hubert Dreyfus has been one of the most persistent yet sensitive critics of the cognitivist tradition. At the core of his disquiet lies the thought that there is a richness or "thickness" to human understanding that cannot be captured in any set of declarative propositions or rules. Instead, the richness depends on a mixture of culture, context, upbringing, and bodily self-awareness. Accordingly, Dreyfus turns a sceptical gaze on the early microworlds work associated with Winograd (1972) and the frame- and script-based approaches associated with Minsky (1974) and Schank and Abelson (1977), among others.
Winograd's (1972) SHRDLU was a program that simulated a robot acting in a small microworld composed entirely of geometric solids (blocks, pyramids, etc.). By restricting the domain of SHRDLU's alleged competence and providing SHRDLU with a model of the items in the domain, Winograd was able to produce a program that could engage in a quite sensible dialogue with a human interlocutor. SHRDLU could, for example, resolve problems concerning the correct referents of words like "it" and "the pyramid" (there were many pyramids) by deploying its knowledge of the domain. The output of the program was relatively impressive, but its theoretical significance as a step on the road to modeling human understanding was open to doubt. Could a suitable extension of the microworld strategy really capture the depth and richness of human understanding? Dreyfus thought not, since he saw no end to the number of such microcompetences required to model even a child's understanding of a real-world activity like bargaining.
Consider the following example (due to Papert and Minsky and cited in Dreyfus 1981, 166).

Janet: "That isn't a very good ball you have. Give it to me and I'll give you my lollipop."

For the set of microtheories needed to understand Janet's words, Minsky and Papert suggest a lengthy list of concepts that includes: time, space, things, people, words, thoughts, talking, social relations, playing, owning, eating, liking, living, intention, emotions, states, properties, stories, and places. Each of these requires filling out. For example, anger is a state caused by an insult (and other causes) and results in noncooperation (and other effects). And this is just one emotion. This is a daunting list and one that, as Dreyfus points out, shows no obvious signs of completion or even completability. Dreyfus thus challenges the microtheorist's faith that some finite and statable set of microcompetences can turn the trick and result in a computer that really knows about bargaining or anything else. Such faith, he suggests, is groundless, as AI has failed "to produce even the hint of a system with the flexibility of a six-month old child" (Dreyfus 1981, 173). He thinks that "the special purpose techniques which work in context-free, gamelike micro-worlds may in no way resemble general purpose human and animal intelligence."
Similar criticisms have been applied to Winston's (1975) work in computer vision and Minsky's (1974) frame-based approach to representing everyday knowledge. A frame is a data structure dealing with a stereotypical course of events in a given situation. It consists of a set of nodes and relations with slots for specific details. Thus a birthday-party frame would consist of a rundown of a typical birthday-party-going sequence and might refer to cakes, candles, and presents along the way. Armed with such a frame, a system would have the common sense required to assume that the cake had candles, unless told otherwise.

But as Dreyfus is not slow to point out, the difficulties here are still enormous. First, the frames seem unlikely ever to cover all the contingencies that common sense copes with so well. (Are black candles on a birthday cake significant?) Second, there is the problem of accessing the
right frame at the right time. Humans can easily make the transition from, say, a birthday-party frame to an accident frame or to a marital-scene frame. Is this done by yet more, explicit rules? If so, how does the system know when to apply these? How does it tell what rule is relevant and when? Do we have a regress here? Dreyfus thinks so. To take one final example, consider the attempt to imbue an AI system with knowledge of what it is for something to be a chair. To do so, Minsky suggests, we must choose a set of chair-description frames. But what does this involve? Presumably not a search for common physical features, since chairs, as Dreyfus says, come in all shapes and sizes (swivel chairs, dentist's chairs, wheelchairs, beanbag chairs, etc.). Minsky contemplates functional pointers, e.g., "something one can sit on." But that is too broad and includes mountains and toilet seats. And what of toy chairs, and glass chairs that are works of art? Dreyfus's suspicion is that the task of formally capturing all these interlinked criteria and possibilities in a set of context-free facts and rules is endless. Yet, as Dreyfus can hardly deny, human beings accomplish the task somehow. If Dreyfus doubts the cognitivist explanation, he surely owes us at least a hint of a positive account.
3 It Ain't What You Know; It's the Way You Know It

And we get such an account. In fact, we get hints of what I believe to be two distinct kinds of positive account. Critics of Dreyfus (e.g., Torrance [1984, 22-23]) have tended to run these together, thereby giving the impression that reasonable doubts concerning one line of Dreyfus's thought constitute a global undermining of his position. In this section I shall try to be a little more sympathetic. This is not to say, however, that such critics are wrong or lack textual evidence for their position. In early articles Dreyfus does indeed seem to run together the two accounts I distinguish. And even in the most recent book (Dreyfus and Dreyfus 1986), he treats the two as being intimately bound up. My claim is that they both can and should be kept separate.
The first line of thought concerns what I shall call "the body problem." The body problem is neatly summed up in the following observation: "The computer comes into our world even more alien than would a Martian. It does not have a body, needs or emotions, and it is not formed by a shared language and other social practices" (Dreyfus and Dreyfus 1986, 79). We can see this worry at work in the following passage, in which Dreyfus tries for a positive account of what makes something a chair: "What makes an object a chair is its function [and] its place in a total practical context. This presupposes certain facts about human beings (fatigue, the way the body bends) and a network of other culturally determined equipment (tables, floors, lamps) and skills (eating, writing, going to conferences, giving
gaps in their knowledge. And couldn't a Martian be said to have learned something of eating by learning what we eat and why, even before knowing enough about human anatomy to decide whether we eat with our mouths or our ears? Critical-mass arguments thus strike me as very unconvincing. Let us therefore pass quickly on to another line of thought.
The second line of Dreyfus's thought takes its cue not from the body problem but from more general observations about ways of "encoding" knowledge (if "encoding" is the right word). It is here that his ideas are most suggestive. He notes that human beings seem naturally inclined to spot the significant aspects of a situation, and he relates this capacity not to a stored set of rules and propositions but to their vast experience of previous concrete situations and some kind of holistic associative memory. He asks, "Is the know-how that enables humans constantly to sense what specific situations they are in the sort of know-how that can be represented as a kind of knowledge in any knowledge representation language no matter how ingenious and complex?" (Dreyfus 1981, 198.) Viewed from this angle, Dreyfus's worry is not that machines don't know enough (because of their lack of bodies and so on) but rather that the way in which current AI programs represent knowledge is somehow fundamentally inadequate to the real task. Such programs assume what Dreyfus doubts, namely, that "all that is relevant to intelligent behaviour can be formalised in a structured description" (p. 200). This is, for Dreyfus, the most basic tenet of what he calls the "information processing approach."
At this point I must pause to raise a few questions of my own. What exactly counts as a structured description? What counts as a knowledge-representation language? It may seem as if Dreyfus here intends to rule out all forms of computational accounts of cognition and lay total (and mysterious) stress on human bodies and culture. This, however, is not the case. In a recent book Dreyfus stresses the need for flexible systems capable of what he calls "holistic similarity recognition" if any progress is to be made in modeling human expertise (Dreyfus and Dreyfus 1986, 28). He cites, with some approval, recent work on connectionist or PDP approaches to mind (p. 91). Perhaps most revealingly of all, he does so in a section entitled "AI without Information Processing." As we shall see, I disagree with the claim that such approaches do away with structured descriptions or information processing. But that, for now, is another matter. What we need to notice here is just that since Dreyfus's doubts exclude such approaches but include the approaches taken by Newell, Simon, Winograd, and others, they may best be seen as doubts about what I earlier dubbed the SPSS hypothesis. They are doubts about whether a certain computational approach can in principle yield systems with the kind of flexibility and common sense we tend to associate with the warranted ascription of understanding. The underlying thought, in effect, is that where real intelligence is concerned, it ain't what you know, it's the way you know it.

Of course, there could be a link with the points about bodies and so forth even here. There may be some things we know about largely by our own awareness of our bodily and muscular responses (Dreyfus cites swimming as an example). Perhaps a machine lacking our kind of body but equipped with some kind of mechanism for holistic similarity processing could even so know less about such things than we can. Nonetheless, on the reading of Dreyfus I am proposing (I have no idea whether he would endorse it), such a machine would be flexible and commonsensical within its own domains, and as such it would be at least a candidate for a genuine knower, albeit not quite a human one. This is in contradistinction to any system running a standard cognitivist program. I shall expand on this point in subsequent chapters.

Of Dreyfus's two points (the one about the social and embodied roots of human knowing, the other about the need for flexible, commonsense knowledge) it is only the second which I think we can expect to bear any deep theoretical weight. But at least that point is suggestive. So let us keep it in our back pockets as we turn to a rather different criticism of AI and cognitive science.
then asked, "And did the man eat the hamburger?" it can answer "yes," because it apparently knows about restaurants. Searle believes, I think rightly, that the computer running this program does not really know about restaurants at all, at least if by "know" we mean anything like "understand." The Chinese-room example is constructed in part to demonstrate this. But Searle believes his arguments against that sort of computational model of understanding are also arguments against any computational model of understanding.
We are asked to imagine a human agent, an English monolinguist, placed in a large room and given a batch of papers with various symbols on it. These symbols, which to him are just meaningless squiggles identifiable only by shape, are in fact the ideograms of the Chinese script. A second batch of papers arrives, again full of ideograms. Along with it there arrives a set of instructions in English for correlating the two batches. Finally, a third batch of papers arrives bearing still further arrangements of the same uninterpreted formal symbols and again accompanied by some instructions in English concerning the correlation of this batch with its predecessors. The human agent performs the required matchings and issues the result, which I shall call "the response." This painstaking activity, Searle argues, corresponds to the activity of a computer running Schank's program. For we may think of batch 3 as the questions, batch 2 as the story, and batch 1 as the script or background data. The response, Searle says, may be so convincing as to be indistinguishable from that of a true Chinese speaker. And yet, and this is the essential point, the human agent performing the correlations understands no Chinese, just as, it would now appear, a computer running Schank's program understands no stories. In each case what is going on is the mere processing of information. If the intuitions prompted by the Chinese-room example are correct, understanding must involve something extra. From this Searle concludes that no computer can ever understand merely by "performing computational operations on formally specified elements." Nor, consequently, can the programs that determine such computational operations tell us anything about the special nature of mind (Searle 1980, 286).
Ramming the point home, Searle asks us to compare the understanding we (as ordinary English speakers) have of a story in English against the "understanding" the person manipulating the formal symbols in the Chinese room has of Chinese. There is, Searle argues, no contest: "In the Chinese case I have everything that Artificial Intelligence can put into me by way of a program and I understand nothing; in the English case I understand everything and there is so far no reason at all to suppose that my understanding has anything to do with computer programs, i.e., with computational operations on purely formally specified elements" (Searle 1980, 286). In short, no formal account can be sufficient for understanding, since a human will be able to follow the formal principles "without understanding anything" (p. 287). And there is no obvious reason to think that satisfying some formal condition is necessary either, though as Searle admits, this could (just conceivably) yet prove to be the case. The formal descriptions, Searle thinks (p. 299), seem to be capturing just the shadows of mind, shadows thrown not by abstract computational sequences but by the actual operation of the physical stuff of the brain.
I shall argue that Searle is simply wrong thus completely to shift the emphasis away from formal principles on the basis of a demonstration that the operation of a certain kind of formal program is insufficient for intentionality. The position to be developed below and in chapters 3 and 5 to 11 views as a necessary though perhaps insufficient condition of real understanding the instantiation of a certain kind of formal description that is far more microstructural than the descriptions of the SPSS hypothesis. Undermining Searle's strongest claims, however, is no simple matter, and we must proceed cautiously. The best strategy is to look a little more closely at the positive claims about the importance of the nonformal, biological stuff.
support thought. Well, (1) may be right (see chapter 3), though not for the reasons cited in (2). But even so, (2) is surely not that obscure a claim. Searle cites the less puzzling case of photosynthesis. By focusing on this, we may begin to unscramble the chaos.

Photosynthesis, Searle suggests, is a phenomenon dependent on the actual causal properties of certain substances. Chlorophyll is an earthly example. But perhaps other substances found elsewhere in the universe can photosynthesize too. Similarly, Martians might have intentionality, even though (poor souls) their brains are made of different stuff from our own. Suppose we now take a formal chemical theory of how photosynthesis occurs. A computer could then work through the formal description. But would actual photosynthesis thereby take place? No, it's the wrong stuff, you see. The formal description is doubtless a handy thing to have. But if it's energy (or thought) you need, you had better go for the real stuff. In its way, this is fair enough. A gross formal theory of photosynthesis might consist of a single production, "if subjected to sunlight, then produce energy." A fine-grained formal theory might take us through a series of microchemical descriptions in which various substances combine and cause various effects. Gross or fine-grained, neither formalism seems to herald the arrival of the silicon tulip. Market gardening has nothing to fear from simulated gardening as yet.
Now, there are properties of plants that are irrelevant to their photosynthetic capacities, e.g., the color of blooms, the shape of leaves (within limits), the height off the ground, and so on. The questions to ask are: What do the chemical properties buy for the plant, and what are the properties of the chemicals by which they buy it? The human brain is made out of a certain physical, chemical stuff. And perhaps in conjunction with other factors, that stuff buys us thought, just as the plant's stuff buys it energy. So, what are the properties of the physical, chemical stuff of the brain that buy us thought? Here is one answer (not Searle's or that of supporters of Searle's emphasis on stuff, e.g., Maloney [1987]): the vast structural variability in response to incoming exogenous and endogenous stimuli that the stuff in that arrangement provides.4

Suppose this were so. Might it not also be true that satisfying some kinds of formal description guaranteed the requisite structural variability and that satisfying other kinds of formal description did not? Such a state of affairs seems not only possible but pretty well inevitable. But if so, Searle's argument against the formal approach is, to say the least, inconclusive. For the only evidence against the claim that the formal properties of the brain buy it structural variability, which in turn buys it the capacity to sustain thought, is the Chinese-room thought experiment. But in that example the formal description was at a very gross level, in line with the SPSS hypothesis of chapter 1, which in this case amounts to rules for correlating
6 Microfunctionalism

The defence of a formal approach to mind mooted above can easily be extended to a defence of a form of functionalism against the attacks mounted by Block (see chapter 1, section 5). An unsurprising result, since Searle's attack on strong AI is intended to cast doubt on any purely formal account of mind, and that attack, as we saw, bears a striking resemblance to the charges of excessive liberalism and absent qualia raised by Block. Functionalism, recall, identified the real essence of a mental state with an input, internal-state-transition, and output profile. Any system with the right profile, regardless of its size, nature, and components, would occupy the mental state in question. But unpromising systems (like the population of China) could, it seemed, be so organized. Such excessive liberalism seemed to undermine functionalism: surely the system comprising the population of China would not itself be a proper subject of experience. The qualia (subjective experience or feels) seem to be nowhere present.
It is now open to us to respond to this charge in the same way we just responded to Searle. It all depends, we may say, on where you locate the grain of the input, internal state transitions, and output. If you locate it at the gross level of a semantically transparent system, then we may indeed doubt that satisfying that formal description is a step on the road to being a proper subject of experience. At that level we may expect absent qualia, excessive liberalism, and all the rest, although this needn't preclude formal accounts at that level being good psychological explanations in a sense to be developed later (chapters 7 and 10). But suppose our profile is much finer-grained and is far removed from descriptions of events in everyday language, perhaps with internal-state transitions specified in a mathematical formalism rather than in a directly semantically interpretable formalism. Then it is by no means so obvious (if it ever was; see Churchland and Churchland 1981) either that a system made up of the population of China could instantiate such a description or that if it did, it would not be a proper subject of the mental ascriptions at issue (other circumstances permitting; see chapter 3). My suggestion is that we might reasonably bet on a kind of microfunctionalism, relative to which our intuitions about excessive liberalism and absent qualia would show up as more clearly unreliable.
Such a position owes something to Lycan's (1981) defence of functionalism against Block. In that defence he accuses Block of relying on a kind of "gestalt blindness" (Lycan's term) in which the functional components are made so large (e.g., whole Chinese speakers) or unlikely (e.g., Searle's beer cans) that we rebel at the thought of ascribing intentionality to the giant systems they comprise. Supersmall beings might, of course, have the same trouble with neurons. Lycan, however, then opts for what he calls a homuncular functionalism, in which the functional subsystems are identified by whatever they may be said to do for the agent.
Microfunctionalism, by contrast, would describe at least the internal functional profile of the system (the internal state transitions) in terms far removed from such contentful, purposive characterizations. It would delineate formal (probably mathematical) relations between processing units in such a way that when those mathematical relations obtain, the system will be capable of vast, flexible structural variability and will have the attendant emergent properties. By keeping the formal characterization (and thereby any good semantic interpretation of the formal characterization) at this fine-grained level, we may hope to guarantee that any instantiation of such a description provides at least potentially the right kind of substructure to support the kind of flexible, rich behavior patterns required for true understanding. These ideas about the right kind of fine-grained substructures will be fleshed out in later chapters.
Whether such an account is properly termed a speciesof functionalism,
'
as I ve suggested, is open to somedebate. I have opted for a broad notion
of functionalismthat relatesthe real essenceof thought and intentionality
to patterns of nonphysically specifiedinternal state transitions suitablefor
mediating an input-output profile in a certain general kind of way. This in
effect identifies functionalism with the claim that structure, not the stuff,
counts and hence identifies it with any formal approach to mind. On
that picture, microfunctionalism is, as its name suggests, just a form of
functionalism, one that specifiesinternal state transitions at a very fine-
grained level. " "
Somephilosophers, however, might prefer to restrict the functionalism
label to just those accountsin which (1) we begin by formulating, for each
individual mental state, a profile of input, internal state transitions, and
output in which internal state transitions are described at the level of
beliefs, desires, and other mental states of folk psychology (see the next
chapter) (2) we then replacethe folk-psychological specificationsby some
formal, nonsemantic specification that preserves the boundaries of the
folk-psychological specifications .! Now there is absolutely no guarantee
that such boundaries will be preserved in a microfunctionalist account
(seethe next chapter). Moreover, though it may, microfunctionalismneed
not aspire to give a functional specificationof eachtype of mental state.
(How many are there anyway?) Instead, it might give an account of the
kind of substructureneededto support general, flexible behavior of a kind
that makes appropriate the ascription to the agent of a whole host of
"
folk-psychologicalstates. For thesereasons,it may be wise to treat microfunctiona
" asa term of art and the defenceof functionalismasa defence
of the possiblevalue of a fine-grained formal approachto mind. I use the
terminology I do becauseI believe the essentialmotivation of functionalism
lies in the claim that what counts is the structure, not the stuff (this is
consistentwith its roots- seePutnam 1960, 1967, 1975b). But who wants
to fight over a word? Philosophical disquiet over classicalcognitivism,
I conclude, has largely been well motivated but at times overambitious.
Dreyfus and Searle, for example, both raise genuine worries about the
kind of theories that seek to explain mind by detailing computational
manipulationsof standardsymbolic atoms. But it is by no meansobvious
that criticisms that make senserelative to those kinds of computational
modelsare"legitimately generalizedto all computationalmodels. The claim
that structure, not stuff, is what counts has life beyond its classicalcognitivist
incarnation, as we shall seein part 2.
Chapter 3
Folk Psychology, Thought, and Context
1 A Can of Worms
2 A Beginner's Guide to Folk Psychology
The good news about folk psychology is that a beginner's guide won't be necessary after all. For it is a fair bet that nobody reading this book is a beginner. The term "folk psychology" refers simply to our mundane, daily understanding of ourselves and others as believing, hoping, fearing, desiring, and so forth. Some such understanding of mental states is the common property of most adult human speakers in contemporary societies. At its core it uses belief and desire ascriptions to shed light on behavior or (to avoid begging questions) bodily movements.
A colleague suddenly gets up and rushes over to the bar. You explain her movements by saying, "She is desperate for a Guinness and believes she can get one at the bar." Very likely you won't express yourself in such a stilted and artificial way in such a simple case. But the explanatory mode is familiar enough, and it is one we explicitly use in more complex cases, e.g., solving detective mysteries that involve the search for motives for the evil deed.
At a minimum, folk psychology is thus the use of belief and desire talk to explain action or, better, movement (movement becomes action when it is subsumable under the intentional umbrella of a folk-psychological understanding). Folk psychology is not the gossip's understanding of, e.g., Freudian theories of psychoanalysis. In that respect the term "folk psychology" is somewhat misleading. In the recent literature (Churchland 1979, 1981; Stich 1983) folk psychology is treated (and criticized) as a primitive, protoscientific theory of the internal causal antecedents of human behavior. At first blush this may seem a somewhat startling thought. What could be meant by the claim that our ordinary ideas about the mental involve some kind of theory? And even if they do, why should it be a theory of internal causes of behavior? Let us address the first of these questions and leave the other to ferment until later. Return to the case of the lover of Guinness. Our belief-desire description of her movements toward the bar is explanatory, it is argued, only if we tacitly accept a general psychological law. In this case:
(x)(p)(q) {[(x desires that p) & (x believes that (q → p))] → (x will try, all else being equal, to bring it about that q)}
3 The Trouble with Folk
Now for the bad news. Folk psychology, it seems to some, is in various ways flawed and unsatisfactory (Churchland 1981; Stich 1983). Specific complaints about folk psychology include the following:
(1) Folk psychology affords only a local and somewhat species-specific understanding. It flounders in the face of the young, the mad, and the alien.
(2) It is stagnant and infertile, exhibiting little change, improvement, or expansion over long periods of time.
(3) It shows no signs as yet of being neatly integrated with the body of science. It seems sadly uninterested in carving up nature at neurophysiologically respectable joints.
The folk, in short, just don't know their own minds. Let us look at each complaint in a little more detail.
Complaint (1) surfaces in Churchland's insistence that the substantial explanatory and predictive success of folk psychology must be set against its failure to cope with "the nature and dynamics of mental illness, the faculty of creative imagination, . . . the ground of intelligence differences between individuals, . . . the nature and psychological functions of sleep, . . . the miracle of memory, [and] . . . the nature of the learning process itself" (Churchland 1981, 73).
Stephen Stich, in a similar vein, is worried by the failure of folk psychology to successfully explain the behavior of exotic folk and animals. Thus, he claims that we are unable adequately to characterize the content of alien or outlandish beliefs. Giving the example of someone who seems to believe he is a heap of dung, Stich notes that we are tempted to say that if it seems that someone believes that, then we really can't be sure what they believe. That is, it seems unlikely that any such alien belief can be
4 Content and World
The content of a mental state is what gets picked out by the "that" clause in constructions like "Daredevil believes that Elektra is dead," "Mary hopes that Fermat's last theorem is true," and so on. Since the discussion of content covers questions about meaning and about mind, in such discussions philosophy of mind, philosophy of psychology, and philosophy of language all meet up, with spectacular pyrotechnic results (see, e.g., Evans 1982 and essays in Woodfield 1982 and in Pettit and McDowell 1986). The part of the display that interests us here concerns the debate over what has become known as broad or world-involving content.
There is a tendency to think of psychological states as, in essence, self-contained states of the individual subject. That is to say, of course, not that we are not located in and affected by the world, but only that our psychological states are not essentially determined by how the world about us really is so much as by how it strikes us as being. In other words, the intuition is that whatever doesn't in any way impinge on your conscious or unconscious awareness can't be essentially implicated in any correct specification of your mental state. On this view your mental states have the contents they do because of the way you are, irrespective of the possibly unknown facts about your surroundings.
Much recent philosophy is characterized by a snowballing crisis of faith in this seemingly impregnable doctrine. Content, according to the heretics, essentially involves the world (Pettit and McDowell 1986, 4). The crisis began on twin earth (see Putnam 1975a). The twin-earth thought experiments work by varying the facts about the environment while keeping all the narrowly specifiable facts about the subject constant. Narrowly specifiable facts about a subject include the subject's neurophysiological profile and any other relevant facts specifiable without reference to the subject's actual surroundings, either present or past. The upshot of such thought experiments is to suggest that some content, at least, essentially involves the world. Thus, to use the standard, well-worn example, imagine a speaker on earth who says "There is water in the lake." And imagine on twin earth a narrow doppelganger (someone whose narrowly specified
states are identical with the first speaker's) who likewise says "There is water in the lake." Earth and twin earth are qualitatively identical except that water on earth is H2O while water on twin earth is XYZ, a chemical difference irrelevant to all daily macroscopic water phenomena. Do the two speakers mean the same thing by their words? It has begun to seem that they cannot. For many philosophers hold that the meaning of an utterance must determine the conditions under which the utterance is true. But the utterances on earth and twin earth are made true or false by the presence or absence of H2O and XYZ respectively. So if meaning determines truth conditions, the meaning of statements involving natural-kind terms (water, gold, air, and so on) can't be fully explained simply by reference to narrowly specifiable states of the subject. And what goes thus for natural-kind terms also goes (for similar reasons) for demonstratives ("that table," "the pen on the sofa," etc.) and proper names. The lesson, as Putnam would have it, is that "meanings just ain't in the head."
At this point, according to Pettit and McDowell (1986, 3), we have two options. (1) We could adopt a composite account of meaning and belief in which content depends on both an internal psychological component (common to the speakers on earth and twin earth) and an external, world-involving component (by hypothesis, not constant across the two earths). Or (2) we could take such cases as calling into question the very idea that the psychological is essentially inner and hence as calling into question even the idea of a purely inner and properly psychological component of mental states, as advocated in (1). As Pettit and McDowell (1986, 3) put it, "No doubt what is 'in the head' is causally relevant to states of mind. But must we suppose that it has any constitutive relevance to them?"
Of course, we do not have to take the twin-earth cases in either of the ways mentioned above. For one thing, they constitute an argument only if we antecedently accept that meaning should determine truth conditions. And even then there might be considerable room for maneuver (see, e.g., Searle 1983; Fodor 1986). In fact, I suspect that as arguments for content that involves the world, the twin-earth cases are red herrings. As Michael Morris has suggested in conversation, they serve more to clarify the issues than to argue for a particular view. Nonetheless, the idea that contentful states may essentially involve the world has much to recommend it (see especially the discussion of demonstratives in Evans 1982).
This, however, is not the place to attempt a very elusive argument. Instead, I propose to conduct a conditional defense of cognitive science. Even if all content turned out to radically involve the world (option (2) above), that in itself need not undermine the claim of cognitive science to be an investigation that is deeply (though perhaps not constitutively) relevant to the understanding of mind. In short, accepting option (2) (i.e., rejecting the idea that the psychological is essentially inner) does not commit us to the denial of conceptual relevance implied in the quoted passage from Pettit and McDowell.
The notion of constitutive relevance will be amplified shortly. First, though, a word about an argument that (if it worked, which it doesn't) would make any defense of cognitive science against the bogey of broad, world-involving content look strictly unnecessary. The argument (adapted from Hornsby 1986, 110) goes like this:
Two agents can differ in mental state only if they differ somehow in their behavioral dispositions.
A behavioral difference (i.e., a difference in behavioral dispositions) requires some internal physical difference.
So there can't be a difference in mental states without some corresponding difference of internal physical states (contrary to some readings of the twin-earth cases).
In other words, the content of mental states has to be narrowly determined if we are to preserve the idea that a difference in behavioral disposition (upon which mentality is said to supervene) requires a difference of inner constitution.
This argument, as Hornsby (1986, 110) points out, trades on a fluctuating understanding of "behavior." In the first premise "behavior" means bodily movements. This is clear, since no state of the head can cause you to, e.g., throw a red ball or speak to Dr. Frankenstein or sit in that (demonstratively identified) chair in the real absence (despite appearances, let's presume) of a red ball, Dr. Frankenstein, or the chair, respectively. At most, a state of the head causes the bodily movements that might count, in the right externally specified circumstances, as sitting in that chair, throwing the red ball, and so on. In the second premise, however, the appropriate notion of behavior is not so clearly the narrow one. It may be (and Putnam's arguments were meant to suggest that it must be) that the correct ascription of contentful states to one another is tied up with the actual states of the surrounding environment. If this is so, then it would be reasonable to think that since the ascription of contentful states is meant to explain behavior, behavior itself should be broadly construed. Thus, there could be no behavior of picking up the red ball in the absence of a red ball (whatever the appearances to the subject, his bodily movements, etc.). In this sense the idea of behavior implicated in mental-state ascriptions is more demanding than the idea of mere bodily movements. In another sense, as Hornsby also points out (1986, 106-107), it might be less demanding, as fine-grained differences in actual bodily movement (e.g., different ways of moving our fingers to pick up the red ball) seem strictly irrelevant to the ascription of psychological states. Ascriptions of folk-psychological contents thus seem to carve reality at joints quite different from any we may expect to derive from the solipsistic study of states of the head that determine bodily movements. The conclusion (that ascriptions of folk-psychological mental states are concerned only with narrow content) is thus thrown into serious doubt. It is not clear that we can make sense of any appropriate notion of narrow content. And equivocation on the meaning of "behavior" cannot be relied upon to fill in the gaps.
The radical thesis that all ideas of content essentially involve the world thus survives the latest assault. Despite the doubts voiced earlier, I propose, as I said, to give away as much as possible and accept this thesis, while still denying the "pessimistic" implications for cognitive science. (This may seem a curious project, but there are independent reasons for requiring some such defense, as we shall see.)
The target posture (accept broad content and deny the conceptual or philosophical irrelevance of cognitive science) may seem uncomfortable if you accept the following argument.
The mental states of folk psychology (belief, desire, fear, hope, etc.) are individuated by appeal to a broad, world-involving notion of content.
The accounts and explanations given by cognitive science, insofar as they are formally or computationally specifiable, must in principle be independent of any semantic, world-involving considerations. They must have an internal, syntactic reading that treats only of narrow, solipsistically definable states of agents. (See, e.g., Fodor 1980a.)
There is every cause to believe that semantic, world-involving accounts and solipsistic, narrow accounts will not carve nature at the same joints. There will be no neatly individuated internal states (either neurophysiologically or formally specified) that map onto the mental states individuated by folk psychology.
So cognitive science can't be in the business of contributing to a philosophical understanding of the nature of mental states, because the states with which it directly deals do not map satisfactorily onto our notions of mental states.
The conclusion, which is essentially that of McCulloch (1986), amounts to a pretty fundamental rejection of the idea that there can be a scientific synopsis (of the manifest, folk-psychological image of ourselves and the scientific one), "given that mind is unquestionably one of the things that must show up in it in some suitably scientific guise" (McCulloch 1986, 87-88). The question, I think, is what counts as a scientific synopsis here. Must a satisfactory synopsis involve a state-for-state correlation? Or is there some more indirect way to achieve both synopsis and conceptual relevance? I believe there is, but we must tread very carefully.
5 Interlude
"What a curious project!" you may be thinking. "The author proposes to attempt a defense of the significance of cognitive-scientific investigations against a radical, intuitively unappealing, and inconclusively argued doctrine. And he proposes to do so not by challenging the doctrine itself, but by provisionally accepting it and then turning the aggressor's blade." The reason for this is simple. With or without the broad-content theory, it looks extremely unlikely that the categories and classifications of folk psychology will reduce neatly to the categories and classifications of a scientific account of what is in the head. This is the "failing" that Churchland and (to a lesser degree) Stich ascribe to folk psychology. I embrace the mismatch, but not the pessimistic conclusion. For folk psychology may not be playing the same game as scientific psychology, despite its deliberately provocative and misleading label. So I take the following to be a very real possibility: whenever I entertain a thought, it is completely individuated by a state of my head (i.e., the content of the thought does not essentially involve the world), but there will be no projectable predicates in a scientific psychology that correspond to just that thought. By "no projectable predicate" I mean no predicate (in the scientific description) that is projectable onto other cases where we rightly say that the being is entertaining the same thought. Such other cases would include myself at different times, other humans, animals, aliens, and machines.
Regardless of broad content, I therefore join the cynics in doubting the scientific integrity of folk psychology as a theory of states of the head. But I demur at both the move from this observation to the conclusion that cognitive science, as a theory of states of the head, has no philosophical relevance to the understanding of mind (Pettit and McDowell) and the move to the conclusion that folk psychology should be eliminated in favor of a scientific account of states of the head (Churchland).
What I try to develop, then, is more than just a conditional defense of cognitive science in the face of allegations of broad content. It is also a defense of cognitive science despite any mismatch between projectable states of the head and ascriptions of specific beliefs, desires, fears, etc. Relatedly, it is a defense of belief-desire talk against any failure to carve nature at internally visible joints. Coping with the broad-content worry is thus really a fringe benefit associated with a more careful accommodation of commonsense talk of the mental into a scientific framework. So, now that we know we are getting value for our money, let's move on.
6 Some Naturalistic Reflections
At this point I think we may be excused for indulging in a little armchair naturalistic reflection. It seems a fair question to ask: What earthly use is the everyday practice of ascribing mental states to one another using the apparatus of folk psychology, that is, the apparatus of propositional-attitude ascription with notions of belief and desire? One answer might be that it is useful as a means of predicting and explaining other people's bodily movements by attempting to track internal states of their heads. This, we saw, is what the eliminative materialist must believe the practice is for.1 Otherwise, it would hardly be to the point to criticize it for failing to carve up nature at neurophysiological joints. And if that is what the practice is for, it may be in deep water. But why should we assume that it has any such purpose? Consider an alternative picture, due in part to Andrew Woodfield.2
On this picture, the primary purpose of folk-psychological talk is to make intelligible to us the behavior of fellow agents acting in the world. In particular, it is to make their behavior intelligible and predictable just insofar as that behavior bears (or could bear) on our own needs and interests. Now let us throw in a few more small facts. The other agents whose behavior we wish to make intelligible are primarily our peers, beings with four notable traits. First, they largely share our sensitivity to the world, i.e., our senses and any innate protoconceptual apparatus. Second, they share our world. Third, they share to a large extent our own most basic interests and needs. Fourth, the biological usefulness of their thoughts, like our own, involves their tracking real states of the world, a purpose for which we may (on evolutionary grounds) assume that their thinking is well adapted. Taken together, those traits help make plain the convenience and economy of ascribing folk-psychological content. The thoughts of our peers are well adapted to the same world as our own. So given also a convergence of needs and interests, we may economically use talk of states of the world generally to pick out the salient features of the thoughts of others. The development of a tendency to make broad-content ascriptions begins at this point to seem less surprising. Broad-content ascription, we may say, is content ascription that is sensitive to the point of thinking, which is to track states of the world. In general, it looks as if our thinking succeeds in this. The paradox of broad-content ascription is just that when our thinking fails (when, e.g., "that chair" is entertained in the absence of any chair), we must say that the thought (or better, the thinking) failed to have the content intended. But that seems acceptable, once we see the general reasonableness of the overall enterprise.
Even if we bracket the stuff about broad-content ascription, we still have a naturalized grip on some reasons why ascribing folk-psychological content ought not to aspire to track neat, projectable states of the head.
For the question must then arise: whose head? According to the present account, what we are interested in is a very particular kind of understanding of the bodily movements of other agents. It is an understanding of those movements just insofar as they will bear on our own needs and projects. And it is an understanding that seems to be available any time we can use talk of the world (as in the "that" clause of a propositional-attitude ascription) to help us pick out broad patterns in other agents' behavior.
Take for example the sentence "John believes that Buffalo is in Scotland." This thought ascription is useful not because it would help us predict, say, exactly how John's feet would move were someone to tell him that his long-lost cousin is in Buffalo, or even when he would set off. Rather, it is useful because it helps us to predict very general patterns of intended behavior (e.g., trying to get to Scotland), and because of the nature of our own needs and interests, these are what we want to know about. Thus, suppose I have a forged rail ticket to Scotland and I want to sell it. I am not interested in the fine-grained details of anyone's neurophysiology. All I want to know is where to find a likely sucker. If the population included Martians whose neurophysiology was quite different from our own (perhaps it involves different formal principles even), it wouldn't matter a jot, so long as they were capable of being moved to seek their long-lost cousins. Thus construed, folk psychology is designed to be insensitive to any differences in states of the head that do not issue in differences of quite coarse-grained behavior. It papers over the differences between individuals and even over differences between species. It does so because its purpose is to provide a general framework in which gross patterns in the behavior of many other well-adapted beings may be identified and exploited. The failure of folk psychology to fix on, say, neurophysiologically well-defined states of human beings is thus a virtue, not a vice.
7 Ascriptive Meaning Holism
The preceding section offers what is in effect just a slightly different
way of putting the fairly common observation that belief ascription (and
propositional-attitude ascription in general) is holistic. It is a net thrown
over a whole body of behavior and is usedto makesenseof the interesting
regularitiesin that behavior. For this reasonbeliefsare ascribedin complexes.
- 'in that an
As one well known meaningholist puts the point, saying agent
performed a single intentional action, we attribute a very complex system
"
of statesand events to him (Davidson 1973, 349). It is important, however
, to be clear about exactly what meaning holism involves. In a recent
"
attack on the doctrine Fodor summarizesit as follows: Meaning holism
is the idea that the identity - speci8cally, the intentional content- of
Folk psychology, thought, and context 49
connectionist models in part 2, we shall also see how a system can produce semantically systematic behavior without any internal mirroring of the semantically significant parts of the sentences we use to describe its behavior.
8 Churchland Again
9 Cognitive Science and Constitutive Claims
There is without doubt a connection between the broad-content theorist's worry that cognitive science can't illuminate the mind and the eliminative materialist's worry that folk psychology is a distorting influence on any scientific study of mind. Each party sees the same bumps and potholes in the cognitive terrain, the same deep-seated mismatch of folk-psychological kinds and narrowly specified scientific kinds. But one side concludes that the folk just don't know their own minds, while the other concludes that cognitive science can't know the folk's minds. Each side clasps mind tightly to its bosom, pitying the other side as embracing at best a distorted shadow of the real thing and deprived forever of the joys of true constitutive involvement. But this is surely overly romantic. I shall sketch a more permissive approach. First, though, I must unpack the notion of constitutive relevance.
The intended contrast, as far as I can see, is between constitutive and merely causal relevance. What has constitutive relevance is somehow conceptually bound up with the subject of study (in this case, mind), whereas various factors may be causally relevant to thought without the tie being so tight that the very idea of thought is unable to survive their subtraction. The contrast, I suspect, is not as hard and fast as some of those who use (and often abuse) the term seem to believe. If by intellectual reflection we can see that a certain phenomenon could not occur in any physically possible world (i.e., in any world where the laws of physics apply), is this a case of constitutive or merely causal relevance? Despite such unclear cases
[Diagram residue; recoverable fragments: "Level (3) . . . holistically ascribed on basis of . . . actual speaker-world relations . . . constitutively dependent on both."]
10 Functionalism without Folk
How does all this relate to our earlier discussions of functionalism and classical cognitivism? Think of the way extra nails relate to a coffin lid. The classical cognitivist, recall, was committed (in practice at least) to a specific architectural assertion, namely, that you could instantiate a thinking system by ensuring (at least for the in-the-head component) that it engaged in the appropriate manipulation of standard symbolic atoms. But if my earlier conjectures are at all on the mark, the attempt to model (with a view to instantiation) the inner, scientifically investigable causes of our behavior (or better, movements) by putting versions of folk-psychological descriptions into the machine begins to look very peculiar and unsound. It is as if we were attempting to create a human thinker by putting sentences specifying various contentful states into her head. This now looks precisely backward. On my picture, we need to specify inner states capable of causing rich, flexible behavior, which itself determines (without boundary-preserving mappings) the correctness of folk-psychological descriptions. Putting the mind's possibly broad descriptions of the states of agents acting in the world back into the head and expecting thereby to create mentality is slightly bizarre. In this respect, the eliminative materialist seems to have been right; cognitive science shouldn't seek to model internal states on
ordinary contentful talk. For such talk is not able (nor, I would add, intended) to be sensitive to the relevant internal causes, yet it is sensitive to irrelevant external states of affairs. The contrast is between putting tokens of ordinary contentful talk back into the head (classical cognitivism) and seeking an account of how what is in the head enables the holistic ascription of such contents to the subject in the setting of the external world. In sum, the less plausible we find folk-psychological ideas as a scientific theory of the internal causes of behavior, the less plausible the classical cognitivist program should seem, since it relies heavily on that level of description. Where goes the classical cognitivist, there goes the standard functionalist also. For standard functionalism (not the microfunctionalism described in chapter 2, section 6) is committed to filling in the following schema for each individual mental state.
Mental state p is any state of a system that takes x, y, z (environmental effects) as inputs, gives l, m, n (bodily movements, vocalizations, etc.) as outputs, and fulfils internal state transitions g, h, i.
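To fix the shape of this schema, here is a toy rendering in modern Python. It is an illustrative sketch only: every state name, input, and output in it is an invented placeholder, and nothing hangs on the details beyond the form of the specification, a mental state identified purely by its role in mediating inputs, outputs, and further state transitions.

```python
# A toy rendering of the standard-functionalist schema. All names here are
# invented placeholders, not serious psychological proposals; the point is
# only the form: a state identified by its input-output-transition role.

TRANSITIONS = {
    # (current mental state, environmental input) -> (next state, output)
    ("desire-for-drink", "sees-bar"):    ("belief-drink-at-bar", "none"),
    ("belief-drink-at-bar", "none"):     ("intention-to-approach", "walk-to-bar"),
    ("intention-to-approach", "at-bar"): ("satisfaction", "order-drink"),
}

def step(state, observation):
    """One folk-psychologically specified internal state transition."""
    return TRANSITIONS.get((state, observation), (state, "none"))

state = "desire-for-drink"
for obs in ["sees-bar", "none", "at-bar"]:
    state, output = step(state, obs)
    print(state, "->", output)
```

Notice that even in this toy form the transitions g, h, i are specified in folk-psychological vocabulary. That is exactly where the trouble begins.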
The difficulties start in the specification of g, h, i, the internal state transitions. For internal state transitions specify relations between mental states, folk-psychologically identified. The folk-psychological specifications act as placeholders to be filled in with appropriate, syntactically specified scientific kinds in due course. But this, of course, is simply to bet on a neat, boundary-preserving relation between folk-psychological kinds and scientific kinds, which I have been at pains to doubt for the last umpteen pages. That kind of functionalism (the kind that treats folk-psychological descriptions as apt placeholders for scientifically well-motivated states of the head) rightly deserves most of the scorn so freely heaped on it by the eliminative materialist.
In conclusion, if I'm even halfway right, the folk do know their own minds. But they do so in a way sensitive to the pressures of a certain explanatory niche, the niche of our everyday understanding of patterns in behavior. The pressures on a computational theory of brain activity are quite different. Such a theory is about as likely to share the form of a folk-psychological picture of mind as a land-bound herbivore is to share the form of a sky-diving predator.
Chapter 4
Biological Constraints
feeding was made only in the last decade. And yet, as Vogel points out:
The structure of sponges is most exquisitely adapted to take advantage of such currents, with clear functions attaching to a number of previously functionless features. Dynamic pressure on the incurrent openings facing upstream, valves closing incurrent pores lateral and downstream, and suction from the large distal or apical excurrent openings combine to gain advantage from even relatively slow currents. And numerous observations suggest that sponges usually prefer moving water. Why did so much time elapse before someone made a crude model of a sponge, placed it in a current, and watched a stream of dye pass through it? (1981, 190)
Vogel's question is important. Why was such an obvious and simple adaptation overlooked? The reason, he suggests, is that biologists have tended to seek narrowly biological accounts, ignoring the role of various physical and environmental constraints and opportunities. They have, in effect, treated the organism as if it could be understood independently of an understanding of its immediate physical world. Vogel believes a diametrically opposed strategy is required. He urges a thorough investigation of all the simple physical and environmental factors in advance of seeking any more narrowly biological account. He thus urges, "Do not develop explanations requiring expenditure of metabolic energy (e.g. the full pumping hypothesis for the sponge) until simple physical effects (e.g. the use of ambient currents) are ruled out" (Vogel 1981, 182). Vogel gives a number of other examples involving prairie dogs, turret spiders, and mimosa trees.
It is the general lesson that should interest us here. As I see it, the lesson is this: if evolution can economize by exploiting the structure of the physical environment to aid an animal's processing, then it is very likely to do so. And processing here refers as much to information processing as to food processing. Extending Vogel's observations into the cognitive domain, we get what I shall dub the 007 principle. Here it is.
The 007 principle. In general, evolved creatures will neither store nor process information in costly ways when they can use the structure of the environment and their operations upon it as a convenient stand-in for the information-processing operations concerned. That is, know only as much as you need to know to get the job done.
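A toy contrast may help to fix the idea. In the sketch below (all classes, methods, and facts are invented for illustration), one agent pays the cost of building and maintaining a full inner model of its world, while the other stores almost nothing and simply consults the world when it needs to know:

```python
# A toy illustration of the 007 principle (all names invented). The world
# itself stores the facts; the frugal agent uses its operations on the
# environment as a stand-in for costly inner storage.

class World:
    """The environment: a richly structured, queryable store of facts."""
    def __init__(self, facts):
        self._facts = facts
    def inspect(self, key):
        return self._facts[key]

class HoarderAgent:
    """Builds a full inner model up front (costly, and liable to go stale)."""
    def __init__(self, world, keys):
        self.inner_model = {k: world.inspect(k) for k in keys}
    def answer(self, key):
        return self.inner_model[key]

class FrugalAgent:
    """Knows only how to look: the world serves as its own best model."""
    def __init__(self, world):
        self.world = world
    def answer(self, key):
        return self.world.inspect(key)   # operate on the environment itself

world = World({"food": "under the rock", "shelter": "in the hollow tree"})
print(HoarderAgent(world, ["food", "shelter"]).answer("food"))
print(FrugalAgent(world).answer("food"))
```

Both agents answer correctly; the second does so while knowing only as much as it needs to get the job done.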
Something like the 007 principle is recognized in some recent work in developmental psychology. Rutkowska (1984) thus argues that a proper understanding of the nature of infant cognition requires rejecting the solipsistic strategies of formulating models of mind without attending to the way mind is embedded in an information-rich world. It is her view that computational models of infant capacities must be broad enough to include
enough. Two heads are indeed often better than one. Since it is so phenomenologically immediate, the use of one's own body is easily overlooked. Jim Nevins, a researcher into computer-controlled assembly, cites a nice example (reported in Michie and Johnston 1984). One solution to the problem of how to get a computer-controlled machine to assemble tight-fitting components is to design a vast series of feedback loops that tell the computer when it has failed to find a fit and get it to try again in a slightly different way. The natural solution, however, is to mount the assembler arms in a way that allows them to give along two spatial axes. Once this is done, the parts simply slide into place "just as if millions of tiny feedback adjustments to a rigid system were being continuously computed" (Michie and Johnston 1984, 95).
A proponent of the ecological movement in psychology once wrote, "Ask not what's inside your head but rather what your head's inside of" (Mace 1977, quoted in Michaels and Carello 1981). The positive advice, at least, seems reasonable enough. Evolution wants cheap and efficient solutions to the problems of survival in a real, richly structured external environment. It should come as no surprise that we succeed in exploiting that structure to our own ends. Just as the sponge augments its pumping action by ambient currents, intelligent systems may augment their information-processing power by devious use of external structures. Unlike the ecological psychologists, we can ill afford to ignore what goes on inside the head. But we had better not ignore what's going on outside it either.
The moral, then, is to be suspicious of the heuristic device of studying intelligent systems independently of the complex structure of their natural environment. To do so is to risk putting into the modeled head what nature leaves to the world. Classical cognitivism, we shall later see, may be guilty of just this cardinal sin.
4 Gradualistic Holism and the Historical Snowball
A very powerful principle pervades the natural order. It is often cited, though, as far as I know, it is unbaptized. The principle was explicitly stated by H. Simon (1962) and has since been mentioned by a great many writers. The principle states that the evolution of a complex whole will generally depend on its being built out of a combination of parts, each of which has itself evolved as a whole stable unit. The recursive application of such a procedure enables us to account for the evolution of complex wholes without the explosive increase in improbability that would dog any claim that such a complex evolved in a single step. For example, the one-step evolution of such a structure as depicted in figure 4.1 is much less likely than the evolution of such a structure if there are stable intermediate forms, as illustrated in figure 4.2. The example is deliberately simplistic. But the
Figure 4.1
Complex structure with no simpler forms
Figure 4.2
Complex structure with stable intermediate forms
requirement. Chance and local factors will play some role at every stage along the way. Since every later development occurs in a space determined by the existing solutions (and materials), it is easy to see that there will be a snowball effect. Every idiosyncrasy and arbitrariness at stage S1 forms the historical setting for a tinkering solution to a new problem at S2. As complexity increases and S1 gives way to Sn, the solutions will come more and more to depend on the particular history of the species. This may be one reason why some evolutionary theorists (e.g., Hull 1984) prefer to regard a species as a historical individual picked out by the particular circumstances of its birth, upbringing, and culture, rather than as an instance of a general, natural kind.
This historical snowballing effect, combined with the need to achieve some workable total system at each modification (holism), often makes natural solutions rather opaque from a design-oriented perspective. We have already seen one example in the evolution of a breathing device from a swim bladder. If we set out to build a breathing device from scratch, we might, it seems, do a better job of it.
Two further examples should bring home the lesson. These examples, beautifully described by Dawkins (1986, 92-95), concern the human eye and the eye of the flatfish. The human eye, it seems, incorporates a very strange piece of design. The light-sensitive photoreceptor cells face away from the light and are connected to the optic nerve by wires that face in the direction of the light and pass over the surface of the eye before disappearing through a hole in the retina on their way to the brain. This is an odd and seemingly clumsy piece of design. It is not shared by all the eyes that nature has evolved. An explanation of why vertebrate eyes are wired as they are might be that the combination of some earlier historical situation with the need to achieve an immediate working solution to some problem of sight forced the design choice. The wiring of the eye, then, has every appearance of being a kludge, a solution dictated by available materials and short-term expediency. As a piece of engineering it may be neither elegant nor optimal. But as a piece of tinkering, it worked.
Dawkins's second example concerns bony flatfish, e.g., plaice, sole, and halibut. All flatfish hug the sea floor. Some flatfish, like skates and rays, are flattened elegantly along a horizontal axis. Not so the bony flatfish. It is flattened along a vertical axis and hugs the sea floor by lying on its side. This rather ad hoc solution to whatever problem forced these fish to take to the sea bed must have raised a certain difficulty. For one eye would then be facing the bottom and would be of little use. The obvious tinkerer's solution is to gently twist the eye round to the other side. Overall, it is a rather messy solution that clearly shows evolution favoring quick, cheap, short-term solutions (to hug the sea bed, lie on your side) even though they may give rise to subsequent difficulties (the idle eye), which then get
The list is odd in that some of the items (like emotional response and curiosity) seem either too biological or too complex for current work to address. Yet these, I suggest, are the building blocks of human cognition. It seems psychologically ill-advised to seek to model, say, natural-language understanding without attending to such issues. Paradoxically, then, the protocognitive capacities we share with lower animals, and not the distinctive human achievements such as chess playing or story understanding, afford, I believe, the best opportunities for design-oriented insight into human psychology. This is not to say, of course, that all design-oriented investigation into higher-level skills is unnecessary, merely that it is insufficient. Indeed, a design-oriented approach to the higher-level achievements may be necessary if we are to understand the nature of the task that evolution may choose to perform in some more devious way. The general point I have been stressing is just that understanding the natural solution to an information-processing task may require attending at least to the following set of biologically motivated constraints:
- High value must be placed on robustness and real-time sensory processing (section 2).
- The environment should not be simplified to a point where it no longer matters. Instead, it should be exploited to augment internal processing operations (section 3).
- The general computational form of solutions to evolutionarily recent information-processing demands should be sensitive to a requirement of continuity with the solutions to more basic problems (section 4).
5 The Methodology of MIND
The methodology of classical cognitivism involved a valiant attempt to illuminate the nature of thought without attending to such constraints. There are many motivations for such a project, some highly respectable, some less so. Among the respectable motivations we find the belief that the space of possible minds far exceeds the space of biologically possible minds and that investigating the larger, less-constrained space is a key to understanding the special nature of the biological space itself (see, e.g., Sloman 1984). Also among the respectable motivations we find the need to work on isolable, tractable problem domains. It is thus clearly much easier to work on a chess-playing algorithm than to try to solve a myriad of evolutionarily basic problems (vision, spatial skills, sensorimotor control, etc.) in an integrated, robust, and flexible fashion. The evolutionary reflections on the nature of human thought can only count against the direction of work in the classical cognitivist tradition if there is some realistic alternative approach. And some cognitive scientists do not see any such alternative (but see part 2 for some reasons for optimism). It is also worth noting, as stressed at the very beginning, that what I am calling the classical cognitivist tradition by no means exhausts the kinds of work already being done even in what I shall later view as conventional artificial intelligence (the intended contrast is with the PDP approach investigated in part 2). Thus, early work on cybernetics, more recent work on low-level visual processing, and some work in robotics can all be seen as attempting to do some justice to the kinds of biological constraints just detailed. To take just one recent example, Mike Brady of Oxford recently gave a talk in which he explained his interest in work on autonomously guided vehicles as rooted in the peculiar task demands faced by such vehicles as robot trucks that must maneuver and survive in a real environment (see Brady et al. 1983). These included severe testing of modules by the real environment, three-dimensional ranging and sensing, real-time sensory processing, data fusion and integration from various sensors, and dealing with uncertain information. Working on autonomously guided vehicles is clearly tantamount to working on a kind of holistic animal microworld: such work is forced to respect many (but not all) of the constraints that we saw would apply to evolved biological systems.
Classical cognitivism tries to make a virtue out of ignoring such constraints. It concentrates on properties internal to the individual thinker, paying at best lip service to the idea of processing that exploits the world; it seeks neat, design-oriented, mathematically well understood solutions to its chosen problems; and it chooses its problems by fixating on various interesting high-level human achievements like conscious planning, story understanding, language parsing, game playing, and so forth. Call this a MIND methodology. MIND is a slightly forced acronym meaning: focused on Mature (i.e., evolutionarily recent) achievements; seeking Internalist solutions to information-processing problems (i.e., not exploiting the world); aimed at Neat (elegant, well-understood) solutions; and studying systems from an ahistorical, Design-oriented perspective. The methodology of MIND thus involves looking at present human achievements, fixating on various intuitively striking aspects of those achievements (e.g., planning, grammatical competence, creativity), then treating each such high-level aspect as a separate domain of study in which to seek neat, internalist, design-oriented solutions, and hoping eventually to integrate the results into a useful understanding of human thought. This general strategy is reflected in the plan of AI textbooks, which will typically feature individual chapters on, e.g., vision, parsing, search, logic, memory, uncertainty, planning, and learning (this is the layout of Charniak and McDermott 1985). Our earlier reflections (sections 2 to 4) already give us cause to doubt the
Figure 4.3
One of the spandrels of Saint Mark's. Reproduced by permission from Gould and Lewontin 1979, 582.
Figure 4.4
The ceiling of King's College chapel. Reproduced by permission from Gould and Lewontin 1979, 583.
2 The Space between the Notes
The musician's talent, it is sometimes said, lies not in playing the notes but in spacing them. It is the silences that make the great musician great. As it is with music, so it is with connectionism. The power of a connectionist system lies not in the individual units (which are very simple processors) but in the subtly crafted connections between them. In this sense such models may be said to be examples of a brain's-eye view. For it has long been known that the brain is composed of many units (neurons) linked in parallel by a vast and intricate mass of junctions (synapses). Somehow this mixture of relatively simple units and complex interconnections results in the most powerful computing machines now known, biological organisms. Work in parallel distributed processing may be said to be neurally inspired in the limited sense that it too deploys simple processors linked in parallel in intricate ways. Beyond that, the differences are significant. Neurons and synapses are of many different types, with properties and complexities of interconnection so far untouched in connectionist work. The PDP "neuron" is a vast simplification. Indeed, it is often unclear whether a single PDP unit corresponds in any useful way to a single neuron. It may often correspond to the summed activity of a group of neurons. Despite all the differences, however, it remains true that connectionist work is closer to neurophysiological structure than are other styles of computational modeling (see Durham 1987; McClelland, Rumelhart, and the PDP Research Group 1986, vol. 2, chapters 20-23).
Neurally inspired theorizing has an interesting past. In one sense it is a descendant of gestalt theory in psychology (see Kohler 1929; Baddeley 1987). In another, more obvious sense it follows the path of cybernetics, the study of self-regulating systems. Within cybernetics the most obvious antecedents of connectionist work are McCulloch and Pitts 1943, Hebb 1949, and Rosenblatt 1962. McCulloch and Pitts demonstrated that an idealized net of neuronlike elements with excitatory and inhibitory linkages could compute the logical functions and, or, and not. Standard results in logic show that this is sufficient to model any logical expression. Hebb went on to suggest that simple connectionist networks can act as a pattern-associating memory and that such networks can teach themselves how to weight the linkages between units so as to take an input pattern and give a desired output pattern. Roughly, Hebb's learning rule was just that if two units are simultaneously excited, increase the strength of the connection between them (see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, p. 36, for a brief discussion). This simple rule (combined with an obvious inhibitory variant) is not, however, as powerful as those used by modern-day connectionists. Moreover, Hebb's rules were not sufficiently rigorously expressed to use in working models.
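Both of these ancestral ideas can be displayed in miniature. The sketch below is my own illustration in modern Python, not a reconstruction of either formalism: a McCulloch-and-Pitts-style threshold unit is set up to compute and, or, and not, and a crude Hebbian rule then strengthens the connection between any pair of simultaneously excited units:

```python
# Minimal sketches of two ancestors of connectionism (illustrative only).

def threshold_unit(inputs, weights, threshold):
    """McCulloch-and-Pitts-style unit: fires (1) just in case the weighted
    sum of its inputs reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# The logical functions as weight-and-threshold settings.
AND = lambda a, b: threshold_unit([a, b], [1, 1], threshold=2)
OR  = lambda a, b: threshold_unit([a, b], [1, 1], threshold=1)
NOT = lambda a:    threshold_unit([a],    [-1],   threshold=0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0

def hebb_update(weights, activations, rate=0.1):
    """Hebb's rule, roughly: if two units are simultaneously excited,
    increase the strength of the connection between them."""
    n = len(activations)
    for i in range(n):
        for j in range(n):
            if i != j and activations[i] == 1 and activations[j] == 1:
                weights[i][j] += rate
    return weights

# Repeatedly pairing units 0 and 1 strengthens their mutual connection;
# the links to the inactive unit 2 are left untouched.
w = [[0.0] * 3 for _ in range(3)]
for _ in range(5):
    w = hebb_update(w, [1, 1, 0])
print(w[0][1], w[1][2])   # 0.5 and 0.0
```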
This deficiency was remedied by Rosenblatt's work on the so-called perceptron. A perceptron is a small network of input units connected via some mediating units to an output unit. Rosenblatt's work was especially important in three ways, two of them good, one disastrous. The two good things were the use of precise, formal mathematical analysis of the power of the networks and the use of digital-computer simulations of such networks (see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, pp. 154-156). The disastrous thing was that some overambitious and politically ill-advised rhetoric polarized the AI community. The rhetoric elevated the humble perceptron to the sole and sufficient means of creating real thought in a computer. Only by simulating perceptrons, Rosenblatt thought, could a machine model the depth and originality of human thought. This claim and the general evangelism of Rosenblatt's approach prompted a backlash from Minsky and Papert. Their work Perceptrons (1969) was received by the alienated AI community as a decisive debunking of the usurping perceptrons. With a rigorous mathematical analysis of linear threshold functions, Minsky and Papert showed that the combinatorial explosion of the amount of time needed to learn to solve certain problems undermined the practical capacity of perceptronlike networks to undergo such learning. And they further showed that for some problems no simple perceptron approach could generate a solution. Rather than taking these results as simply showing the limits of one type of connectionist approach, Minsky and Papert's work (which was as rhetorically excessive as Rosenblatt's own) was seen as effectively burying connectionism. It would be some years before its public resurrection.
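The simplest of the negative results can be appreciated without their mathematics. No single linear threshold unit computes exclusive-or, since no line separates the cases where exclusive-or is true from those where it is false. The brute-force check below (an illustrative sketch; Minsky and Papert's own argument is analytic, not exhaustive) finds no weight-and-threshold setting that works:

```python
# Exhaustive check, over a coarse grid of weights and thresholds, that no
# single linear threshold unit realizes XOR (illustrative only).

def unit(a, b, w1, w2, theta):
    return 1 if (a * w1 + b * w2) >= theta else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [x / 2 for x in range(-8, 9)]   # values from -4.0 to 4.0 in steps of 0.5
solutions = [
    (w1, w2, t)
    for w1 in grid for w2 in grid for t in grid
    if all(unit(a, b, w1, w2, t) == v for (a, b), v in XOR.items())
]
print(len(solutions))   # 0: no such setting exists
```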
But the miracle happened. A recent three-page advertisement in a leading science journal extols the slug as savant, claiming that the parallel neural networks of the slug suggest powerful new kinds of computer design. The designs the advertisers have in mind are quite clearly based on the work of a recent wave of connectionists who found ways to overcome many of the problems and limitations of the linear-thresholded architectures of perceptrons. Landmarks in the rise of connectionism include Hinton and Anderson 1981; McClelland and Rumelhart 1985a; and McClelland, Rumelhart, and the PDP Research Group 1986. Other big names in the field include J. Feldman, D. Ballard, P. Smolensky, T. Sejnowski, and D. Zipser. It would be foolhardy to attempt a thorough survey of this extensive and growing literature here. Instead, I shall try to convey the flavor of the approaches by focusing on a few examples. These have been chosen to display as simply as possible some basic strategies and properties common to a large class of connectionist models. The precise algorithmic form of such models varies extensively. The emergent properties associated with the general class of models bear the philosophical and biological weight. This is reflected in the discussion that follows.
Figure 5.1
A local hardwired network. From McClelland, Rumelhart, and Hinton 1986, 28.
Figure 5.2
Inhibitory links
between each burglar unit and the thirties unit. If, in addition, only burglars are in their thirties, the thirties unit would be excitatorily linked to the units representing burglars.
- Solid black spots signify individuals and are connected by excitatory links to the properties that the individual has; e.g., one such unit is linked to units representing Lance, twenties, burglar, single, Jet, and junior-high-school education.
By storing the data in this way, the system is able to buy, at very little computational cost, the following useful properties: content-addressable memory, graceful degradation, default assignment, and generalization. I shall discuss each of these in turn.
Content-addressable memory
Consider the information that the network encodes about Rick. Rick is a divorced, high-school-educated burglar in his thirties. In a more conventional approach this information would be stored at one or several addresses, with retrieval dependent upon knowing the address. But a designer may want to make all this information accessible by any reasonable route. For example, you may know only that you want data on a Shark in his thirties, or you may have a description that is adequate to identify a unique individual but nevertheless contains some errors. Such flexible (and in this case error-tolerant) access to stored information is known as content-addressable memory. Humans certainly have it. To borrow McClelland and Rumelhart's lovely example, we can easily find the item that satisfies the description "is an actor, is intelligent, is a politician," despite the meagre and perhaps partially false description. Flexible, error-tolerant access requires some computational acrobatics in a conventional system. In the absence of errors, a technique called hash coding is quite efficient (see Knuth 1973). The error-tolerant case, however, requires an expensive best-match search. Storing the information in a network of the kind just described is a very natural, fast, and relatively cheap way of achieving the same result.
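A drastically simplified sketch may make the trick vivid. The following is my own toy example, not the McClelland and Rumelhart network itself: each individual is stored only as a set of links to feature units, and retrieval simply activates the best-matching individual, so that any reasonable description, even a partly false one, finds its man:

```python
# A toy content-addressable memory in the spirit of the gang example
# (my own simplification, not the original interactive-activation model).

PEOPLE = {
    "Rick":  {"Shark", "thirties", "divorced", "high-school", "burglar"},
    "Lance": {"Jet", "twenties", "married", "junior-high", "burglar"},
    "Ken":   {"Shark", "twenties", "single", "high-school", "bookie"},
}

def retrieve(probe):
    """Return the stored individual whose features best overlap the probe.
    Overlap scoring makes access both flexible and error-tolerant."""
    return max(PEOPLE, key=lambda name: len(PEOPLE[name] & probe))

print(retrieve({"Shark", "thirties"}))                        # Rick
print(retrieve({"burglar", "twenties", "married"}))           # Lance
print(retrieve({"Shark", "thirties", "divorced", "bookie"}))  # Rick, despite an error
```

In a true PDP network the scoring is done by spreading activation rather than by explicit search, but the functional profile is the same: retrieval by content, by any reasonable route.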
Figure 5.3
The pattern of activation for a Shark in his thirties. Hatching = input activations. Sunburst = units to which activation spreads. The diagram is based on McClelland, Rumelhart, and Hinton 1986, p. 28, fig. 11.
Graceful degradation
Graceful degradation, as we saw in chapter 4, comes in two related varieties. The first demands that a system be capable of sustaining some hardware
Default assignment
Suppose that you don't know that Lance was a burglar. But you do know that most of the junior-high-school-educated Jets in their twenties are burglars rather than bookies or pushers (see the data table in figure 5.2). It
Figure 5.4
The pattern of activation for a Jet bookie with a junior-high-school education. A unit labeled with a gang member's initial stands for that individual. The input patterns are marked with hatching. The strongly activated individual unit is marked with an x, and the name unit it excites is marked with a sunburst. The diagram is based on McClelland, Rumelhart, and Hinton 1986, p. 28, fig. 11.
4 Emergent Schemata
In chapter 2 (especially sections 2 and 3) we spoke of scripts and schemata. These are special data structures that encode the stereotypic items or events associated with some description. For example, Schank and Abelson's restaurant script held data on the course of events involved in a typical
to how sensible its stored knowledge is) even for the input pattern "bed, bath, refrigerator." To show this in action, McClelland, Rumelhart, et al. describe a case where "bed, sofa" is given as the input pattern. The system then sets about trying to find a pattern of activation for the rest of its units that respects, as far as possible, all the various constraints associated with the presence of a bed and a sofa simultaneously. What we want is a set of values that allows the final active schema to be sensitive to the effects of the presence of a sofa on our idea of a typical bedroom. In effect, we are asking for an unanticipated schema: a typical bedroom-with-a-sofa-in-it. The system's attempt is shown in figure 5.5. It ends up having chosen "large" as the size description (whereas in its basic bedroom scenario the size is set at "medium") and having added "easy-chair," "floor-lamp," and "fireplace" to the pattern. This seems like a sound choice. It has added a subpattern strongly associated with sofa to its bedroom schema and has adjusted the size of the room accordingly.
Here, then, we have a concrete example of the kind of flexible and sensible deployment of cognitive resources that characterizes natural intelligence. Emergent schemata obviate the need to decide in advance what possible situations the system will need to cope with, and they allow it to adjust its default values in a way maximally consistent with the range of possible inputs. Just this kind of natural flexibility and informational holism constitutes, I believe, a main qualitative advantage of PDP approaches over their conventional cousins.
Figure 5.5
The output of the room network with bed, sofa, and ceiling initially clamped. The result may be described as a large, fancy bedroom. White boxes indicate active units. The vertical sets of such boxes, reading from left to right, indicate the successive states of activation of the network. The starting state (far left) has the bed, sofa, and ceiling units active. The figure is from Rumelhart, Smolensky, et al. 1986, 34.
5 Distributed Memory
- the use of learning rules in PDP,
- the economy of PDP storage (so-called "superpositional" storage),
- the PDP capacity to mimic the explicit storage of rules, prototypes, and so on,
- the relation of PDP to experimental psychological data, and
- the limits of networks without hidden units (the classic case is exclusive-or; see the sketch below).
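The last of these points deserves a concrete gloss. A network whose inputs feed straight to a single threshold output can compute inclusive-or but not exclusive-or, since no single linear threshold separates the exclusive-or cases; one hidden unit (the arrangement shown later in figure 5.7) repairs this. Here is a minimal hand-wired version, with weights and thresholds of my own choosing:

```python
# Exclusive-or with one hidden unit. Without the hidden unit the
# output is a single linear threshold, which cannot compute XOR.
def step(x, threshold):
    return 1 if x > threshold else 0

def xor_net(a, b):
    hidden = step(a + b, 1.5)              # fires only for (1, 1)
    return step(a + b - 2 * hidden, 0.5)   # hidden unit vetoes (1, 1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```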
The model that McClelland and Rumelhart propose is a fairly standard PDP network of the kind discussed above. The network is exposed to successive sets of inputs given in terms of a fixed set of representational primitives, i.e., a fixed set of features to which (on this interpretation) its units (or sets of units) are seen as sensitive. Such features may include visual ones, like color or size, and nonvisual ones, like names. As a model of memory, the task of the system is just this: given some input with features f1, ..., fn (say, f1, ..., f10 for definiteness), the system needs to store the input in a way that enables it later to re-create the input pattern from a fragment of it acting as a cue. Thus, if the system is given values for f1, ..., f4, we want it to fill in f5, ..., f10 with values somehow appropriate to its earlier experience in which f1, ..., f10 were active. A simple learning rule, called the delta rule, suffices to produce this kind of behavior. The delta rule is explained formally in McClelland, Rumelhart, et al. 1986, vol. 1, chapters 2, 8, and 11, and vol. 2, chapter 17. Informally, it works like this. Getting a system to re-create an earlier activation pattern f1, ..., f10 when given the fragment f1, ..., f4 amounts to requiring that the internal connections between the units in the net be fixed so that activation of the fragment f1, ..., f4 causes activation of the rest of the pattern f5, ..., f10. So we need strong excitatory links between f1, ..., f4 and the units f5, ..., f10. After the system has received a teaching input of f1, ..., f10, the delta rule simply gets the system to check whether the internal connections between the units that were active would support such a re-creation. If they would not, as is generally the case, it modifies the overall pattern of connectivity in the required direction. The delta rule is strong enough to guarantee that, subject to certain caveats (more on which later), a system following it will learn to re-create the pattern over its units from a suitable fragment of that pattern.
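A minimal delta-rule auto-associator makes the informal story concrete. Everything below is a toy of my own devising (ten binary feature units, a single stored pattern, a 0.5 output threshold): each connection is nudged in proportion to the difference between a unit's target activation and what the rest of the pattern currently predicts for it.

```python
# Delta-rule auto-association over 10 feature units (toy parameters).
n, rate = 10, 0.2
W = [[0.0] * n for _ in range(n)]           # no self-connections used
pattern = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]    # the teaching input f1..f10

for _ in range(50):                          # repeated learning trials
    for i in range(n):
        net = sum(W[i][j] * pattern[j] for j in range(n) if j != i)
        error = pattern[i] - net             # delta rule: target - prediction
        for j in range(n):
            if j != i:
                W[i][j] += rate * error * pattern[j]

# Cue with the fragment f1..f4; ask the net to fill in f5..f10.
cue = pattern[:4] + [0] * (n - 4)
completed = cue[:4] + [
    1 if sum(W[i][j] * cue[j] for j in range(4)) > 0.5 else 0
    for i in range(4, n)
]
print(completed == pattern)   # True: the fragment re-creates the pattern
```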
To get a better idea of how this looks in action, let's consider McClelland and Rumelhart's example of learning about dogs. The example is in many ways similar to previous ones, but it will help bring out the main points nevertheless. First, circumscribe the domain by fixing on a prototypical dog. Take a picture of this, and describe it in terms of a fixed set of representational primitives, say, sixteen features. Next, create a set of specific dog descriptions none of which quite matches the prototype. These
Figure 5.6
An inclusive-or network.
Figure 5.7
A simple exclusive-or network with one hidden unit. From Rumelhart, Hinton, and Williams 1986, 321.
look very much as if PDP goes at least some way to finding the domes and arches out of which the spandrels visible to the mind's eye emerge. Many of the distinctions drawn at the upper level may still reflect our proper epistemological interests without marking any differences in the underlying computational form or activity of the system. Thus, we distinguish, say, memory and confabulation. But both may involve only the same pattern-completing capacity. The difference lies, not in the computational substructure, but in the relation of that structure to states of the world that impinged on the knower. (For a discussion of how PDP from an internal perspective blurs the distinction between memory and confabulation, see McClelland, Rumelhart, and the PDP Research Group 1986, vol. 1, pp. 80-81.)
Obviously, there is considerable and biologically attractive economy about all this. The use of one underlying algorithmic form to achieve so many high-level goals and the superpositional storage of data should appeal to a thrifty Mother Nature. And the double robustness of such systems (being capable of acting sensibly on partially incorrect data and capable of surviving some hardware damage) is a definite natural asset. Finally, the sheer flexibility of the system's use of its stored knowledge constitutes a major biological advantage (and philosophical advantage too, as we shall see). The capacity sensibly to deploy stored knowledge in a way highly sensitive to particular (perhaps novel) current needs will surely be a hallmark of any genuine intelligence.
In sum, the kind of approach detailed in the present chapter seems to offer a real alternative (or complement) to MIND-style theorizing. What makes PDP approaches biologically attractive is not merely their neurophysiological plausibility, as many seem to believe. They also begin to meet a series of more general constraints on any biologically and evolutionarily plausible model of human intelligence. PDP approaches may not be the only kind of computational approach capable of meeting such constraints. But work within the tradition of semantically transparent systems has so far failed to do so and instead has produced fragile, inflexible systems of uncertain neural plausibility.
Chapter 6
Informational Holism
1 In Praise of Indiscretion
Discretion, or at any rate discreteness, is not always a virtue. In the previous chapter we saw how a profoundly nondiscrete, parallel distributed encoding of room knowledge provided for rich and flexible behavior. The system supported a finely gradated set of potential, emergent schemata. In a very real sense, that system had no single or literal idea of the meaning of, say, "bedroom." Instead, its ideas about what a bedroom is are inextricably tied up with its ideas about rooms and contents in general and with the particular context in which it is prompted to generate a bedroom schema. The upshot of this is the double-edged capacity to shade or alter its representation of bedrooms along a whole continuum of uses informed by the rest of its knowledge (this double-edged capacity is attractive yet occasionally problematic; see chapter 9). This feature, which I shall call the "informational holism" of PDP, constitutes a major qualitative difference between PDP and more conventional approaches. This difference is tightly linked to the way in which PDP systems will fail to be semantically transparent, in the sense outlined in chapter 1.
The present chapter has two goals. First, to elaborate on the nature of informational holism in PDP. Second, to discuss what conceptual relation parallel distributed encoding bears to the more general phenomenon of informational holism. On the latter issue, options range from the very weak (PDP sustains such holism, but so can work in the STS paradigm) to the very strong (only PDP can support such holism). The truth, as ever, seems to lie somewhere in between.
Table 6.1
Microfeatures of a model of case-role assignment

Feature        Values
volume         small, medium, large
pointiness     pointed, rounded
breakability   fragile, unbreakable
softness       soft, hard
The interesting upshot here is the lack of any ultimate distinction between metaphorical and literal uses of language. There may be central uses of a word, and other uses may share less and less of the features of the central use. But there would be no firm, God-given line between literal and metaphorical meanings; the metaphorical cases would simply occupy far-flung corners of a semantic-state space. There would remain very real problems concerning how we latch on to just the relevant common features in understanding a metaphor. But it begins to look as if we might now avoid the kind of cognitivist model in which understanding metaphor is treated as the computation of a nonliteral meaning from a stored literal meaning according to high-level rules and heuristics. Metaphorical understanding, on the present model, is just a limiting case of the flexible, organic kind of understanding involved in normal sentence comprehension. It is not the icing on the cake. It is in the cake mix itself.1
So far, then, we have seen how the informational holism of distributed models enables them to support the representation of subtle gradations of meaning without needing to anticipate each such gradation in advance or dedicate separate chunks of memory to each reading. And we also saw hints of how this might undermine the rigidity of some standard linguistic categories like metaphorical versus literal use. One other interesting aspect of this informational holism concerns learning. The system learns by altering its connectivity strengths to enable it to re-create patterns in its inputs. At no stage in this process does it generate and store explicit rules stating how to go on. Superpositional storage of data means that as it learns about one thing, its knowledge of much else is automatically affected to a greater or lesser degree. In effect, it learns gradually and widely. It does not (to use the simplified model described) learn all about bats, then separately about balls, and so on. Rather, it learns about all these things all at once, by example, without formulating explicit hypotheses. And what it can be said to know about each is informed by what it knows about the rest; recall the case of its knowledge that only hard things break things.
In sum, PDP approaches buy you a profoundly holistic mode of data storage and retrieval that supports the shading of meanings and allows gradual learning to occur without the constant generation and revising of explicit rules or hypotheses about the pattern of regularities in the domain.
3 Symbolic Flexibility
Smolensky (1987) usefully describes PDP models as working in what he calls the subsymbolic paradigm. In the subsymbolic paradigm, cognition is not modeled by the manipulation of machine states that neatly match (or stand for) our daily, symbolic descriptions of mental states and processes. Rather, these high-level descriptions (he cites goals, concepts, knowledge,
4 Grades of Semantic Transparency
between the conceptual entity ("kitchen" etc.) and the microfeatures in the network. In a treatment with many microfeatures, where such items as "bed" and "sofa" are merely approximate top-level labels for subtle and context-sensitive complexes of geometric and functional properties, the distance will be great indeed, and a conceptual model only a very high approximation.
Another route to the approximation claim is to regard the classical accounts as describing the competence of a system, i.e., its capacity to solve a certain range of well-posed problems (see Smolensky 1988, 19). In idealized conditions (sufficient input data, unlimited processing time) the PDP system will match the behavior specified by the competence theory (e.g., settling into a standard kitchen schema on being given "oven" and "ceiling" as input). But outside that idealized domain of well-posed problems and limitless processing time, the performance of a PDP system will diverge from the predictions of the competence theory in a pleasing way. It will give sensible responses even on receipt of degraded data or under severe time constraints. This is because although describable in that idealized case as satisfying hard constraints, the system may actually operate by satisfying a multitude of soft constraints. Smolensky here introduces an analogy with Newtonian mechanics. The physical world is a quantum system that looks Newtonian under certain conditions. Likewise with the cognitive system. It looks increasingly classical as we approach the level of conscious rule following. But in fact, according to Smolensky, it is a PDP system through and through.
In the same spirit Rumelhart and McClelland suggest: "it might be argued that conventional symbol processing models are macroscopic accounts, analogous to Newtonian mechanics, whereas our models offer more microscopic accounts, analogous to quantum theory. . . . Through a thorough understanding of the relationship between the Newtonian mechanics and quantum theory we can understand that the macroscopic level of description may be only an approximation to the more microscopic theory" (Rumelhart and McClelland 1986, 125). To illustrate this point, consider a simple example due to Paul Smolensky. Imagine that the cognitive task to be modeled involves answering qualitative questions on the behavior of a particular electrical circuit. (The restriction to a single circuit may appall classicists, although it is defended by Smolensky on the grounds that a small number of such representations may act as the chunks utilized in general-purpose expertise; see Smolensky 1986, 241.) Given a description of the circuit, an expert can answer questions like "if we increase the resistance at a certain point, what effect will that have on the voltage, i.e., will the voltage increase, decrease, or remain the same?"
Suppose, as seems likely, that a high-level competence-theoretic specification of the information to be drawn on by an algorithm tailored to answer this question cites various laws of circuitry in its derivations (what
" " ' '
Smolenskyrefersto as the hard laws of circuitry: Ohm s law and Kirchoffs
'
law). For example, derivations involving Ohm s law would invoke the
equation
voltage = current x resistance.
How does this description relate to the actual processing of the system? The model represents the state of the circuit by a pattern of activity over a set of feature units. These encode the qualitative changes found in the circuit variables in training instances. They encode whether the overall voltage falls, rises, or remains the same when the resistance at a certain point goes up. These feature units are connected to a set of what Smolensky calls "knowledge atoms," which represent patterns of activity across subsets of the feature units. These in fact encode the legal combinations of feature states allowed by the actual laws of circuitry. Thus, for example, "The system's knowledge of Ohm's law . . . is distributed over the many knowledge atoms whose subpatterns encode the legal feature combinations for current, voltage and resistance" (Smolensky 1988, 19). In short, there is a subpattern for every legal combination of qualitative changes (65 subpatterns, or knowledge atoms, for the circuit in question).
At first sight, it might seem that the system is merely a units-and-connections implementation of a lookup table. But that is not so. In fact, connectionist networks act as lookup tables only when they are provided with an overabundance of hidden units and hence can simply memorize input-output pairings. By contrast, the system in question encodes what Smolensky terms "soft" constraints, i.e., patterns of relations that usually obtain between the various feature units (microfeatures). It thus has general knowledge of qualitative relations among circuit microfeatures. But it does not have the general knowledge encapsulated in hard constraints like Ohm's law. The soft constraints are two-way connections between feature units and knowledge atoms, which incline the network one way or another but do not compel it, that is, they can be overwhelmed by the activity of other units (that's why they are soft). And as in all connectionist networks, the system computes by trying simultaneously to satisfy as many of these soft constraints as it can. To see that it is not a mere lookup tree of legal combinations, we need only note that it is capable of giving sensible answers to (inconsistent or incomplete) questions that have no answer in a simple lookup table of legal combinations.
The soft constraints are numerically encoded as weighted inter-unit connection strengths. Problem solving is thus achieved by "a series of many node updates, each of which is a microdecision based on formal numerical rules and numerical computations" (Smolensky 1986, 246).
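The flavor of this two-layer scheme can be conveyed by a toy sketch. What follows is my own drastic miniature, not Smolensky's actual network: three invented "knowledge atoms" stand in for the sixty-five, and repeated node updates let the feature units and atom units incline one another until the clamped question has been answered.

```python
# Toy 'knowledge atom' network (invented atoms, not Smolensky's 65).
# Feature units take values +1 (rises), -1 (falls), 0 (unknown/same).
# Each atom encodes one legal combination of qualitative changes,
# here a fragment of Ohm's law V = I * R with voltage held fixed.
atoms = [
    {"R": +1, "I": -1, "V": 0},   # resistance up, current down
    {"R": -1, "I": +1, "V": 0},   # resistance down, current up
    {"R": 0,  "I": 0,  "V": 0},   # nothing changes
]

def settle(clamped, steps=5):
    state = {"R": 0, "I": 0, "V": 0}
    state.update(clamped)
    for _ in range(steps):
        # Atom-layer node updates: how well does each atom's
        # subpattern agree with the current feature states?
        support = [sum(state[f] * a[f] for f in a) for a in atoms]
        # Feature-layer node updates: unclamped features move toward
        # the value urged by the best-supported atoms. This is a soft
        # constraint; a clamped input could overwhelm it.
        for f in state:
            if f not in clamped:
                vote = sum(s * a[f] for s, a in zip(support, atoms))
                state[f] = (vote > 0) - (vote < 0)
    return state

print(settle({"R": +1}))   # -> {'R': 1, 'I': -1, 'V': 0}
```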
The network has two properties of special interest to us. First, it can be shown that if it is given a well-posed problem and unlimited processing time, it will always give the correct answer as predicted by the hard laws of circuitry. But, as already remarked, it is by no means bound by such laws. Give it an ill-posed or inconsistent problem, and it will satisfy as many as it can of the soft constraints (which are all it really knows about). Thus, outside of "the idealised domain of well-posed problems and unlimited processing time, the system gives sensible performance" (Smolensky 1988, 19). The hard rules (Ohm's law, etc.) can thus be viewed as an external theorist's characterization of an idealized subset of its actual performance (it is no accident if this brings to mind Dennett's claims about the intentional stance; see Dennett 1981).
Second, the network exhibits interesting serial behavior as it repeatedly tries to satisfy all the soft constraints. This serial behavior is characterized by Smolensky as a set of macrodecisions, each of which amounts to "a commitment of part of the network to a portion of the solution." These macrodecisions, Smolensky notes, are "approximately like the firing of production rules. In fact, these productions 'fire' in essentially the same order as in a symbolic forward-chaining inference system" (Smolensky 1988, 19). Thus, the network will look as if it is sensitive to hard, symbolic rules at quite a fine grain of description. It will not simply solve the problem "in extension" as if it knew hard rules. Even the stages of problem solving may look as if they are caused by the system's running a processing analogue of the steps in the symbolic derivations available in the competence theory.
But the appearance is an illusion. The system has no knowledge of the objects mentioned in the hard rules. For example, there is no neat subpattern of units that can be seen to stand for the general idea of resistance, which figures in Ohm's law. Instead, some sets of units stand for resistance at R1, and other sets for resistance at R2. In more complex networks the coalitions of units that, when active, stand in for a top-level concept like resistance are, as we saw, highly context-sensitive. That is, they vary according to context of occurrence. Thus, to use Smolensky's own example, the representation of coffee in such a network would not consist of a single recurrent syntactic item but a coalition of smaller items (microfeatures) that shift according to context. Coffee in the context of a cup may be represented by a coalition that includes the features (liquid) and (contacting-porcelain). Coffee in the context of a jar may include the features (granule) and (contacting-glass). There is thus only an approximate equivalence of the "coffee vectors" across contexts, unlike the "exact equivalence of the 'coffee' tokens across different contexts in a symbolic processing system" (Smolensky 1988, 17).
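The approximate-equivalence point is easy to see with made-up vectors. The microfeatures and numbers below are illustrative placeholders, not Smolensky's own, and the overlap measure (cosine similarity) is my choice:

```python
# Context-dependent 'coffee' vectors over invented microfeatures.
features = ["liquid", "granule", "hot", "brown",
            "contacting-porcelain", "contacting-glass"]

coffee_in_cup = [1, 0, 1, 1, 1, 0]
coffee_in_jar = [0, 1, 0, 1, 0, 1]

def overlap(u, v):
    # Cosine similarity: 1.0 for identical tokens, < 1.0 for the
    # merely approximate equivalence of shifting coalitions.
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(x * x for x in w) ** 0.5
    return dot / (norm(u) * norm(v))

print(overlap(coffee_in_cup, coffee_in_cup))   # 1.0: a classical token
print(overlap(coffee_in_cup, coffee_in_jar))   # ~0.29: shared core only
```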
" " Smolensky
coffee with a shifting coalition of microfeatures , the so-called dimension
shift, suchsystemsdeprive themselvesof the structuredmental representations
deployedin both a classicalcompetencetheory and a classicalsymbol-
processingaccount (level 2). Likewise, in the simple network described,
5 Underpinning Symbolic Flexibility
PDP approaches, we saw, are well suited to modeling the kind of symbolic flexibility associated with human understanding. But what kind of relation does the locution "well suited" so comfortingly gloss? Here are some possibilities.
(1) Only a PDP-style substructure can support the kind of flexible understanding required (uniqueness).
(2) Maybe non-PDP systems can do the job. But semantically transparent approaches simply cannot achieve such flexibility (qualified liberalism).
(3) There is nothing special about a PDP capacity to support flexible understanding (unqualified liberalism).
Just which option we choose must depend, in part, on what we understand by a "PDP approach." We could mean:
(a) A model based firmly on one of the current algorithmic forms deployed by McClelland, Rumelhart, and the PDP Research Group
Searches
Even the most conventional work in AI accepts a relationship between degree of intelligence and efficiency of search. The point of heuristic searches (chapter 1, section 4) is precisely to increase the intelligence of the system by reducing the extent of the search space it must traverse to solve a particular problem. Some problems require finding the best way of simultaneously satisfying a large number of soft constraints. In such cases PDP, we saw, provides an elegant solution to the problem of efficient, content-sensitive searches. A search is driven by partial descriptions and involves simultaneously applying competing hypotheses against one another. It is not dependent on fully accurate or even consistent information in the partial description. PDP methods thus introduce a qualitatively different kind of search into AI. This kind of search may not always be the best or most efficient option; as always, it all depends on the nature of the space involved.
Representation
We have seen how PDP models (or those PDP models which interest us) employ distributed representations in superpositional storage. The deep informational holism of PDP depends on this fact. Besides explaining shadings of meaning, discussed above, this mode of representation has important side effects. Some of these were discussed in chapter 5 (e.g., graceful degradation, generalization, content-addressable retrieval). One very important side effect that I have not yet fully discussed is cross talk.
Cross talk is a distinctive PDP pathology that occurs when a single network processes multiple patterns with common features. Because the patterns have common features, there is a tendency for the overall pattern of activation appropriate to one pattern to suffer interference from the other pattern because the system tries simultaneously to complete the other pattern. At times I have praised this property by calling it "free generalization." When it occurs in some contexts, however, it is a source of
7 An Equivalence Class of Algorithms
Call this the "equivalence-class conjecture." Roughly, it states that PDP descriptions could in principle be used to specify an equivalence class of algorithmic forms, and deploying some member of that class is a constitutive requirement of bona fide thinking. It will be seen that this conjecture consists essentially in a combination of the uniqueness claim (rejected earlier) and the claim that the kind of flexible, context-sensitive response made possible by a PDP architecture is essential to thinking.
The latter claim, it seems to me, is well founded. Many philosophical worries about creating artificial intelligence were based precisely on an observed lack of flexible, commonsense deployment of stored information and on a lack of fluid, analogical reasoning. (Recall the discussion of Dreyfus in chapter 2, and see Hofstadter 1985.) Likewise, the capacity to shade the meaning of various high-level concepts and the tendency to learn concepts in large, mutually dependent groups may reasonably be seen as part of the very idea of grasping a concept. The features that PDP work seems capable of supporting in time may thus be philosophically significant parts of an account of understanding.
It might be argued that this is to give our notion of understanding an overly anthropocentric twist. Humans and higher animals may well exhibit the features mentioned. But why should it be required of all understanders that they conform to our model? To this line of thought I can only reply that the concept of understanding was formed to describe the behavior of humans and higher animals. It need not be indefensibly anthropocentric to believe that certain features of these uses must carry over to any case where the concept is properly used.
Suppose, then, that we agree that essential to understanding are some of the features that a PDP approach looks well suited to underpinning. For its truth the equivalence-class conjecture would then require that we find conceptual reasons why only members of the PDP class of models can support such features. It is this part of the conjecture that looks insupportable. Conceptual ties may be felt to exist between, say, the use of distributed representations with a microfeature semantics and the kind of deep holism and flexible understanding we require. But this alone is insufficient to count as a PDP model according to our definition (section 4 above). What we cannot rule out in advance is the possibility of some as yet undiscovered but distinctly non-PDP architecture supporting distributed representations and superpositional storage and hence providing all the required features in a new way. If this were to prove possible, we would be in the midway position described in chapter 3, section 7, that seems to blur the divide between the constitutive and the causal. The dream that PDP can specify a class of algorithms essential to thinking is precisely the dream of finding a formal unity among the set of substructures capable of supporting thought. Of course, it may turn out that the only physically possible way of achieving
distributed representation and superpositional storage to the required degree involves the use of a (possibly virtual) value-passing network. If we could see just why this should be so (as a result of, say, physical limitations on processing imposed by the speed of light), we would have quite a strong result. The PDP substructure would be revealed as a naturally necessary condition for the flexible behavior (including linguistic behavior) that is conceptually essential to the ascription of thoughts.
Briefly, the various possibilities look like this.
- The weakest interesting claim is that a PDP substructure supports flexible understanding and behavior, and these are essential for an ascription of understanding.
- The intermediate claim is that a PDP substructure is naturally necessary for flexible understanding and behavior, and these are essential for an ascription of understanding.
- The strong claim is that a PDP substructure is required on conceptual grounds (i.e., independently of physical limitations like the speed of light) for flexible understanding and behavior, and these are essential for an ascription of understanding.
I believe it is too early to try to force a choice between these claims. As we saw, it seems that the strongest claim presently eludes us. And even the intermediate claim is suspect. That said, I believe philosophical interest still attaches to the PDP approach, since it begins to demonstrate how a physical system is at least able to support various features that must be exhibited by any system that warrants description in a mentalistic vocabulary. There are those whose idea of philosophical interest would place such a result outside the sphere of their proper concerns. For such philosophers nothing short of the truth of the equivalence-class conjecture would motivate a claim of philosophical significance. Here we must simply differ and be content to get as clear as we can both about the possible relations between PDP models and thoughts and about the grounds for expecting any particular relation to hold.2
Chapter 7
The Multiplicity of Mind: A Limited Defence of Classical Cognitivism
1 Of Clouds and Classical Cognitivism
Every silver lining has a cloud, and PDP is no exception. There are classes of problems to which PDP approaches are apparently ill suited. These include the serial-reasoning tasks of logical inference, the temporal-reasoning tasks of conscious planning, and perhaps the systematic-generative tasks of language production. PDP seems to be nature's gift to pattern-recognition tasks, low-level vision, and motor control.1 But as we proceed to higher, more-abstract tasks, the PDP approach becomes less and less easy to employ. This is, of course, just what we would expect on the basis of our earlier conjectures. A PDP architecture may have been selected to facilitate carrying out evolutionarily basic tasks involving multiple simultaneous satisfaction of soft constraints. Vision and sensorimotor control are prime examples of such tasks. Other tasks, especially the relatively recent human achievements that classical cognitivism focuses on, involve complex sequential operations that may require a system to follow explicit rules. Conscious reasoning about chess playing, logic, and conscious attempts to learn to drive a car are examples of such tasks. Where the conscious-reasoning aspects of such tasks are concerned, the standard architecture of classical-cognitivist models offers an excellent, design-oriented aid to their solution. In these models an explicitly programmed CPU (central processing unit) performs sequential operations on symbolic items lifted out of memory. The architecture is perfectly suited to the sequential application of explicit rules to an ordered series of symbol strings.
Such sequential, rule-following acrobatics are not the forte of PDP. They may not be beyond the reach of a PDP approach, but they certainly do not come naturally to it. It may be significant to notice, however, that these sequential rule-following tasks aren't our forte either. They are the tasks that human beings find hardest, the ones we tend to fail at. And often, after we cease to find them hard (after we are good at chess or logic or driving a car), we also cease to have the phenomenal experience of consciously and sequentially following rules as we perform them.
2 Against Uniformity
All too often, the debate between the proponents and doubters of PDP approaches assumes the aspect of a holy war. One reason for this, I suspect, is an implicit adherence to what I shall call the general version of the uniformity assumption. It may be put like this:
Every cognitive achievement is psychologically explicable using only the formal apparatus of a single computational architecture.
I shall say more about the terms of this assumption in due course. The essential idea is easy enough. Some classical cognitivists believe that all cognitive phenomena can be explained by models with a single set of basic types of operation (see the discussion in chapter 1, section 4; in section 5 below; and in chapter 8). This set of basic operations defines a computational architecture in the sense outlined in chapter 1, section 4. Against this view some PDP theorists seem to urge that the kinds of basic operation made available by their models will suffice to construct accurate psychological models of all cognitive phenomena (see chapters 5 and 6 above). Each party to this dispute thus appears to endorse its own version of the uniformity claim. The classical-cognitivist version is:
Every cognitive achievement is psychologically explicable by a model that can be described using only the apparatus of classical cognitivism.
The PDP version is:
Every cognitive achievement is psychologically explicable by a model that can be described using only the apparatus of PDP.
The argument I develop will urge that we resist the uniformity assumption in all its guises. Instead, I endorse a model of mind that consists of a multitude of possibly virtual computational architectures adapted to various task
3 Simulating a Von Neumann Architecture
In a recent paper (Clark 1987) I speculated that the human mind might effectively simulate a serial, symbol-processing, Von Neumann architecture for some purposes (largely evolutionarily recent tasks, as discussed in chapter 4). If that is the case, I asked, wouldn't it follow that for such tasks some classical cognitivist computational account could prove to be correct, not just approximately correct but correct tout court? In short, if we suppose that such simulation goes on, isn't the uniformity assumption simply false, and the relation between conventional and connectionist AI not uniform across the cognitive domain?
When writing that paper, I had no idea about how such simulation might occur. I simply relied on the idea that there is nothing unusual in the simulation of one architecture by another. The pervasive idea in computer science of a virtual machine is precisely the idea that a machine can be programmed to behave as if it were operating a different kind of hardware (see, e.g., Tanenbaum 1976). Since then, however, things have come to seem a little more concrete. Thus in Rumelhart, Smolensky, et al. 1986 we find some fascinating speculative ideas on the human capacity to engage in various kinds of conscious, symbolic reasoning. These speculations can be used, I believe, to give substance to the claim that we might occasionally simulate a Von Neumann architecture.
The PDP group raise the following questions: "If the human information-processing system carries out its computations by 'settling' into a solution rather than applying logical operations, why are humans so intelligent? How can we do science, mathematics, logic etc.? How can we do logic if the basic operations are not logical at all?" They suggest an answer that involves a neat computational twist. Our capacity to engage in formal, serial, rule-governed reasoning, they speculate, is a result of "our ability to create artifacts - that is, our ability to create physical representations that we can manipulate in simple ways to get answers to very difficult and abstract problems." (Both passages are from Rumelhart, Smolensky, et al. 1986, 44.)
Before seeing how this solution works, it is worth expanding on the problem for human intelligence and PDP that it is intended to solve. In the first of the two quoted passages the problem is about our ability to do science, mathematics, and logic. Earlier in the same section (Rumelhart, Smolensky, et al. 1986, 38) the problem areas identified are conscious thought, serial processing, and the role of language in thought. Very generally, it seems to me, there are two kinds of human capacity that PDP models at first glance are hard put to capture or illuminate. These are:
(1) Processes of serial reasoning in which the ordering of operations is vital,
(2) Processes of generative reasoning in which an unbounded set of structures may be produced by the application of rules to a data base.
Conscious planning, logic, and much advanced abstract thought seem to involve capacity 1. The prime example of capacity 2 would seem to be language production. The kind of story discussed below is best suited to explaining the sequential conscious phenomena adverted to in capacity 1. Perhaps it can be extended to cover the generative phenomena as well, but that, I believe, is a much harder issue and one I make no claims to address here. PDP models, like the model of sentence processing discussed in the previous chapter, seem best suited to modeling some aspects of language understanding. Language production, insofar as it involves finding and combining the right constituents in the right way to express a message, raises a whole host of other issues that lie largely beyond the scope of this book.
As far as sequential, conscious thought is concerned, PDP approaches so far offer a two-faceted account. First and most simply, there is seriality in PDP. Since at one time a network occupies at most one state (a pattern of activation over the simple units), there will be a sequence of such states as exogenous and endogenous inputs cause it to become active and then relax into new stable states. Considered during the short periods of value-passing activity, it is a parallel distributed system. Considered over longer stretches of time, it can be seen as a sequence of discrete states. This gives the PDP theorist the beginnings of an angle on conscious experience, the idea being that "the contents of consciousness are dominated by the relatively stable states of the system. . . . Consciousness consists of a sequence of interpretations - each represented by a stable state of the system" (Rumelhart, Smolensky, McClelland, and Hinton 1986, 39). We are often not conscious of, say, the process of finding a good metaphor, making a pun, or the various creative leaps in scientific discovery (more on which below). But we are conscious of, say, applying modus ponens to two lines of a logical proof or planning a sequence of events. The PDP account would begin to explain this by positing that relaxation occurs during the fast, unconscious phenomena and by treating conscious phenomena as a perceived sequence of the results of such relaxation steps.
But the sequentiality of states alone is insufficient to cover even the processes of serial reasoning associated with capacity (1). For, in effect, all we have so far is a kind of stream-of-consciousness display. In cases of logical reasoning, long multiplications, and so on, the ordering of operations is vital. How are such orderings achieved? Here is where the second facet of the account comes in, namely, the use of artifacts as physical representations.
Consider a simple example. Suppose I ask you to take every second number in a spoken series, add 2 to it, and sum up the total. Most of us would find the task quite difficult. But suppose I allow you to use pen, paper, and the Arabic numerals. The task becomes simple. For the series 7, 4, 9, 5, 2, 1, 6, 9, isolate every second number (4, 5, 1, 9), add 2 (6, 7, 3, 11), and sum the total (27).
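In programming terms (a loose analogy of mine, not the authors'), the pen and paper play the role of the intermediate variables below, reducing one unmanageable serial task to a fixed pipeline of small, familiar steps:

```python
# The pen-and-paper routine as a pipeline of simple steps. Each
# intermediate list is the analogue of a jotting on the page.
series = [7, 4, 9, 5, 2, 1, 6, 9]
every_second = series[1::2]                # [4, 5, 1, 9]
plus_two = [n + 2 for n in every_second]   # [6, 7, 3, 11]
print(sum(plus_two))                       # 27
```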
Rumelhart, Smolensky, et al. develop a similar example involving long multiplication. Most of us, they argue, can learn to just see the answer to some basic multiplication questions, e.g., we can just see that 7 x 7 is 49. This, they suggest, is evidence of a pattern-completing mechanism of the usual PDP variety. But for most of us longer multiplications present a different kind of problem. 722 x 942 is hard to do in the head. Instead, we avail ourselves (at least in the first instance; see below) of an external formalism that reduces the bigger task to an iterated series of familiar relaxation steps. We write:
  722
x 942
6 BACON, an Illustration
Let me illustrate the position outlined above by considering scientific discovery once again. A cognitivist model of scientific discovery is given by the BACON program, outlined in chapter 1, section 4 (for full details see Langley et al. 1987). To recapitulate briefly, BACON derives scientific laws from bodies of data. Roughly, it works on recorded observations of the values of variables and seeks functions relating such values. In its search for such functions it follows some simple heuristics that suggest what functions to try first and how to proceed in cases of difficulty. The program does not start out with any theoretical bias or expectations concerning the outcome; it simply seeks regularities in data. As a kind of control experiment, Simon employed a graduate student who knew nothing of Kepler's third law (see chapter 1) and gave him the sets of data upon which Kepler worked.2 Told to find a function relating one column of figures (in fact, the radii of planetary orbits) to another (in fact, the periods of planetary revolution), it took the student 60 hours to discover the law. The BACON program is much quicker, but the procedure is the same: a serial, heuristic-guided search for a function relating x and y, uninformed by any understanding of the significance of x, y, or the enterprise of scientific investigation itself.
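A drastically simplified BACON-style search can be rendered in a few lines. The heuristic set below is my own minimal stand-in for BACON's, and the data are the familiar orbital radii (in astronomical units) and periods (in years) for the first four planets; the search simply looks for a combination of powers whose ratio stays nearly constant across the observations:

```python
# A toy BACON-like search: look for constant ratios x**a / y**b.
# Radii are in astronomical units, periods in years, so the constant
# for Kepler's third law comes out near 1.
data = [(0.387, 0.241), (0.723, 0.615), (1.0, 1.0), (1.524, 1.881)]

candidates = [(a, b) for a in (1, 2, 3) for b in (1, 2, 3)]

def spread(values):
    return max(values) - min(values)

for a, b in candidates:
    ratios = [x**a / y**b for x, y in data]
    if spread(ratios) < 0.01:          # 'nearly invariant' heuristic
        print(f"radius^{a} / period^{b} is constant "
              f"(~{sum(ratios) / len(ratios):.3f})")
# -> radius^3 / period^2 is constant (~0.999)
```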
Simon notes, however, that there are elements in the process of real scientific discovery that are not easily amenable to such an approach. He cites the example of Fleming's spotting the significance of the mouldy petri dish. But any classic flash of insight will do. We might point to Stephenson's (allegedly) watching his kettle boil and conceiving the idea of the steam locomotive or someone's studying the behavior of thermodynamic systems and conceiving the ideas behind PDP. These aspects of scientific
special interest in fixing on regularities and differences in the causal computational substrate of our thought except insofar as these are of immediate practical significance.
Suppose, then, that we were to accept such a division within the domain of scientific discovery. Would it not then be fair to seek psychological models of the slow, serial component within a classical cognitivist framework and models of the fast component within PDP? An overall model of human scientific thinking needs to include both and to address the issue of how the results of each can be fed to the other in a cooperative way. Still, it would not follow that the slow serial model is a mere approximation of that component.
In sum, I am advocating that cognitive science is an investigation of a mind composed of many interrelating virtual machines, with correct psychological models at each level and further accounts required for the interrelations between such levels. Only recognition of this multiplicity of mind, I suspect, will save cognitive science from a costly holy war between the proponents of PDP and the advocates of more conventional approaches.3
Chapter 8
Structured Thought, Part 1
1 Weighting for Godot?
Are some cognitive competences beyond the explanatory reach of any PDP model? Does the weighting game degenerate at some specifiable point into waiting for Godot?1 Some philosophers and cognitive scientists believe so. The putative problem concerns the systematicity of the processing required for such sophisticated cognitive achievements as language production and understanding. Two kinds of argument are advanced to convince us that connectionism is unable to penetrate these systematic domains. One is a lively (but ultimately implausible) set of arguments detailed in Fodor and Pylyshyn 1988 and Fodor 1987. These arguments seek to support a classical-cognitivist model of thought. The other, associated with an influential critique of connectionism by Pinker and Prince (1988), aims to justify at least a classical structuring of the components of information-processing models of higher cognitive functions. The arguments here are plausible and important but, I shall argue, are unable to support any strong conclusions about the limits of PDP.
The itinerary goes like this. The current chapter focuses on the arguments of Fodor and Pylyshyn. These are shown to be generally uncompelling. Systematicity of effect may well argue in favor of systematicity of cause. But classical cognitivism involves much larger claims, which Fodor and Pylyshyn give us no reason to accept. The chapter ends by associating their error with a pervasive failure within the cognitive-science community to distinguish two kinds of cognitive science. One of these involves the attempt to model the complex, holistic structure of thought (thoughts here simply are the contentful states described using propositional-attitude ascriptions). The other is the attempt to develop models of the in-the-head, computational causes of the intelligent behavior that warrants such thought ascriptions. The projects, I shall argue, are distinct and nonisomorphic. Chapter 9 goes on to consider some edited highlights of the Pinker and Prince paper and then returns afresh to the question of mixed models of cognitive processes, first raised in chapter 7.
2 The Systematicity Argument
Fodor and Pylyshyn 1988 is a powerful and provocative critique aimed at the very foundations of the connectionist program. In effect, they offer the friend of connectionism an apparently fatal dilemma. Either connectionism constitutes a distinctive but inadequate cognitive model, or if it constitutes an adequate cognitive model, it must do so by specifying an implementation of distinctively classical processing strategies and data structures. I shall argue that the critique of Fodor and Pylyshyn is based on a deep philosophical confusion.
I begin with an imaginary anecdote, the point of which should become apparent in due course. One day, a famous group of AI workers announced the unveiling of the world's first, genuine, thinking robot. This robot, it was claimed, really had beliefs. The great day arrived when the robot was put on public trial. But there was disappointment. All the robot could do, it seemed, was output a single sentence: "The cat is on the mat." (It was certainly a sophisticated machine, since it generally responded with the sentence when and only when it was in the presence of a cat on a mat.) Here is an extract from a subsequent interchange between the designers of the robot and some influential members of the mildly outraged academic community.
Designers: Perhaps we exaggerated a little. But it really is a thinking robot. It really does have at least the single belief that the cat is on the mat.
Scoffers: How can you say that? Imagine if you had a child and it could produce the sentence "The cat is on the mat" but could not use the words "cat," "on," and "mat" in any other ways. Surely, you would conclude that the child had not yet learned the meaning of the words involved.
Designers: Yes, but the child could still think that the cat is on the mat, even if she has not yet learned the meanings of the words.
Scoffers: Agreed, but the case of your robot is even worse. The child would at least be capable of appropriate perceptual and behavioral responses to other situations like the mat being on the cat. Your robot exhibits no such responses.
Designers: Now you are just being a behaviorist. We thought all that stuff was discredited years ago. Our robot, we can assure you, has a data structure in its memory, and that structure consists of a set of distinct physical tokens. One token stands for "the," one for "cat," one for "is," one for "on," and one for "mat." Unless you're behaviorists, why ask more of a thought than that?
Scoffers: Behaviorists or not, we can't agree. To us, it is constitutive of
Thought is systematic.
So internal representations are structured.
Connectionist models posit unstructured representations.
So connectionist accounts are inadequate as distinctive cognitive models.
Classical accounts, by contrast, are said to posit internal representations with rich syntactic and semantic structure. They thus reach the cognitive parts that connectionists cannot reach.
3 Systematicity and Structured Behavior
This argument is deeply flawed in at least two places. First, it misconceives the nature of thought ascription, and with it the significance of systematicity. Second, it mistakenly infers lack of compositional, generative structure from lack of what I shall call conceptual-level compositional structure. These two mistakes turn out to be quite interestingly related.
The point to notice on the nature of thought ascription is that systematicity, as far as Fodor and Pylyshyn are concerned, is a contingent, empirical fact. This is quite clear from their discussion of the systematicity of infraverbal, animal thought. Animals that can think aRb, they claim, can generally
" " "
think bRaalso. But they allow that this neednot be so. It is, they write, an
empirical question whether the cognitive " capacitiesof infraverbal organisms
are often structured that way (1988, 41). Now , it is certainly true
that an animal might be able to respond to aRb and not to bRa. But my
claim is that in such a case(cetmsparibus) we should conclude not that it
" " "
has, say, the thought a is taller than b but cannot have the thought b is
"
taller than a. Rather, its patent incapacity to have a spectrumof thoughts
involving a, b, and the taller-than relation should defeat the attempt to
ascribeto it the thought that a is taller than b in the first place. Perhapsit
hasa thought we might try to describeas the thought that a-is-taller-than-
b. But it doesnot have the thought reportedwith the ordinary sententialapparatus
of our language. For grasp of sucha thought requiresa grasp of its
componentconcepts, and that requires the generality constraint.
' " satisfying
" observation that you don t
'
s
In short, Fodor and Pylyshyn empirical
find creatureswhose mental life consistsof seventy-four unrelatedthoughts
" " '
is no empiricalfact at all. It is a conceptual fact, just as the thinking robot s
failure to have a single, isolated thought is a conceptualfact. Indeed, the
one is just a limiting caseof the other. A radically punctatemind is no mind
at all.
These observations should begin to give us a handle on the actual nature of thought ascription. Thought ascription, as we saw in chapter 3, is a means of making sense of a whole body of behavior (actual and counterfactual). We ascribe a network of thoughts to account for and describe a rich variety of behavioral responses. This picture of thought ascription echoes the claims made in Dennett (1981). The folk-psychological practice of thought ascription, he suggests, might best be viewed as a "rationalistic calculus of interpretation and prediction - an idealizing, abstract, instrumentalistic interpretation method that has evolved because it works" (1981, 48). If we put aside the irrealistic overtones of the term "instrumentalism" (a move Dennett himself now approves of; see Dennett 1987, 69-81), the general idea is that thought ascription is an abstract, idealising, holistic process, which therefore need not correspond in any simple way to the details of any story of in-the-head processing. The latter story is to be told by what Dennett (1981) calls "sub-personal cognitive psychology." In short, there need be no neat and tidy quasireductive biconditional linking in-the-head processing to the sentential ascriptions of belief and thought made in daily language. Instead, a subtle story about in-the-head processing must explain a rich body of behavior (actual and counterfactual, external and internal), which we then make holistic sense of by ascribing a systematic network of abstract thoughts.
It may now seem that we have succeeded in merely relocating the systematicity that Fodor and Pylyshyn require. For though it is a conceptual fact, and hence as unmysterious to a connectionist as to a classicist, that thoughts
4 Cognitive Architecture
Fodor and Pylyshyn also criticize connectionists for confusing the level of psychological explanation and the level of implementation. Of course, the brain is a connectionist machine at one level, they say. But that level may not be identical with the level of description that should occupy anyone interested in our cognitive architecture. For the latter may be best described in the terms appropriate to some virtual machine (a classical one, they believe) implemented on a connectionist substructure. A cognitive architecture, we are told, consists of "the set of basic operations, resources, functions, principles, etc. . . . whose domain and range are the representational states of the organism" (Fodor and Pylyshyn 1988, 10). Fodor and Pylyshyn's claim is that such operations, resources, etc. are fundamentally classical; they consist of structure-sensitive processes defined over internal, classical, conceptual-level representations. Thus, if we were convinced of the need for classical representations and processes, the mere fact that the brain is a kind of connectionist network ought not to impress us. Connectionist architectures can be implemented in classical machines and vice versa.
This argument in its pure form need not concern us if we totally reject Fodor and Pylyshyn's reasons for believing in classical representations and processes. But it is, I think, worth pausing to note that an intermediate
and grammars of (1) but also on the structure of the brain, psycholinguistic evidence, and even, perhaps, evolutionary conjectures concerning the origins of speech and language (see, e.g., Tennant 1984). In short, what is needed is clarity concerning the goals of various studies, not a victory of one choice of study over another. Devitt and Sterelny strike a nice balance, concluding that linguists are usefully studying "not internal mechanisms but the truth-conditionally relevant syntactic properties of linguistic symbols" (1984, 146), while nonetheless allowing that such studies may illuminate some general features of internal mechanisms and hence (quite apart from their intrinsic interest) may still be of use to the theorist concerned with brain structures.
What is thus true of the study of grammar is equally true, I suggest, of the study of thought. Contentful thought is what is described by propositional-attitude ascriptions. These ascriptions constitute a class of objects susceptible to various formal treatments, just as the sentences judged grammatical constitute a class of objects susceptible to various formal treatments. In both cases, computational approaches can help suggest and test such treatments. But in both cases these computational treatments and a psychologically realistic story about the brain basis of sentence production or holding propositional attitudes may be expected to come apart.
7 Is Naive Physics in the Head?
There is a type of work within cognitive science known variously as naive physics, qualitative reasoning, or the formalization of commonsense knowledge. I want to end this chapter by suggesting that most of this work, at least in its classical cognitivist incarnations, may fall under the umbrella of what I am calling descriptive cognitive science. If so, this is a clear case in which descriptive cognitive science and causal cognitive science have got badly confused. For people working in the field of naive physics typically conceive of their work as very psychologically realistic, in contradistinction to much other AI work.
The basic idea behind naive physics is simple and has already been mentioned in chapter 3, section 6. Naive physics is an attempt to capture the kind of commonsense knowledge that mobile, embodied beings need to get around in the real world. We all know a lot about tension and rigidity: you can't push an object with a piece of string. And we know about liquidity, solidity, elasticity, spreading, and so on. The list is endless. And the knowledge is essential. Without it we couldn't spread marmite on toast, predict that beer will spill off a smooth unbounded table top, or drag a shopping trolley along a bumpy road. But the project is still underspecified. I gave the goal as that of capturing commonsense knowledge. But the metaphor of capturing is always dangerous, for it leaves the criteria of success
8 Refusing the Syntactic Challenge
Just to round off this chapter, let me say a word about what Fodor calls intentional realism, i.e., the belief (sic) that beliefs and desires are real and are causes of actions. I suspect that Fodor is driven to defend the position that computational articulation in the brain mirrors the structure of ascriptions of propositional attitudes by a fear that beliefs and desires can only be causes if they turn up in formal guise as part of the physical story behind intelligent behavior. But this need not be the case. If belief and desire talk is a holistic net thrown over an entire body of intelligent behavior, we need not expect regular syntactic analogues to particular beliefs and desires to turn up in the head. All we need is that there should be some physical, causal story, and that talk of beliefs and desires should make sense of behavior. Such making sense does involve a notion of cause, since beliefs do cause actions. But unless we believe that there is only one model of causation, the physical, this needn't cause any discomfort (see also the argument in the appendix).
Fodor's approach is dangerous. By accepting the bogus challenge to produce syntactic brain analogues to linguistic ascriptions of belief contents, he opens the Pandora's box of eliminative materialism. For if such analogues are not found, he must conclude that there are no beliefs and desires. The mere possibility of such a conclusion is surely an effective reductio ad absurdum of any theory that gives it house space.
Chapter 9
Structured Thought, Part 2
2 The Past-Tense-Acquisition Network
The particular PDP model that Pinker and Prince use as the focus of their attack is the past-tense-acquisition network described in Rumelhart and
irregular ones. In the second stage the child overregularizes; she seems to have learned the regular "-ed" ending for English past tenses and can give this ending for new and even made-up verbs. But she will now mistakenly give an "-ed" ending for irregular verbs, including ones she got right at stage one. The overregularization stage has two substages, one in which the present form gets the "-ed" ending (e.g., "come" becomes "comed") and one in which the past form gets it (e.g., "ate" becomes "ated" and "came" becomes "camed"). The third and final stage is when the child finally gets it right, adding "-ed" to regulars and novel verbs and generating various irregular or subregular forms for the rest.
Classical models, as Pinker and Prince note, account for this data in an intuitively obvious way. They posit an initial stage in which the child has effectively memorized a small set of forms in a totally unsystematic and unconnected way. This is stage one. At stage two, according to this story, the child manages to extract a rule covering a large number of cases. But the rule is now mistakenly deployed to generate all past tenses. At the final stage this is put right. Now the child uses lexical, memorized, item-indexed resources to handle irregular cases and nonlexical, rule-based resources to handle regular ones.

Classical models, however, typically exhibit a good deal more structure than this bare minimum (see, e.g., the model in Pinker 1984). The processing is decomposed into a set of functional components including a lexicon of structural elements (items like stems, prefixes, suffixes, and past tenses), a structural rule system for such elements, and phonetic elements and rules. A classical model so constructed will posit a variety of mechanisms that represent the data differently (morphological and phonetic representations) with access and feed relations between the mechanisms. In a sense, the classical models here are transparent with respect to the articulation of linguistic theory. Distinct linguistic theories dealing with, e.g., morphology and phonology are paired with distinct in-the-head, information-processing mechanisms.
The PDP model challenges this assumption that in-the-head mechanisms mirror structured, componential, rule-based linguistic theories. It is not necessary to dwell in detail on the Rumelhart and McClelland model to see why this is so. The model takes as input a representation of the verb constructed entirely out of phonetic microfeatures. It uses a standard PDP pattern associator to learn to map phonetic microfeature representations of the root form of verbs to a past-tensed output (again expressed as a set of phonetic microfeatures). It learns these pairings by the usual iterated process of weight adjustments described in previous chapters. The basic structure of the model is thus: phonetic representations of root forms are input into a PDP pattern associator, and phonetic representations of past forms result as output.1
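A minimal sketch of such a pattern associator may help fix ideas. The following fragment is purely illustrative: toy random binary vectors stand in for the model's Wickelfeature-style phonetic encoding, and a simple perceptron-style update stands in for its actual learning rule.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 16, 16               # sizes of the two microfeature vectors
W = np.zeros((n_out, n_in))        # connection weights, initially zero

def forward(x):
    # Each output unit fires if its summed, weighted input exceeds zero.
    return (W @ x > 0).astype(float)

def train(pairs, epochs=50, lr=0.1):
    global W
    for _ in range(epochs):
        for x, t in pairs:
            W += lr * np.outer(t - forward(x), x)   # error-driven adjustment

# Toy root -> past-tense pairs as random binary "microfeature" vectors.
pairs = [(rng.integers(0, 2, n_in).astype(float),
          rng.integers(0, 2, n_out).astype(float)) for _ in range(5)]
train(pairs)
print(all((forward(x) == t).all() for x, t in pairs))  # True once learned

The point of the sketch is only the shape of the device: one layer of weighted connections, trained by iterated weight adjustment, with no symbolic "stem" or "suffix" anywhere in sight.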
3 The Pinker and Prince Critique

Pinker and Prince (1988) raise a number of objections to a PDP model of children's acquisition of the past tense. Some of these criticisms are specific to the particular PDP model just discussed, while the others are at least suggestive of difficulties with any nontrivial PDP model of such a skill.2 I shall only be concerned with difficulties of this last kind. Such cases can be roughly grouped into four types. These concern (1) the model's overreliance on the environment as a source of structure, (2) the power of the PDP learning algorithms (this relates to the counterfactual space occupied by such models, a space that is argued to be psychologically unrealistic), (3) the use of the distinctive PDP operation of blending, and (4) the use of microfeature representations.
Overreliance on the environment
The Rumelhart and McClelland model, we saw, made the transition from stage 1 (rote knowledge) to stage 2 (extraction of regularity). But how was this achieved? It was achieved, it seems, by first exposing the network to a population mainly of irregular verbs (10 verbs, 2 regular) and then presenting it with a massive influx of regular verbs (410 verbs, 344 regular). This sudden and dramatic influx of regular verbs in the training population is the sole cause of the model's transition from stage one to stage two. Thus, "The model's shift from correct to overregularized forms does not emerge from any endogenous process: it is driven directly by shifts in the input" (Pinker and Prince 1988, 138). By contrast, some developmental psychologists (e.g., Karmiloff-Smith [1987]) believe that the shift is caused by an internally driven attempt to organize and understand the data. Certainly, there is no empirical evidence that a sudden shift in the nature of the input population must precede the transition to stage 2 (see Pinker and Prince 1988, 142).

The general point here is that PDP models utilize a very powerful learning mechanism that, when given well-chosen inputs, can learn to
produce almost any behavior you care to name. But a deep reliance on highly structured inputs may reduce the psychological attractiveness of such models. Moreover, the space of counterfactuals associated with an input-driven model may be psychologically implausible. Given a different set of inputs, these models might go straight to stage 2, or even regress from stage 2 to stage 1. It is at least not obvious that human infants enjoy the same degree of freedom.
Blending
We saw in section 2 above how the model generates errors by blending two such patterns as from "eat" to "ate" and from "eat" to "eated" to produce the pattern from "eat" to "ated." By contrast, a conventional rule-based account would posit a mechanism specifically geared to operate on the stems of regular verbs, inflecting them as required. If this nonlexical component were mistakenly given "ate" as a stem, it would simply inflect it, sausage-machine fashion, into "ated." The choice, then, is between an explanation by blending within a single mechanism and an explanation of misfeeding within a system that has a distinct nonlexical mechanism. Pinker and Prince (1988, 157) point to evidence which favors the latter, classical option.

If blending is the psychological process responsible, it is reasonable to expect a whole class of such errors. For example, we might expect blends of common middle-vowel changes and the "-ed" ending (from "shape" to "shipped" and from "sip" to "sepped"). Children exhibit no such errors. If, on the other hand, the guilty process is misfeeding to a nonlexical mechanism, we should expect to find other errors of inflection based on a mistaken stem (from "went" to "wenting"). Children do exhibit such errors.
Microfeature representations

The Rumelhart and McClelland model relies on the distinctive PDP device of distributed microfeature representation. The use of such a form of representation buys a certain kind of automatic generalization. But it may not be the right kind. The model, we saw, achieves its ends without applying computational operations to any syntactic entities with a semantics given by such labels as "stem" or "suffix." Instead, its notion of stems is just the center of a state space of instances of strings presented for inflection into the past tense. The lack of a representation of stems as such deprives the system of any means of encoding the general idea of a regular past form (i.e., stem + "ed"). Regular forms can be produced just in case the stem in a newly presented case is sufficiently similar to those encountered in training runs. The upshot of this is a much more constrained generalization than that achieved within a classical model, which incorporates a nonlexical component. For the latter would do its work whatever we gave it as input. Whether this is good or bad (as far as the psychological realism of the model is concerned) is, I think, an open question. For the moment, I simply note the distinction. (Pinker and Prince clearly hold it to be bad; see Pinker and Prince 1988, 124.)
A more general worry, stemming from the same root, is that generalization based on pure microfeature representation is blind. Pinker and Prince note that when humans generalize, they typically do so by relying on a theory of which microfeatures are important in a given context. This knowledge of salient features can far outweigh any more quantitative notion of similarity based simply on the number of common microfeatures. They write, "To take one example, knowledge of how a set of perceptual features was caused . . . can override any generalizations inspired by the object's features themselves: for example, an animal that looks exactly like a skunk will nonetheless be treated as a raccoon if one is told that the stripe was painted onto an animal that had raccoon parents and raccoon babies" (Pinker and Prince 1988, 177). Human generalization, it seems, is not the same as the automatic generalization according to similarity of microfeatures found in PDP. Rather, it is driven by high-level knowledge of the domain concerned.
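The contrast can be made vivid with a toy sketch (every feature and the "theory" here are invented stand-ins): similarity by feature count versus a classification that lets causal knowledge override appearance.

skunkish = {"stripe", "furry", "four-legged", "bushy-tail"}  # the animal seen
skunk    = {"stripe", "furry", "four-legged", "bushy-tail"}
raccoon  = {"mask", "furry", "four-legged", "bushy-tail"}

def overlap(a, b):
    # Quantitative similarity: proportion of shared microfeatures.
    return len(a & b) / len(a | b)

def classify_by_count(x):
    return max(("skunk", overlap(x, skunk)), ("raccoon", overlap(x, raccoon)),
               key=lambda pair: pair[1])[0]

def classify_with_theory(x, told_stripe_painted, known_parents):
    # Causal knowledge overrides appearance when it is available.
    if told_stripe_painted and known_parents:
        return known_parents
    return classify_by_count(x)

print(classify_by_count(skunkish))                      # -> skunk
print(classify_with_theory(skunkish, True, "raccoon"))  # -> raccoon

Nothing in the bare feature count delivers the second verdict; it takes a theory of how the features were caused.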
To bring this out, it may be worth developing a final example of my own. Consider the process of understanding metaphor, and assume that a successful metaphor illuminates a target domain by means of certain features of the home domain of the metaphor. Suppose further that both the metaphor and the target are each represented as sets of microfeatures thus: (MMF1, ..., MMFn) and (TMF1, ..., TMFn) (MMF = metaphor microfeature, TMF = target microfeature). It might seem that the necessary capacity to conceive of the target in the terms suggested by the metaphor is just another example of shading meaning according to context, a capacity that, as we've seen, PDP systems are admirably suited to exhibit.
Thus, just as we earlier saw how to conceive of a bedroom along the lines suggested by inclusion of a sofa, so we might now expect to see how to conceive of a raven along the lines suggested by the contextual inclusion of a writing desk.

But in fact there is a very important difference. For in shading the meaning of bedroom, the relevant microfeatures (i.e., sofa) were already specified. Both the joy and mystery of metaphor lies in the lack of any such specification. It is the job of one who hears the metaphor to find the salient features and then to shade the target domain accordingly. In other words, we need somehow to fix on a salient subset of (MMF1, ..., MMFn). And such fixation must surely proceed in the light of high-level knowledge concerning the problem at hand and the target domain involved. In short, not all microfeatures are equal, and a good many of our cognitive skills depend on deciding, according to high-level knowledge, which ones to attend to in a given instance.
4 Pathology

And the bad news just keeps on coming. Not only do we have the charges of the Pinker and Prince critique to worry about. There is also a body of somewhat recalcitrant pathological data.

Consider the disorder known as developmental dysphasia. Developmental dysphasics are slow at learning to talk, yet appear to suffer from no sensory, environmental, or general intellectual defect. Given the task of repeating a phonological sequence, developmental dysphasics will typically return a syntactically simplified version of the sentence. For example, given "He can't go home," they produce "He no go" or "He no can go." The simplifications often include the loss of grammatical morphemes (suffixes marking tense or number) and generally do not affect word stems. Thus "bees" may become "bee," but "nose" does not become "no." (The above is based on Harris and Coltheart 1986, 111.) The existence of a deficit that can impair the production of the grammatical morphemes while leaving the word stem intact seems prima facie to be evidence for a distinct nonlexical mechanism. We would expect such a deficit whenever the nonlexical mechanism is disengaged or its output ignored for whatever reason.
Or again, consider what is known as surface dyslexia.3 Some surface dyslexics lose the capacity correctly to read aloud irregular words, while retaining the capacity to pronounce regular words intact. When faced with an irregular word, such patients will generate a regular pronunciation for it. Thus, the irregular word "pint" is pronounced as if it rhymed with regular words like "mint." This is taken to support a dual-route account of reading aloud, i.e., an account in which a nonlexical component deals with regular words. "If the reading system does include these two separate processing components, it might be possible that neurological damage could impair one component whilst leaving the other intact, to produce [this] specific pattern of acquired dyslexia" (Harris and Coltheart 1986, 244).
Such data certainly seems to support a picture that includes at least some distinct rule-based processing, a picture that on the face of it is ruled out by single-network PDP models.

However, caution is needed. Martin Davies has pointed out that such a conclusion may be based on an unimaginative idea of the ways in which a single network could suffer damage (Davies, forthcoming, 19). Davies does not develop a specific suggestion in print,4 but we can at least imagine the following kind of case. Imagine a single network in which presented words must yield a certain level of activation of some output units. And imagine that by plugging into an often-repeated pattern, the regular words have, as it were, worn a very deep groove into the system. With sufficient training, the system can also learn to give correct outputs (pronunciation instructions) for irregular words. But the depth of groove here is always less than that for the regular words, perhaps just above the outputting threshold. Now imagine a kind of damage that decrements all the connectivity strengths by 10 percent. This could move all the irregular words below the threshold, while leaving the originally very strong regular pattern functional. This kind of scenario offers at least the beginnings of a single-network account of surface dyslexia. For some actual examples of the way PDP models could be used to account for pathological data, see McClelland and Rumelhart 1986, which deals with various amnesic syndromes.
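To see that the arithmetic of this imagined case works out, here is a toy numerical sketch (all numbers invented for the illustration):

THRESHOLD = 1.0
net_input = {"mint (regular)": 1.8, "hint (regular)": 1.6,
             "pint (irregular)": 1.05, "come (irregular)": 1.08}

def pronounceable(strengths, damage=0.0):
    # A word is produced only if its (uniformly decremented) net input
    # still clears the output threshold.
    return {w: s * (1 - damage) > THRESHOLD for w, s in strengths.items()}

print(pronounceable(net_input))              # all True before damage
print(pronounceable(net_input, damage=0.1))  # irregulars now fall silent

A uniform 10 percent decrement silences just the shallow-groove (irregular) items while sparing the deep-groove (regular) ones, which is the dissociation the dual-route theorist claimed as evidence.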
Pathological data, I conclude, at best suggests a certain kind of classical structuring of the human information-processing system into lexical and nonlexical components. But we must conclude with Davies that such data is not compelling in advance of a thorough analysis of the kinds of breakdown that complex PDP systems can exhibit. It seems, then, that we are left with the problems raised by the Pinker and Prince critique. In the next section I shall argue that although these problems are real and significant, the conclusions to which they lead Pinker and Prince are by no means commensurate with their content.
models are not mere isotropic node tangles, they will themselves have properties that call out for explanation. We expect that in most cases these explanations will constitute "the macro-theory of the rules that the system would be said to implement" (Pinker and Prince 1988, 171). There are two claims here that need to be distinguished.

(1) Any PDP model exhibiting some classical componential structuring is just an implementation of a classical theory.
(2) The explanation of this broad structuring will typically involve the use of classical rule-based models.
Claim (1) is clearly false. Even if a large connectionist system needs to deploy a complete, virtual, symbol-processing mechanism (recall chapter 7), it by no means follows that the overall system produced merely implements a classical theory of information processing in that domain. This is probably best demonstrated by some examples.

Recall the example (chapter 8, section 4) of a subconceptually implemented rule interpreter. This is a virtual symbol processor: a symbol processor and rule-user realized in a PDP substructure. Now take a task such as the creation of a mathematical proof. In such a case, we saw, the system could use characteristic PDP operations to generate candidate rules that would be passed to the rule interpreter for inspection and deployment. Such a system has the best of both worlds. The PDP operations provide an intuitive (best-match), context-sensitive choice of rules. The classical operations ensure the validity of the rule (blends are not allowed) and its strict deployment.
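Schematically, the division of labor might look like this (every rule name is invented for the illustration, and a fixed list stands in for the PDP stage's soft, best-match retrieval):

VALID_RULES = {"modus-ponens", "and-elimination", "or-introduction"}

def pdp_propose(goal):
    # Stand-in for context-sensitive, best-match retrieval over a soft
    # store: near-misses and blends can surface alongside genuine rules.
    return ["modus-ponens", "modus-ponens/tollens-blend", "and-elimination"]

def rule_interpreter(candidates):
    # The classical component: strict validation and strict deployment.
    # Blends are simply rejected.
    return [rule for rule in candidates if rule in VALID_RULES]

print(rule_interpreter(pdp_propose("derive q from p, p->q")))
# -> ['modus-ponens', 'and-elimination']; the blend never gets deployed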
Some such story could be told for any truly rule-governed domain. Take chess, for example. In such a domain a thoroughly soft and intuitive system would be prone to just the kinds of errors suggested by Pinker and Prince. The fact that someone learns to play chess using pieces of a certain shape ought not to cause her to treat the bishops in a new set as pawns because of their microfeature similarity to the training pawns. Chess constitutes a domain in which absolutely hard, functional individuation is called for; it also demands categorical and rigid rule-following. It would be a disaster to allow the microfeature similarity of a pawn to a bishop to prompt a blending of the rules for moving bishops and pawns. A blend of two good rules is almost certain to be a bad one. Yet a combined PDP and virtual symbol-processing system would again exhibit all the advantages outlined. It would think up possible moves fluidly and intuitively, but it could subject these ideas to very high-level scrutiny, identify pieces by hard, functional individuation, and be absolutely precise in its adherence to the explicit rules of the game.
As a second example, consider the problem of understanding metaphor raised earlier. And now imagine a combined PDP and virtual symbol-processing (VSP) system that operates in the following way. The VSP system inspects the microfeature representation of the metaphor and the target. On the basis of high-level knowledge of the target domain it chooses a salient set of metaphor microfeatures. It then activates that set and allows the characteristic PDP shading process to amend the representation of the target domain as required.
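In outline, and with invented features and an invented salience lookup standing in for the high-level knowledge, the two steps might be sketched like this:

# Home domain of the metaphor: a writing desk, with salience weights that
# a real system would derive from knowledge of the problem and the target.
metaphor_features = {"flat": 0.1, "wooden": 0.2,
                     "repository-of-words": 0.9, "quill-associated": 0.8}
target_features = {"black", "feathered", "perched"}   # the raven

# Step 1 (VSP): choose the salient subset of metaphor microfeatures.
salient = {f for f, weight in metaphor_features.items() if weight > 0.5}

# Step 2 (PDP stand-in): shade the target representation with that subset.
shaded_target = target_features | salient
print(shaded_target)

The hard work, of course, is hidden in the salience weights: that is precisely the fixation step which, I argued above, pure microfeature similarity cannot supply.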
Finally, consider the three-stage developmental case itself, and imagine that there is, as classical models suggest, a genuine distinction between lexical and nonlexical processing strategies. But suppose, in addition, that the nonlexical process is learned by the child and that the learning process itself is to be given a PDP model. This yields the following picture:

Stage 1. Correct use, unsystematic. This stage is explained by a pure PDP mechanism of storage and recall.
Transition. A PDP model involving endogenous (and perhaps innate) structuring, which forces the child to generate a nonlexical processing strategy to explain to itself the regularities in its own language production.
Stage 2. Overregularization due to sudden reliance on a newly formed nonlexical strategy.
Transition. A PDP model of tuning by correction.
Stage 3. Normal use. The coexistence of a pure PDP mechanism of lexical access and a nonlexical mechanism implemented with PDP.

If some such model were accurate (and something like this model is in fact contemplated in Karmiloff-Smith 1987), we would not have a classical picture of development, although we might have a classical picture of adult use.6
To sum up, the mere fact that a system exhibits a degree of classical structuring into various components (one of which might be a rule interpreter) does not force the conclusion that it is a mere implementation of a classical theory. This is so because (a) the classical components may call and access powerful PDP operations of matching, search, blending, and generalization and (b) the developmental process by which the system achieves such structure may itself require a PDP explanation. Claim (1) thus fails. It may be, however, that to understand why the final system must have the structure it does, we will need to think in classical, symbol-manipulating terms. This second claim (claim 2, p. 171) is considered in the next section.
6 The Theoretical Analysis of Mixed Models

What are the theoretical implications of mixed PDP and VSP models? One thought, which we have already rejected, is that any such model must
1 The Pieces

All the pieces of the jigsaw are now before us, and their subgroupings are largely complete. Semantically transparent AI models have been described and compared with highly distributed connectionist systems. Various worries about the power and methodology of both kinds of work have been presented. The possibility of mixed models of cognitive processing has been raised, and the nature of folk-psychological talk and its role in a science of cognitive processing has been discussed. Along the way I have criticized the arguments in favor of Fodor's radical cognitivism, and I was forced to distinguish two projects within cognitive science: one descriptive and involving the essential use of classical representations; the other concerned with modeling the computational causes of intelligent behavior and typically not dependent on such representations. At the end of the previous chapter I also drew a distinction internal to causal cognitive science: the distinction between the project of psychological explanation (laying out the computational causes of intelligent behavior) and that of instantiation (making a machine that actually has thoughts). These two projects, I suggested, may come apart. This final chapter (which also functions as a kind of selective summary and conclusion) expands on this last piece of the jigsaw and tries to display as clearly as possible the overall structure of what I have assembled. In effect, it displays a picture of the relations of various parts of an intellectual map of the mind.

One word of warning. Since I should be as precise as possible about what each part of this intellectual map is doing, for the duration of the chapter I shall largely do away with shorthand talk of representations, beliefs, and so on to describe contemporary computer models (recall chapter 6, section 2, and chapter 5, footnote 4). At times this will result in language that is somewhat cumbersome and drawn out.
2 Building a Thinker

What does it take to build a thinker? Some philosophers are sceptical that a sufficient condition of being a thinker is satisfying a certain kind of formal description (see chapter 2). Such worries have typically focused on the kinds of formal descriptions appropriate to semantically transparent AI. In one sense we have seen virtue in such worries.1 It has indeed begun to seem that satisfying certain formal descriptions is vastly inadequate to ensure that the creature satisfying the description has a cognitive apparatus organized in a way capable of supporting the rich, flexible actual and counterfactual behavior that warrants an ascription of mental states to it. (Apologies for the lengthy formulation; you were warned!) Some reasons for thinking this were developed in chapter 6, where I discussed the holism and flexibility achieved by systems that use distributed representations and superpositional storage.

In short, many worries can usefully be targeted in what I am calling the project of instantiation. They can be recast as worries to the effect that satisfying the kind of formal description that specifies a conventional, semantically transparent program will never isolate a class of physical mechanisms capable of supporting the rich, flexible actual and counterfactual behavior that warrants ascribing mental states to the system instantiating such mechanisms. The first stage in an account of instantiation thus involves the description of the general structure of a mechanism capable of supporting such rich and flexible behavior at the greatest possible level of abstraction from particular physical devices. Searle seems to believe that we reach this level of abstraction before we leave the realms of biological description (see chapter 2). I see no reason to believe this, although it could conceivably turn out to be true. Instead, my belief is that some nonbiological, microfunctional description, such as that offered by a value-passing PDP approach, will turn out to specify at least one class of physical mechanisms capable of supporting just the kind of rich and flexible behavior that warrants ascribing mental states.

This is not to say, however, that its merely satisfying some appropriate formal description will warrant calling something a thinker. Instead, we need to imagine a set of conditions jointly sufficient for instantiating mental states, one of which will involve satisfying some microfunctional description like those offered by PDP. I spoke of systems that could be properly credited with mental states if they instantiated such descriptions. And I spoke also of a mechanism that, suitably embodied, connected, and located in a system, would allow us to properly describe it in mentalistic terms. These provisos indicate the second and final stage of an account of instantiation.
Instantiating a mental state may not be a matter of possessing a certain internal structure alone. In previous chapters we discovered two reasons to believe that the configuration of the external world might figure among the conditions of occupying a mental state. The first reason was that the ascription of mental states may involve the world (chapter 3). The content of a belief may vary according to the configuration of the world (recall the twin-earth cases [chapter 3, section 4]). And some beliefs (e.g., those involving demonstratives) may be simply unavailable in the absence of their objects. What this suggests is that instantiating certain mental states may involve being suitably located and connected to the world. (It does not follow, as far as I can see, that a brain in a vat can have no thoughts at all.)
We also noted in chapters 4 and 7 a second way in which external facts may affect the capacity of a system to instantiate mental states. This is the much more practical dimension of exploitation. A system (i.e., a brain or a PDP machine) may need to use external structures and bodily operations on such structures to augment and even qualitatively alter its own processing powers. Thus, suppose that we accept that stage one of an instantiation account (an account of brain structures) involves a microfunctional specification of something like a PDP system. We might also hold that instantiating some mental states (for example, all those involving conscious, symbolic, and logical reasoning) requires that such systems emulate a different architecture. And we might believe that such emulation is made possible only by the capacity of an embodied system located in a suitable environment to exploit real-world structures to reduce complex, serial processing tasks to an iterated series of PDP operations. PDP systems are essentially learning devices, and learning devices (e.g., babies) come to occupy mental states by interacting with a rich and varied environment. For these very practical reasons the project of full instantiation may be as dependent on embodiment and environmental structure as on internal structure.
Most important of all, I suspect, is the holistic nature of thought ascription. Thoughts, we may say, just are what gets ascribed using sentences expressing propositional attitudes of belief, desire, and the like. Such ascriptions are made on the basis of whole nexuses of actual behavior. If this is the case, to have a certain thought is to engage in a whole range of behaviors, a range that, for daily purposes, is usefully codified and explained by a holistically intertwined set of ascribed beliefs and desires. Since there will be no neat one-to-one mapping of thoughts so ascribed to computational brain states (see chapters 3 and 8 especially), it follows a fortiori that there will be no computational brain state that is a sufficient condition of having that thought. The project I have called descriptive cognitive science in effect gives a formal model of the internal relations of sentences used to ascribe such thoughts. This is a useful project, but instantiating that kind of formal description certainly won't give you a thinker. For the
3 Explaining a Thinker

The project of instantiation and the project of psychological modeling and explanation are different. This may seem obvious, but I suspect a great deal of confusion within cognitive science is a direct result of not attending to this distinction.

First and most obviously, the project of instantiation requires only that we delimit a class of mechanisms capable of providing the causal substructure to ground rich and varied behavior of the kind warranting the ascription of mental states. There may be many such classes of mechanisms, and an instantiation project may thus succeed without first delimiting the class of mechanisms that human brains belong to. But we may put this notion aside, at least as far as our interest in PDP is concerned. PDP is certainly neurally inspired and aims to increase our knowledge of the class of mechanisms to which we ourselves are in some significant way related.

Second and more importantly, even if the microfunctional description that (for the instantiation project) delimits the class of mechanisms to which we belong is entirely specified by a PDP-style account, correct psychological models and explanations of our thought may also require accounts couched at many different levels. To bring this out, recall my account of Marr's picture of the levels of understanding of an information-processing task (chapter 1, section 5). Psychological explanation, according to Rumelhart and McClelland (1986, 122-124), is committed to an elucidation of the "algorithmic" level, i.e., Marr's level 2. For the story at this level (the level that specifies the mode of representation and actual processing steps) provides the explanation of such phenomena as speed, efficiency, relative ease in solving various problems, and graceful degradation (performance with noise, inadequate data, or damaged hardware). That is, the story at the algorithmic level provides the explanation of the performance data with which real psychology is typically interested.
Suppose we accept this broad characterization of the level of psychological interest. It will not follow that in general a single computational model serves to explain all such data relating to the performance of a given task. One reason for this has directly to do with the notion of virtual machines. Thus, imagine a PDP system engaged in the full or partial simulation of a more conventional processor (e.g., the environment manipulator for full simulation and the mathematical prover for partial simulation). In such cases we will need to advert to at least two algorithmic descriptions of the system to explain various kinds of data. The relative ease with which the system solves various problems and the nature of the transformations of representations involved will often require an account couched in terms of the top-level virtual machine, e.g., a production system or a list processor. But speed and graceful degradation will need to be explained by adverting to an algorithmic description of the PDP implementation of some of the functions found in the top-level virtual machine. The thought is that not only may different tasks require different forms of computational explanation, but different kinds of data pertaining to a single task may likewise require several types of computational models.
At this point someone might object as follows. It may be convenient to use a classical serial model at times. But because of the underlying PDP implementation, a full and correct psychological explanation always can in principle be given in PDP algorithms alone.

This is a general reductionist argument that in the extreme is sometimes thought to threaten the integrity of the entire project of psychological explanation. But the specter of reduction need not be feared. For explanation is not just a matter of showing a structure that is sufficient to induce or constitute a certain higher-level state or process. It is also a matter of depicting the structure at the right level. And the right level here is determined by the need to capture generalizations about the phenomena picked out by the science in question. The general point has been made often enough (see, e.g., Pylyshyn 1986, chapter 1), and I shall not labor it here. Instead, I shall merely sketch the relevant instances. Consider the cases of full simulation from chapter 7. Here we have (by hypothesis) a PDP substructure supporting the mode of representation of the input and output and the processing steps and pattern-matching characteristics of a regular
4 Some Caveats

Part of my project is to assess the importance and role of PDP models in understanding the human mind. And the conclusion seems to be that such models have a major part to play in each of the two projects (instantiation and explanation) just distinguished. Such a conclusion, however, needs to be qualified in at least two regards.

First, PDP mechanisms may turn out to be just one among many kinds of mechanisms capable of supporting the rich, flexible actual and counterfactual behavior demanded of a genuine cognizer. Thus, even if semantically transparent approaches lack the capacity to ground such behavior (as I suspect), it does not follow that all thought requires a PDP substrate.

Second, the particular algorithms currently being explored by PDP theorists are almost certainly still inadequate to the task. The brain seems to employ many kinds of parallel cooperative networks, using different kinds of units and connectivity patterns. And this variety may be essential to its power but is not yet present in PDP work, which uses a simple, idealized neuronlike unit. In a recent article on the computation of motion, two leading theorists comment, "Nerve cells exhibit a variety of information-processing mechanisms; the nerve cell membrane produces and propagates many different types of electrical signals. . . . One can think of the McCulloch and Pitts model [see chapter 5 above] as equating a neuron with a single transistor, whereas our model suggests that neurons are more like computer chips with hundreds of transistors each" (Poggio and Koch 1987, 42 and 48). The idealized neurons of current PDP models, it is fair to say, are only a little finer and exhibit only a little more variety than the original McCulloch and Pitts versions. Current models, then, may suffer many severe limitations until workers in the field are in a position to introduce greater detail and variety.

In a similar vein, the learning algorithms currently in favor (the generalized delta rule and the Boltzmann machine learning procedure) are most probably inadequate in various ways. For example, as Rudi Lutz has pointed out, the generalized delta rule requires us in effect to tell the machine when it is to learn and when it is simply to behave on the basis of what it already knows. Yet this explicit switching of modes is quite counterintuitive as part of any psychological model of human learning.

In short, the very broad brushstrokes of PDP, the general idea of a parallel, value-passing architecture encoding information in distributed patterns of activity and connectivity, will probably constitute its positive contribution to understanding the mind, not the particular algorithms and idealized neurons currently under study. (This is not, of course, any criticism of the current work; creating such models and then trying to understand their limits is the best way of improving upon them.)
This little story won't make much sense unless it is read in the context provided by chapter 4, section 5, and with an eye to the general distinction between semantically transparent and semantically opaque systems.1
One fine day, a high-level architect was idly musing (reciting Wordsworth) in the cloistered confines of King's College Chapel. Eyes raised to that magnificent ceiling, she recited its well-publicized virtues ("that branching roof, self-poised and scooped into ten thousand cells, where light and shade repose . . ."). But her musings were rudely interrupted.

From a far corner, wherein the fabric of reality was oh so gently parting, a hypnotic voice commanded: "High-level Architect, look you well upon the splendours of this chapel roof. Mark well its regular pattern. Marvel at the star shapes decorated with rose and portcullis. And marvel all the more as I tell you, there is no magic here. All you see is complex, physical architecture such as you yourself might re-create. Make this your project: go and build for me a roof as splendid as the one you see before you."

The high-level architect obeyed the call. Alone in her fine glass and steel office, she reflected on the qualities of the roof she was to re-create. Above all, she recalled those star shapes, so geometric, so perfect, the vehicle of the rose and portcullis design itself. "Those shapes," she concluded, "merit detailed attention. Further observation is called for. I shall return to the chapel."
There ensued some days of patient observation and measurement. At the end of this time the architect had at her command a set of rules to locate and structure the shapes in just the way observed. These rules, she felt sure, must have been followed by the original designer. Here is a small extract from the high-level architect's notebook:

To create ceiling shapes instruct the builder (Christopher Paul Ewe?) as follows:
if (build-shapes) then
[(space-shapes (3-foot intervals)),
(align-shapes (horizontal)),
(arrange-shapes (point-to-point)),
(locate-shapes (intersection-of-pillar-diagonals))].
Later she would turn her attention to the pillars, but that could wait. When the time came, she felt, some more rules would do the trick. She had an idea of one already. It went "If (locate-pillar) then (make-pillar (45, star-shape))." It was a bit rough, but it could no doubt be refined. And of course, there'd be lots more rules to discover. "I do hope," she laughed, "that Christopher Paul Ewe is able to follow all this. He'll need to be a fine logical thinker to do so." One thought, however, kept on returning like a bad subroutine. "Why are things arranged in just that way? Why not have some star shapes spaced further apart? Why not have some in a circle instead of in line? Just think of all the counterfactual possibilities. What an unimaginative soul the original architect must have been after all."

Fortunately for our heroine's career, this heresy was kept largely to herself. The Society for the Examination and Reconstitution of Chapels gave her large research grants, and the project of building a duplicate ceiling went on. At last a prototype was ready. It was not perfect, and the light and shadow had a subtly different feel to it. But perhaps that was mere superstition. C. P. Ewe had worked well and followed instructions to the letter. The fruits of their labors were truly impressive.

One day, however, a strange and terrible thing happened. An earthquake (unusual for the locale) devastated the original chapel. Amateur video, miraculously preserved, records the event. The high-level architect, upon viewing the horror, was surprised to notice that the star shapes fell and smashed in perfect coincidence with the sway and fall of neighbouring pillars. "How strange," she thought; "I have obviously been missing a certain underlying unity of structure here." The next day she added a new rule to her already massive notebooks: "if (pillar-falls) then (make-fall (neighboring-star-shape))." "Of course," the architect admitted, "such a rule is not easy for the builder to follow. But it's nothing a motion sensor and some dynamite can't handle."
Chapter 9
1. The actual structure of the model is complicated in various ways not germane to present concerns. See Rumelhart and McClelland 1986 for a full account.
2. A trivial model would be one that merely used a PDP substrate to implement a conventional theory. But there are complications here; see section 5.
3. This example is mentioned in Davies, forthcoming, 19.
4. Thanks to Martin Davies for suggestive conversations concerning these issues.
5. I owe this suggestion to Jim Hunter.
6. This point was made in conversation by C. Peacocke.
7. For example, Sacks reports the case of Dr. P., a music teacher who, having lost the holistic ability to recognize faces, makes do by recognizing distinctive facial features and using these to identify individuals. Sacks comments that the processing these patients have intact is machinelike, by which he means like a conventional computer model. As he puts it:

classical neurology . . . has always been mechanical. . . . Of course, the brain is a machine and a computer . . . , but our mental processes, which constitute our being and our life, are not just abstract and mechanical; [they] involve not just classifying and categorising, but continual judging and feeling also. If this is missing, we become computer-like, as Dr. P. was. . . . By a sort of comic and awful analogy, our current cognitive neurology and psychology resembles nothing so much as poor Dr. P.! (Sacks 1986, 18-19)

Sacks admonishes cognitive science for being "too abstract and computational." But he might as well have said "too rigid, rule-bound, coarse-grained, and serial."

Chapter 10
1. This is not to say that the philosophers who raised the worries will agree that they are best localized in the way I go on to suggest. They won't.

Epilogue
1. The story is inspired by two sources: the Gould and Lewontin critique of adaptationist thinking, reported in chapter 4, and Douglas Hofstadter's brief comments on operating systems (1985, 641-642).
Appendix
Beyond Eliminativism

1 A Distributed Argument

Level 1, the numerical level

The most precise characterization of the actual processing of a particular connectionist network is mathematical in nature. Such networks, we saw, consist of interconnected units. The connections are weighted, and the units are miniprocessors that receive and pass on activation according to mathematical specifications. Thus, the theorist can give a precise characterization of the state of such a system at a particular time by stating a vector of numerical values. Each element in the vector will correspond to the activation value of a single unit. Likewise, it is possible to specify the evolving behavior of a system by an "activation-evolution equation." This is a differential equation that fixes the dynamics of the network. If, as is generally the case, the network is set up to learn, then it will be necessary to specify the dynamics of its learning behavior. This is done by means of another differential equation, the "connection-evolution equation." Such specifications give a complete mathematical picture of the activation and processing
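The text names these two equations without writing them out. In a generic, assumed form (with a the activation vector, W the weight matrix, i the external input, E an error or teaching signal, and F and G standing in for the particular unit-update and learning rules of the network under study), they might be sketched as:

\frac{d\mathbf{a}}{dt} = F\big(\mathbf{a}(t),\, W(t),\, \mathbf{i}(t)\big), \qquad \frac{dW}{dt} = G\big(W(t),\, \mathbf{a}(t),\, E(t)\big)

The first equation fixes how activation evolves given the current weights and input; the second fixes how the weights themselves drift as learning proceeds.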
Level 2, the subsymbolic level

For all that, however, Smolensky seems especially fond of a slightly higher level of analysis that he calls the subconceptual (or subsymbolic) level. It is at that level, and not at the numerical (or mathematical) one, that we find the "complete formal account of cognition." He writes, "Complete, formal and precise descriptions of the intuitive (i.e. connectionist) processor are generally tractable not at the conceptual level, but only at the subconceptual level" (Smolensky 1988, 6-7). But there is, in fact, no inconsistency here. For Smolensky views the subsymbolic as just the semantic (microsemantic) description of the syntactic (units and activation) profile of level 1. He is thus committed to the semantic interpretability of the numerical variables specifying unit activations. This interpretation takes the form of specifying the subsymbolic (or microfeatural) content to which the unit activation corresponds in the context of a particular activation vector. Hence, "the name 'subsymbolic paradigm' is intended to suggest cognitive descriptions built up of constituents of the symbols used in the symbolic paradigm; these fine-grained constituents could be called subsymbols and they are the activities of individual processing units in connectionist networks" (Smolensky 1988, 3).

The semantic shift from symbolic to subsymbolic specification is one of the most important and distinctive features of the connectionist approach to cognitive modeling. It is also one of the most problematic. One immediate question concerns the nature of a subsymbol. The level of description at issue is clearly meant to be a level that ascribes content, a level that rather precisely interprets the numerical specification of an activation vector by associating the activation of each unit with a content. In an activation vector that amounts to a distributed representation of coffee, we saw how the activation of a single unit may represent such features as hot liquid, burnt odor, and so on. Such examples make it seem as if a subsymbolic feature (or microfeature) is just a partial description, in ordinary-language terms, of the top-level entity in question (coffee). This is certainly the case
fact renders each individual unit pretty well expendable, since its near neighbors will do almost the same job in generating patterns of activation. And it is this same fact that allows such systems to generalize (by grouping the semantically common parts of various items of knowledge), to extract prototypes, and so on. Classical representation does not involve any such built-in notion of semantic metric.

Distributed (i.e., microfeatural) representations with a built-in semantic metric are also responsible for the context dependence of connectionist representations of concepts. Recall that in what I am calling pure distributed connectionism there are no units that represent classical conceptual-level features, such as coffee. Instead, coffee is represented as a set of active microfeatures. The point about context dependence is that this set will vary according to the surrounding context. For example, "coffee in cup" may involve a distributed representation of coffee that includes contacting porcelain as a microfeature. But "coffee in jar" would not. Conceptual-level entities (or "symbols," to fall in with a misleading terminology) thus have no stable and recurrent analogue as a set of unit activations. Instead, the unit-activation vector will vary according to the context in which the symbol occurred. This, we saw, is an important feature (though at times it may be a positive defect). It is directly responsible for the oft-cited fluidity of connectionist representation and reasoning.
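A toy rendering of this context dependence, with invented feature names standing in for the real microfeatures, might look as follows:

coffee_core = {"hot-liquid", "burnt-odor", "brown"}
context_features = {
    "in cup": {"contacting-porcelain", "curved-surface"},
    "in jar": {"granules", "glass-contact"},
}

def represent(concept_core, context):
    # The distributed representation is the core feature set shaded by
    # whatever the surrounding context contributes.
    return concept_core | context_features[context]

print(represent(coffee_core, "in cup"))  # includes contacting-porcelain
print(represent(coffee_core, "in jar"))  # does not

There is no single, recurring "coffee" vector here: what recurs is at best a family resemblance among context-shaded activation patterns.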
If it is not the dimension shift in itself so much as the dimension shift in conjunction with a built-in semantic metric that is the crucial fact in connectionist processing, then a question arises about the status of the subsymbolic level of description. For such descriptions seemed to involve just listing a set of microfeatures corresponding to an activation vector. But such a listing leaves out all the facts of the place of each feature in the general metric embodied by the network. And these facts seem to be of great semantic significance. What a microfeature means is not separable from its place in relation to all the other representations the system embodies. For this reason, I would dispute the claim that subsymbolic description (at least, if it is just a listing of microfeatures) affords an accurate interpretation of the full numerical specifications available in level 1. Perhaps the resources of natural language (however cannily deployed) are in principle incapable of yielding an accurate interpretation of an activation vector. At first sight, such a concession may seem to give the eliminativist an easy victory. Fortunately, this impression is wrong, as we shall see in due course.
Level 3, cluster analysis

This level of analysis does not appear in Smolensky's treatment. Rather, it occurs as part of the methodology developed by Rosenberg and Sejnowski for the analysis of NETtalk. (For an account of NETtalk, though not, alas, of cluster analysis, see Sejnowski and Rosenberg 1986.) I include it here for
tion its hidden-unit space more subtly (in fact, into a distinctive pattern for each of 79 possible letter-to-phoneme pairings). Cluster analysis as carried out by Rosenberg and Sejnowski in effect constructs a hierarchy of partitions on top of this base level of 79 distinctive stable patterns of hidden-unit activation. The hierarchy is constructed by taking each of the 79 patterns and pairing it with its closest neighbor, i.e., with the pattern that has most in common with it. These pairings act as the building blocks for the next stage of analysis. In this stage an average activation profile between the members of the original pair is calculated and paired with its nearest neighbor drawn from the pool of secondary figures generated by averaging each of the original pairs. The process is repeated until the final pair is generated. This represents the grossest division of the hidden-unit space that the network learned, a division that, in the case of NETtalk, turned out to correspond to the division between vowels and consonants. Cluster analysis thus provides a picture of the shape of the space of the possible hidden-unit activations that power the network's performance.
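In outline, and with toy vectors standing in for NETtalk's 79 hidden-unit patterns, the stagewise procedure just described might be sketched like this (a simplified, greedy rendering; a full analysis would also record the hierarchy built along the way):

import numpy as np

def stagewise_cluster(profiles):
    """Pair each profile with its nearest unpaired neighbor, replace each
    pair by its average, and repeat until one profile remains."""
    level = [np.asarray(p, dtype=float) for p in profiles]
    while len(level) > 1:
        nxt, used = [], set()
        for i, p in enumerate(level):
            if i in used:
                continue
            unpaired = [j for j in range(len(level)) if j != i and j not in used]
            if not unpaired:              # odd profile out: carry it upward
                nxt.append(p)
                break
            j = min(unpaired, key=lambda k: np.linalg.norm(p - level[k]))
            used.update((i, j))
            nxt.append((p + level[j]) / 2)   # average activation profile
        level = nxt
    return level[0]   # profile at the grossest division of the space

rng = np.random.default_rng(1)
toy_patterns = [rng.random(8) for _ in range(6)]  # stand-ins for the 79
print(stagewise_cluster(toy_patterns))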
A few comments. First, it is clear that the clusterings learned by NETtalk (e.g., the vowel and consonant clusterings at the top level) do not involve novel, unheard-of subsymbolic features. This may be due in part to the system's reliance on input and output representations that reflect the classical theory. Even so, the metric of similarity built into the final set of weights still offers some clear advantages over a classical implementation. Such advantages will include generalization, various forms of robustness, and graceful degradation.

For our purposes, the most interesting questions concern the status of the cluster-theoretic description. Is it an accurate description of the system's processing? One prominent eliminativist, Churchland (1989), answers firmly in the negative. Cluster analysis, he argues, is just another approximate, high-level description of the system's gross behavior (see my comments on levels 4 and 5) and does not yield an accurate description of its processing. The reason for this is illuminating. It is that the system itself knows nothing about its own clustering profile, and that profile does not figure in the statement of the formal laws that govern its behavior (the activation-evolution and connection-evolution equations of level 1). Thus, Churchland notes, "the learning algorithm that drives the system to new points in weight space does not care about the relatively global partitions that have been made in activation space. All it cares about are the individual weights and how they relate to apprehended error. The laws of cognitive evolution, therefore, do not operate primarily at the level of the partitions. . . . The level of the partitions certainly corresponds more closely to the 'conceptual' level . . . , but the point is that this seems not to be the most important dynamical level" (1989, 25).
But if you give the system an ill-posed problem or artificially curtail its processing time, it still gives what Smolensky calls "sensible performance." This is explained by the underlying subsymbolic nature of its processing, which will always satisfy as many soft constraints as it can, even if given limited time and degraded input. The moral of all this, as Smolensky sees it, is that the theorist may analyze the system at the higher level of, e.g., a set of production rules. This level will capture some facts about its behavior. But in less ideal circumstances the system will also exhibit other behavior that is explicable only by describing it at a lower level. Thus, the unified account of cognition lies at one of the lower levels (level 2 or level 1, according to your preference). Hence the famous analogy with Newtonian mechanics. Symbolic AI describes cognitive behavior, much as Newtonian mechanics describes physical behavior. They each offer a useful and accurate account in a circumscribed domain. But the unified account lies elsewhere (in quantum theory in physics, and in connectionism in cognitive science). Thus, commenting on the model for solving circuitry problems, Smolensky notes, "A system that has, at the micro-level, soft constraints satisfied in parallel, appears at the macro-level, under the right circumstances, to have hard constraints, satisfied serially. But it doesn't really, and if you go outside the 'Newtonian' domain you see that it's really been a quantum system all along" (1988, 20). Such "Newtonian" analyses are conceded to be useful in that they may help describe interrelations between complex patterns of activity that approximate various conceptual constructs in which the theorist is interested. As Smolensky points out (1988, 6), such interactions will not be directly described by the formal definition of a subsymbolic model; instead, they must be "computed by the analyst."
vidual words used in a belief ascription will not have discrete, recurrent analogues in the actual processing of the system. Thus, the word "chair" will not have a discrete analogue, since "chair" will be represented as an activation vector across a set of units that stand for subsymbolic microfeatures, and it will not have a single recurrent analogue (not even as an activation vector), since the units that participate and the degree to which they participate will vary from context to context.

The radical eliminativist takes these facts and conjoins them with a condition of causal efficacy, which states: a psychological ascription is only warranted if the items it posits have direct analogues in the production (or possible production) of behavior. Thus, ascribing the belief that cows can't fly to John is justified only if there is some state in John in which we can in principle identify a discrete, interpretable substate with the meaning of "cow," "fly," and so on. Since, according to connectionism, there are no such discrete, recurrent substates, the radical eliminativist concludes that commonsense psychology is mistaken and does not afford an accurate higher-level description of the system in question (John). This is not to say that such descriptions are dispensable in practice; it is to say only that they are mistaken in principle.

In the next section I shall sketch an account of explanation that dissociates the power and accuracy of higher-level descriptions from the condition of causal efficacy, which thereby gives a more liberal, more plausible, and more useful picture of explanation in cognitive science and daily life.
3 Explanation Revisited

The eliminativist argues her case as follows.

Step 1. Suppose that pure distributed connectionism offers a correct account of cognition.
Step 2. It follows that there will be no discrete, recurrent, in-the-head analogues to the conceptual-level terms that figure in folk-psychological belief ascription.
Step 3. Hence, by the condition of causal efficacy, such ascriptions are not warranted, since they have no in-the-head counterpart in the causal chains leading to action.
Step 4. Hence, the causal explanations given in ordinary terms of beliefs and desires (e.g., "She went out because she believed it was snowing") are technically mistaken.
My claim will be that even if pure distributed connectionism offers a correct and (in a way) complete account of cognition, the eliminativist conclusion (step 4) doesn't follow. It doesn't follow for the simple reason that good causal explanation in psychology is not subject to the condition of causal efficacy. Likewise, even if pure distributed connectionism is true, it does not follow that the stories told by symbolic AI are mere approximations. Instead, I shall argue, these various vocabularies (e.g., of folk psychology and of symbolic AI) are geared accurately to capture legitimate and psychologically interesting equivalence classes, which would be invisible if we restricted ourselves to subsymbolic levels of description. In a sense, then, I shall be offering a version of Dennett's well-known position on folk-psychological explanation but extending it, in what seems to me to be a very natural way, to include the constructs of symbolic AI (e.g., schemata, productions). If I am right, it will follow that many defenders of symbolic AI and folk psychology (especially Fodor and Pylyshyn) are effectively shooting themselves in the feet. For the defences they attempt make the condition of causal efficacy pivotal, and they try to argue for neat, in-the-head correlates to symbolic descriptions (see, e.g., Fodor 1987; Fodor and Pylyshyn 1988). This is accepting terms of engagement that surely favor the eliminativist and that, as we shall see, make nonsense of a vast number of perfectly legitimate explanatory constructs.

What we need, then, is a notion of causal explanation without causal efficacy. I tried for such a notion in Clark, forthcoming. But a superior case has since been made by Frank Jackson and Philip Pettit, so I begin by drawing on their account. Jackson and Pettit ask the reader to consider the
following case. Electrons A and B are acted on by independentforces FA
and FBrespectively, and electron A then acceleratesat the same rate as
electron B. The explanation of this fact is that the magnitude of the two
forces is the same.. . . But this samenessin magnitude is quite invisible to
A . . . . This sameness doesnot makeA move off more or lessbriskly" (1988,
- '
392 393). Or again, We may explain the conductor's annoyanceat a
concert by the fact that someonecoughed. What will have actually caused
the conductor's annoyancewill be the coughing of someparticularperson,
Fred, say" (Jacksonand Pettit 1988, 394). This is a nice case. For suppose
" "
someone , in the interestsof accuracy, insistedthat the proper (fully causal)
'
explanationof the conductor's"annoyancewas" in fact Freds coughing. There
is a good sensein which their more accurate explanationwould in fact be
" "
lesspowerful. For the explanationwhich uses someone has the advantage
"
of making it clear that any of a whole range of membersof the audience
coughing would have causedannoyancein the conductor" (Jacksonand
Pettit 1988, 395). This increasein generality, bought at the cost of sacri6cing
the citation of the actual entity implicated in the particular causalchain in
question, constitutes (I want to say) an explanatory virtue, and it legitimizes
a whole range of causalexplanationsthat fail to meet the condition
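The structure of the electron case can be put in one line of elementary dynamics (my reconstruction of the point, not Jackson and Pettit's own notation; m is the common electron mass):

\[
a_A = \frac{F_A}{m}, \qquad a_B = \frac{F_B}{m}, \qquad \text{so} \quad F_A = F_B \;\Rightarrow\; a_A = a_B .
\]

Only F_A figures in the process that produces A's acceleration. The relational fact that F_A = F_B appears in neither equation of motion, yet it is exactly that fact which explains why the two accelerations match: the sameness programs the outcome without pushing anything.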
4 The Value of High-Level Descriptions
Consider once again the various higher-level analyses of pure distributed connectionist systems. The first of these was cluster analysis. Cluster analysis, recall, involved charting the hierarchy of divisions (or partitions) that the network had learned to make using its hidden units. Recall also the attitude of at least one leading eliminativist, Paul Churchland, to such a level of analysis. It was that when undergoing conceptual change (e.g., learning) the system would behave in ways not responsive to the various partitionings (which the system does not really know about), but it would behave in ways responsive to the actual connection weights (which it does really know about).
This point is well taken as far as it goes. (It is like saying, "If you want to predict the actual acceleration of electron A, you'd better know the values of the forces acting on it and not just that they are the same as those acting on electron B.") But it would be a grave mistake to assume that this point shows that the level of analysis adopted by the cluster analyst is inferior, approximate, unnecessary, or downright mistaken. For an analysis that cites partitionings, like one that cites the sameness of the forces acting on the electrons, may likewise have virtues that other analyses cannot reach. For example, it is an important fact about cluster analysis (a fact recognized by Churchland [1989, 24]) that networks that have come to embody different connection weights may have identical cluster analyses. Thus, Sejnowski notes that versions of NETtalk that begin with different random distributions of weightings on the hidden units will, after training, make the same partitions but by means of different arrangements of weights on the individual connections. Now consider a particular cognitive domain, say, converting text to phonemes. Isn't it a legitimate psychological fact that only certain systems can successfully negotiate that domain? And don't we want some level of properly psychological, or cognitive, explanation with the means to group such systems together and to make some generalizations about them (e.g., that such systems will be prone to certain illusions)? Cluster analysis is the very tool we need, it seems. Any one of a whole range of networks, we can say, will be able to negotiate that cognitive domain. And we can give an account that specifies what networks belong in that range (or in the equivalence class in question) by requiring that they have a certain cluster analysis. In the terminology introduced in section 3, the cluster analysis causally programs the system's successful performance, but it is not part of any process explanation.
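Sejnowski's observation (different weights, same partitions) is easy to exhibit in miniature. The following sketch is only an illustration, assuming NumPy and SciPy; the tiny random "networks" are stand-ins for trained systems, not NETtalk itself. The second net carries the first net's weights on a differently arranged set of hidden units, so its individual connection weights differ, yet hierarchical cluster analysis over the hidden-unit activations delivers the very same partition of the inputs.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)

    def hidden(W, X):
        # one sigmoid hidden layer; rows of X are input patterns
        return 1.0 / (1.0 + np.exp(-X @ W.T))

    n_in, n_hid, n_items = 10, 8, 12
    X = rng.normal(size=(n_items, n_in))   # a batch of input patterns
    W1 = rng.normal(size=(n_hid, n_in))    # net 1's hidden-layer weights
    W2 = W1[rng.permutation(n_hid)]        # net 2: same weights, arranged differently

    def partition(W, k=4):
        # cluster analysis: a hierarchy of divisions over hidden activations
        Z = linkage(hidden(W, X), method='average')
        return fcluster(Z, t=k, criterion='maxclust')

    p1, p2 = partition(W1), partition(W2)
    # the partitions coincide iff their co-membership relations agree
    print(np.array_equal(np.equal.outer(p1, p1), np.equal.outer(p2, p2)))  # True

A process explanation of the two nets must cite their (different) individual weights; the shared partition is visible only at the cluster-analytic level, which is just what fits it to anchor an equivalence class.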
Let us now move up another level to the descriptions offered by symbolic AI. Suppose, for the sake of argument, that we describe NETtalk at this level as a discrimination tree, or better, as a production system with one production for each conversion of text to phoneme that it has learned. We have clearly lost explanatory power for explaining the performance of an individual network. For as we saw, the network will perform well with degraded information in a way that cannot be explained by casting it as a standard symbolic AI system. But as with the cluster analysis, we gain something else. For we can now define an even wider equivalence class that is still, I suggest, of genuine psychological interest. Membership of this new, wider class requires only that the system behave in the ways in which the pure production system would behave in some central class of cases. The production-system model would thus act as an anchor, dictating membership of an equivalence class, just as the cluster analysis did in the previous example. And the benefits would be the same too. Suppose there turns out to be a lot of systems (some connectionist, some classical, some of kinds still undreamed of) all of which nonaccidentally approximate the behavior of the pure production system in a given range of cases. They are all united by being able to convert text to phonemes. If we seek some principled and informative way of grouping them together (i.e., not a bare disjunction of systems capable of doing such and such), we may have no choice but to appeal to their shared capacity to approximate the behavior of such and such a paradigmatic system. We can then plot how each system manages, in its different way, to approximate each separate production. Likewise, there may be a variety of systems (some connectionist, some not) capable of supporting knowledge of prototypical situations. The symbolic AI construct of a schema or frame may help us understand in detail, beyond the gross behavior, what all these systems have in common (e.g., some kind of content addressability, default assignment, override capacity, and so forth). In short, we may view the constructs of symbolic AI not as mere approximations to the connectionist cognitive truth but as a means of highlighting a higher level of unity between otherwise disparate groups of cognitive systems. Thus, the fact that a connectionist system a and some architecturally novel system of the future b are both able to do commonsense reasoning may be explained by saying that the fact that a and b each approximate a classical script or frame-based system causally programs their capacity to do commonsense reasoning. And this means that a legitimate higher-level shared property of a and b is invisible at the level of a subsymbolic analysis of a. This is not to say, of course, that the subsymbolic analysis is misguided. Rather, it is to claim that that analysis, though necessary for many purposes, does not render higher levels of analysis defunct or of only heuristic value.
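To fix ideas, here is a toy rendering of both constructs. The rules and frame below are invented for illustration (they are not NETtalk's learned mapping or any published system): a miniature production system for text-to-phoneme conversion, an equivalence-class test that demands only agreement with that paradigm on a central class of cases, and a schema with default assignment and override capacity.

    # A toy production system: each production fires on a letter-in-context
    # condition and emits a phoneme symbol (rules invented for illustration).
    RULES = [
        (lambda w, i: w[i] == 'c' and w[i+1:i+2] in ('e', 'i'), 's'),  # soft c
        (lambda w, i: w[i] == 'c', 'k'),                               # default c
        (lambda w, i: w[i] == 'a', 'ae'),
        (lambda w, i: w[i] == 't', 't'),
    ]

    def to_phonemes(word):
        out = []
        for i in range(len(word)):
            for condition, phoneme in RULES:   # first matching production fires
                if condition(word, i):
                    out.append(phoneme)
                    break
        return out

    # Membership of the wider equivalence class: a system (connectionist or
    # otherwise) belongs iff it matches the paradigm on the central cases.
    def in_equivalence_class(system, central_cases):
        return all(system(w) == to_phonemes(w) for w in central_cases)

    # A schema/frame: default assignments that observation can override.
    OFFICE_FRAME = {'has_desk': True, 'has_phone': True, 'occupant': None}

    def instantiate(frame, **observed):
        slots = dict(frame)      # default assignment
        slots.update(observed)   # override capacity
        return slots

    print(to_phonemes('cat'))                        # ['k', 'ae', 't']
    print(instantiate(OFFICE_FRAME, has_phone=False))

Nothing in the membership test cares how a candidate system computes the mapping; the production system serves purely as the anchor of the class.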
Finally, let us move on to full-fledged folk-psychological talk. On the present analysis, such talk emerges as just one more layer in rings of ever more explanatory virtue. The position is beautifully illustrated by Daniel Dennett's "left-handers" thought experiment. Suppose, Dennett says, that the subpersonal cognitive psychology of some people turns out to be dramatically different from that of others. For example, two people may have very different sets of connection weights mediating their conversions of text to phonemes. More radically still, it could be that left-handed people have one kind of cognitive architecture and right-handed people another. For all that, Dennett points out, we would never conclude on those grounds alone that left-handers, say, are incapable of believing.
Let left- and right-handers be as internally different as you like, we already know that there are reliable, robust patterns in which all behaviourally normal people participate - the patterns we traditionally describe in terms of belief and desire and the other terms of folk psychology. What spread around the world on July 20th, 1969? The belief that a man had stepped on the moon. In no two people was the effect of the receipt of that information the same ..., but the claim that therefore they all had nothing in common ... is false, and obviously so. There are indefinitely many ways one could reliably distinguish those with the belief from those without it. (Dennett 1987, 235)
In other words, even if there is no single internal state (say, a sentence in the language of thought) common to all those who are said to believe that so and so, it does not follow that belief is an explanatorily empty construct. The sameness of the forces acting on the two electrons is itself causally inefficacious but nonetheless figures in a useful and irreducible mode of explanation (program explanation), which highlights facts about the range of actual forces that can produce a certain result (identical acceleration). Just so, the posit of the shared belief highlights facts about a range of internal cognitive constitutions that have some common implications at the level of gross behavior. This grouping of apparently disparate physical mechanisms into classes that reflect our particular interests is at the very heart of the scientific endeavor. To suppose that the terms and constructs proper to such program explanations are somehow inferior or dispensable is to embrace a picture of science as an endless and disjoint investigation of individual causal mechanisms.
There is, of course, a genuine question about what constructs best serve our needs. The belief construct must earn its keep by grouping together creatures whose gross behaviors really do have something important in common (e.g., all those likely to harm me because they believe I am a predator). In recognizing the value and status of program explanations I am emphatically not allowing that anything goes. My goal is simply to counter the unrealistic and counterproductive austerity of a model of explanation that limits "real" explanations to those that cite causally efficacious features. The eliminativist argument, it seems, depends crucially on a kind of austerity that the explanatory economy can ill afford.
5 Self-Monitoring Connectionist Systems
The previous four sections have, I hope, established that even if pure distributed connectionism constitutes a complete and accurate formal model of cognition (as Smolensky claims), it does not follow that higher-level analyses (like cluster analysis, symbolic AI, and folk psychology) are misguided, mistaken, or even mere approximations. Instead, they may be accurate and powerful grouping explanations of the kind examined in sections 3 and 4.
In this more speculative section I want to use the same kind of observations to cast some doubt on the idea that pure distributed connectionism constitutes a complete and accurate formal model of cognition. There is a very natural development here, since the virtues of grouping explanations as third-person, theorist's constructs have analogues in first-person processing. In short, there may be pressure on individual cognizers to monitor and group their own internal states in the same general way as there is pressure for program explanations that group other people's internal states. If this is the case, then parts of our internal cognitive economy may begin to look distinctly classical and symbolic. Without presuming to decide what is clearly an empirical issue about cognitive architecture, this section aims to depict the kinds of pressure that might make such a mixed cognitive economy attractive.
Pure distributed connectionism insists that discrete symbolic constructs (e.g., "dog," "office") exist only as instruments of interpersonal communication in a public language and as constructs of higher-level, third-person analyses of shifting, fluid activation vectors. Let us bracket for now the question of public language. All theorists agree that we use and process at some level the symbolic entities of public discourse (words). My question is whether such symbolic entities (discrete recurrent items and conceptual semantics) have any role to play in individual cognition beyond whatever is necessary to produce and interpret language. The pure distributed connectionist thinks not. She agrees, with Smolensky, that such entities at best emerge at a higher level of analysis of what are through and through subsymbolic systems (see, e.g., Smolensky 1988, 17). That is to say, such entities are visible only to the external theorist and do not figure in the system's own inner workings. All the system knows about, on this account, are its manifold activation vectors. The rest are theorist's fictions, a useful and (according to our earlier arguments) even indispensable aid to grouping systems into equivalence classes, but not a feature of individual processing (recall Churchland's comment about cluster analysis revealing partitions that the system itself knows nothing about).
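The contrast can be made concrete with a deliberately minimal sketch (my illustration only, assuming NumPy; the nearest-prototype scheme and its threshold are arbitrary choices, not an empirical proposal). Here a system marks its own fluid activation vectors with discrete, recurrent tags, so the groupings figure in its inner workings rather than only in the theorist's:

    import numpy as np

    class SelfMonitor:
        # tag each activation vector by its nearest running prototype;
        # spawn a new tag when nothing lies within the threshold
        def __init__(self, threshold=1.0):
            self.prototypes = []
            self.threshold = threshold

        def tag(self, vec):
            if self.prototypes:
                d = [np.linalg.norm(vec - p) for p in self.prototypes]
                i = int(np.argmin(d))
                if d[i] < self.threshold:
                    self.prototypes[i] = 0.9 * self.prototypes[i] + 0.1 * vec
                    return i              # a discrete, recurrent inner token
            self.prototypes.append(vec.copy())
            return len(self.prototypes) - 1

    rng = np.random.default_rng(1)
    monitor = SelfMonitor()
    for _ in range(6):
        centre = rng.choice([0.0, 3.0])        # two underlying activation regimes
        v = centre + 0.1 * rng.normal(size=4)  # a shifting, fluid vector
        print(monitor.tag(v))                  # the same tags keep recurring

If anything like this marking were part of the system's own processing, the discrete recurrent items would no longer be mere theorist's fictions.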
Pure distributed connectionism, we saw in chapter 9, is heir to some interesting problems. For it seems that systems lacking the distinctive ...
6 A Double Error
The eliminativist, I have argued, makes a double error. First, her conditional argument is flawed. Even if pure distributed connectionism were a complete, formal account of individual processing, it would not follow that the higher-level constructs of symbolic AI and folk psychology were inaccurate, misguided, or dispensable. Instead, such constructs may be the essential and accurate grouping principles for explanations which prescind from causal process to causal program. The eliminativist's conditional argument was seen to rest on an insupportable condition of causal efficacy, a condition that, if applied, would rob us of a whole range of perfectly intelligible and legitimate explanations both in cognitive science and in daily life.
Second, the antecedent of the eliminativist's conditional is itself called into doubt by the power and usefulness of higher-level articulations of processing. Such articulations, I argued, may do in the first person what program explanations do in the third. That is, they may enable the system to observe and mark the common contributions of a variety of activation vectors. Such marking would provide the system with the resources to reflect on (and hence improve and debug) its own basic processing strategies. In this way, entities that the pure distributed connectionist views as emergent at a higher level of external analysis (high-level clusterings, types, ...
Chapter 2
1. Thanks to Lesley Benjamin for spotting the shampoo example and for some stimulating conversation about the problems involved in parsing it.
Chapter 3
1. At a minimum, the eliminative materialist must believe that this is the primary point of the practice. In conversation Paul Churchland has accepted that folk-psychological talk serves a variety of other purposes also, e.g., to praise, to blame, to encourage, and so on. But, he rightly says, so did witch talk. That in itself is not sufficient to save the ...
Chapter 6
1. A related issue here concerns our capacity to change verbs into nouns and vice versa. Thus newly coined uses like "She wants to thatcher the organization" or "Don't crack wise with me" are easily understood. This might be partially explained by supposing that the verb/noun distinction is (at best) one microfeature among many and that other factors, like position in the sentence, can force a change in this feature assignment while leaving much of the semantics intact. (This phenomenon is dealt with at length in Benjamin, forthcoming.)
2. Many of the ideas in this section (including the locution "an equivalence-class of algorithms") developed out of conversations with Barry Smith. He is not to blame, of course, for the particular views I advance.
Chapter 7
1. See, e.g., Hinton 1984.
2. Lecture given to the British Psychological Society, April 1987.
3. This chapter owes much to Hofstadter (1985), whose suggestive comments have doubtless shaped my thought in more ways than I am aware. The main difference, I suspect, is that I am kinder to the classical symbolic accounts.
Chapter 8
1. As ever, this kind of claim must be read carefully to avoid Church-Turing objections. A task will count as beyond the explanatory reach of a PDP model just in case its performance requires that to carry it out, the PDP system must simulate a different kind of processing (e.g., that of a Von Neumann machine).
2. He allows that intentional realism can be upheld without accepting the LOT story (see Fodor 1987, 137). But as I suggest in section 3, he does seem to believe that some form of physical causation is necessary for the truth of intentional realism. This, I argue, is a very dangerous assumption to make.
3. A parsing tree is a data structure that splits a sentence up into its parts and associates those parts with grammatical categories. For example, a parsing tree would separate the sentence "Fodor exists" into two components, "Fodor" and "exists," and associate a grammatical label with each. These labels can become highly complex. But the standard simple illustration is

         S
        / \
      NP   VP
      |     |
    Fodor exists
Chapter 9
1. The actual structure of the model is complicated in various ways not germane to present concerns. See Rumelhart and McClelland 1986 for a full account.
2. A trivial model would be one that merely used a PDP substrate to implement a conventional theory. But there are complications here; see section 5.
3. This example is mentioned in Davies, forthcoming, 19.
4. Thanks to Martin Davies for suggestive conversations concerning these issues.
5. I owe this suggestion to Jim Hunter.
6. This point was made in conversation by C. Peacocke.
7. For example, Sacks reports the case of Dr. P., a music teacher who, having lost the holistic ability to recognize faces, makes do by recognizing distinctive facial features and using these to identify individuals. Sacks comments that the processing these patients have intact is machinelike, by which he means like a conventional computer model. As he puts it:

Classical neurology ... has always been mechanical. ... Of course, the brain is a machine and a computer ..., but our mental processes, which constitute our being and our life, are not just abstract and mechanical; [they] involve not just classifying and categorising, but continual judging and feeling also. If this is missing, we become computer-like, as Dr. P. was. ... By a sort of comic and awful analogy, our current cognitive neurology and psychology resembles nothing so much as poor Dr. P.! (Sacks 1986, 18-19)

Sacks admonishes cognitive science for being "too abstract and computational." But he might as well have said "too rigid, rule-bound, coarse-grained, and serial."
Chapter 10
1. This is not to say that the philosophers who raised the worries will agree that they are best localized in the way I go on to suggest. They won't.
Epilogue
1. The story is inspired by two sources: the Gould and Lewontin critique of adaptationist thinking, reported in chapter 4, and Douglas Hofstadter's brief comments on operating systems (1985, 641-642).
Bibliography
Putnam, H. 1975a. The meaning of "meaning." In H. Putnam, Mind, Language, and Reality, pp. 215-271. Cambridge: Cambridge University Press.
Putnam, H. 1975b. Philosophy and our mental life. In H. Putnam, Mind, Language, and Reality, pp. 291-303. Cambridge: Cambridge University Press.
Putnam, H. 1981. Reductionism and the nature of psychology. In J. Haugeland, ed., Mind Design, pp. 205-219. Cambridge: MIT Press.
Pylyshyn, Z. 1986. Computation and Cognition. Cambridge: MIT Press.
Ridley, M. 1985. The Problems of Evolution. Oxford: Oxford University Press.
Ritchie, G., and Hanna, F. 1984. AM: A case study in AI methodology. Artificial Intelligence 23: 249-268.
Robbins, A. Unpublished. Representing type and category in PDP. Draft doctoral dissertation, University of Sussex.
Rosenblatt, F. 1962. Principles of Neurodynamics. New York: Spartan Books.
Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning internal representations by error propagation. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362. Cambridge: MIT Press.
Rumelhart, D., and McClelland, J. 1986. On learning the past tenses of English verbs. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 216-271. Cambridge: MIT Press.
Rumelhart, D., and McClelland, J. 1986. PDP models and general issues in cognitive science. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 110-146. Cambridge: MIT Press.
Rumelhart, D., McClelland, J., and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge: MIT Press.
Rumelhart, D., and Norman, D. 1982. Simulating a skilled typist: A study in skilled motor performance. Cognitive Science 6: 1-36.
Rumelhart, D., Smolensky, P., McClelland, J., and Hinton, G. 1986. Schemata and sequential thought processes in PDP models. In J. McClelland, D. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 7-58. Cambridge: MIT Press.
Rutkowska, J. 1984. Explaining infant perception: Insights from artificial intelligence. Cognitive Studies Research Paper 005, University of Sussex.
Rutkowska, J. 1986. Developmental psychology's contribution to cognitive science. In K. S. Gill, ed., Artificial Intelligence for Society, pp. 79-97. Chichester, Sussex: John Wiley.
Ryle, G. 1949. The Concept of Mind. London: Hutchinson.
Sacks, O. 1986. The Man Who Mistook His Wife for a Hat. London: Picador.
Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Schilcher, C., and Tennant, N. 1984. Philosophy, Evolution, and Human Nature. London: Routledge and Kegan Paul.
Schreter, Z., and Maurer, R. 1986. Sensorimotor spatial learning in connectionist artificial organisms. Research abstract FPSE, University of Geneva.
Searle, J. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge University Press.
Searle, J. 1980. Minds, brains, and programs. Reprinted in J. Haugeland, ed., Mind Design, pp. 282-307. Cambridge: MIT Press, 1981.
Searle, J. 1983. Intentionality. Cambridge: Cambridge University Press.
Searle, J. 1984. Intentionality and its place in nature. Synthese 61: 3-16.
Sejnowski, T., and Rosenberg, C. 1986. NETtalk: A parallel network that learns to read aloud. Johns Hopkins University Technical Report JHU/EEC-86/01.
Shortliffe, E. 1976. Computer-Based Medical Consultations: MYCIN. New York: Elsevier.
Simon, H. 1962. The architecture of complexity. Reprinted in H. Simon, The Sciences of the Artificial. Cambridge: Cambridge University Press, 1969.
Simon, H. 1979. Artificial intelligence research strategies in the light of AI models of scientific discovery. Proceedings of the Sixth International Joint Conference on Artificial Intelligence 2: 1086-1094.
Simon, H. 1980. Cognitive science: The newest science of the artificial. Cognitive Science 4, no. 2: 33-46.
Simon, H. 1987. A psychological theory of scientific discovery. Paper presented at the annual conference of the British Psychological Society, University of Sussex.
Sloman, A. 1984. The structure of the space of possible minds. In S. Torrance, ed., The Mind and the Machine. Sussex: Ellis Horwood.
Smart, J. 1959. Sensations and brain processes. Philosophical Review 68: 141-156.
Smith, M. 1984. The evolution of animal intelligence. In C. Hookway, ed., Minds, Machines and Evolution. Cambridge: Cambridge University Press.
Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony theory. In D. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 194-281. Cambridge: MIT Press.
Smolensky, P. 1987. Connectionist AI, and the brain. Artificial Intelligence Review 1: 95-109.
Smolensky, P. 1988. On the proper treatment of connectionism. Behavioural and Brain Sciences 11: 1-74.
Sterelny, K. 1985. Review of Stich, From Folk Psychology to Cognitive Science. Australasian Journal of Philosophy 63, no. 4: 510-520.
Stich, S. 1971. What every speaker knows. Philosophical Review 80: 476-496.
Stich, S. 1972. Grammar, psychology, and indeterminacy. Reprinted in N. Block, ed., Readings in Philosophy of Psychology, vol. 2, pp. 208-222. London: Methuen and Co., 1980.
Stich, S. 1983. From Folk Psychology to Cognitive Science. Cambridge: MIT Press.
Tannenbaum, A. 1976. Structured Computer Organization. Englewood Cliffs, N.J.: Prentice-Hall.
Tennant, N. 1984a. Intentionality, syntactic structure, and the evolution of language. In C. Hookway, ed., Minds, Machines, and Evolution. Cambridge: Cambridge University Press.
Tennant, N., and Schilcher, C. 1984. Philosophy, Evolution, and Human Nature. London: Routledge and Kegan Paul.
Tennant, N. 1987. Philosophy and biology: Mutual enrichment or one-sided encroachment. La nuova critica 1-2: 39-55.
Thagard, P. 1986. Parallel computation and the mind-body problem. Cognitive Science 10: 301-318.
Torrance, S. 1984. Philosophy and AI: Some issues. Introduction to S. Torrance, ed., The Mind and the Machine, pp. 11-28. Sussex: Ellis Horwood.
Turing, A. 1937. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42: 230-265.
Turing, A. 1950. Computing machinery and intelligence. Mind 59: 433-460.
Van Fraassen, B. 1980. The Scientific Image. Oxford: Oxford University Press.
Vogel, S. 1981. Behaviour and the physical world of an animal. In P. Bateson and P. Klopfer, eds., Perspectives in Ethology, vol. 4. New York: Plenum Press.
Walker, S. 1983. Animal Thought. London: Routledge and Kegan Paul.
Index
Ridley, M., 68-69
Robbins, A., 203
Rooms example, 94-96
Rosenberg, C., 191-193
Rosenblatt, F., 85
Rules
  context-free, 27
  explicit versus implicit, 20-21, 127
  input-output, 33-34
  in PDP systems, 99-100, 112, 115-118, 120-121, 132, 162-165
Rumelhart, D., 84, 88, 92-104, 115, 130-131, 160-163, 167, 203-204
Rutkowska, J., 64-65
Schank, R., 25, 30-31, 92-93
Schemata, 92-96. See also Frame-based reasoning
Scientific creativity, 13-17. See also BACON
Scientific essence, 11
Scripts. See Frame-based reasoning
Searle, J., 30-34, 55, 135, 154, 178
Sejnowski, T., 191-193, 199
Self-monitoring, 202-207
Semantically specific deficits, 151
Semantic metric, 190-191
Semantic transparency, 2, 17-21, 35, 105, 107, 111-120. See also Representations; Cognitivism; Connectionism
Sentence-processing models, 107-111
Sententialism, 80. See also Cognitivism; Folk psychology
Sequential thought, 132-136
Seriality, 14, 15
SHRDLU, 25-27
Simon, H., 9-17, 66, 135, 139-141
Simulation. See Virtual machines
Situated intelligence, 22, 28, 63-66
Sloman, A., 12, 74
Slugs, 85
Smart, J., 22
Smith, Maynard, 51
Smolensky, P., 3, 17-21, 81, 111-118, 131-132, 137-139, 151-152, 173, 188-195, 202
Snowball effect, 71
Solipsism, 45, 54, 64
Spandrels, 77-80, 94
Speed of processing, 119, 121-122
Sponges, 63-66
Spreading activation, 86-92
SPSS hypothesis, 12-13, 32, 135. See also Physical-symbol-system hypothesis
Sterelny, K., 155-156
Stich, S., 37, 39-41, 155-156
STS theory. See Semantic transparency
Subconceptual level, 189-191. See also Connectionism
Subpersonal cognitive psychology, 147
Subsymbolic paradigm, 111-114, 175, 188. See also Connectionism
Surface dyslexia, 168
Swim bladder, 69
Symbiosis, 69-70
Symbolic AI, 194-195
Symbolic atoms, 12
Symbolic paradigm, 112-113, 175
Symbolic reasoning, 131-136
Symbol processing, 2, 9, 11-13, 17-21, 112, 131-137. See also Cognitivism; Representations
Syntax. See Representations
Systematicity argument, 144-150
Tacit rules, 20-21. See also Rules
Task analysis, 18, 159
Tasks, 15, 17
Technological AI, 153
Tennant, N., 51
Thagard, P., 129
Thought. See also Folk psychology; Holism; Representations
  ascription of, 48-50
  structures capable of supporting, 124-126
  two kinds of studies of, 152-160
Truth conditions, 42-43
Turing, A., 9-10
Turing machine, 10, 12, 23
Twin earth, 42-44. See also Broad content
Uniformity assumption, 2, 128-130, 137-139
Variable binding, 170
Vertically limited microworlds. See Microworlds
Virtual machines
  connectionist, 121-122
  explanations geared to, 174-175, 181
  symbolic, 131-136, 140-141
Visual cliff, 152
Vogel, S., 63-64
Walker, S., 73
Warrington, C., 151
Winograd, T., 25-26
Winston, P., 26
Woodfield, A., 42, 47
Zytkow, J., 16