Language and Other Cognitive Systems. What Is Special About Language?
Noam Chomsky
Language Learning and Development, 7(4), 263–278. DOI: 10.1080/15475441.2011.584041
The traditional conception of language is that it is, in Aristotle’s phrase, sound with meaning. The
sound-meaning correlation is, furthermore, unbounded, an elementary fact that came to be under-
stood as of great significance in the 17th century scientific revolution. In contemporary terms,
the internal language (I-language) of an individual consists, at the very least, of a generative pro-
cess that yields an infinite array of structured expressions, each interpreted at two interfaces, the
sensory-motor interface (sound, sign, or some other sensory modality) for externalization and the
conceptual-intentional interface for thought and planning of action. The earliest efforts to address
this problem, in the 1950s, postulated rich descriptive apparatus—in different terms, rich assump-
tions about the genetic component of the language faculty, what has been called “universal grammar”
(UG). That seemed necessary to provide for a modicum of descriptive adequacy. Also, many puzzles
were discovered that had passed unnoticed, and in some cases still pose serious problems. A primary
goal of linguistic theory since has been to try to reduce UG assumptions to a minimum, both for stan-
dard reasons of seeking deeper explanations, and also in the hope that a serious approach to language
evolution, that is, evolution of UG, might someday be possible. There have been two approaches to
this problem: one seeks to reduce or totally eliminate UG by reliance on other cognitive processes;
the second has approached the same goal by invoking more general principles that may well fall within extra-biological natural law, notably considerations of minimal computation, which are particularly natural for a computational system like language. The former approach is now prevalent if not dominant in cognitive science and was largely taken for granted 50 years ago at the origins of inquiry
into generative grammar. It has achieved almost no results, though a weaker variant—the study of
interactions between UG principles and statistically based learning-theoretic approaches—has some
achievements to its credit. The latter approach in contrast has made quite considerable progress. In
recent years, the approach has come to be called “the minimalist program,” but it is simply a con-
tinuation of what has been undertaken from the earliest years, and while considered controversial, it
seems to me no more than normal scientific rationality. One conclusion that appears to emerge with
considerable force is that Aristotle’s maxim should be inverted: language is meaning with sound, a
rather different matter. The core of language appears to be a system of thought, with externalization
a secondary process (including communication, a special case of externalization). If so, much of the
speculation about the nature and origins of language is on the wrong track. The conclusion seems
to accord well with the little that is understood about evolution of language, and with the highly
productive studies of language acquisition of recent years.
Correspondence should be addressed to Noam Chomsky, Department of Linguistics and Philosophy, Massachusetts
Institute of Technology, 77 Massachusetts Avenue, 32-D808, Cambridge, MA 02139. E-mail: [email protected]
The question “What is special about language?” covers vastly too much ground for me to try
to address it more than very superficially, cutting a lot of interesting corners. I would like to
concentrate on two approaches to the question, which differ radically in their assumptions. Since space is limited, I will have to draw lines too sharply, though not, I hope, so sharply as to obscure the issues worth thinking about.
The two approaches differ on whether the topic of this symposium, and my specific topic as
well, makes any sense in the first place. Both the general symposium and my topic rest on a presupposition: that language exists—by which I mean that language is a coherent
system that can be described by principles of Universal Grammar (UG). But a major tendency
in cognitive science, possibly the dominant one by now, holds that it does not, and sometimes
states that forcefully and explicitly in a form that I will discuss below. It should be recalled that
there is nothing new in this stance. Fifty years ago, it was widely held by the most prominent
philosophers and psychologists that language is just a matter of conditioning and some obscure
general notion of “induction” or “analogy.” A widely held view in professional linguistics was
that languages can differ arbitrarily (within very restricted constraints, like choice of phonetic
features, perhaps just properties of the articulatory apparatus), and that the subject consists of
nothing more than an array of procedures to reduce a corpus to an organized form in one or
another way, selected on the basis of the specific goals of the inquiry, with no other criterion of
right or wrong.
Later versions of the “nonexistence” conception were that rules of language can be justi-
fiably postulated only if they are “in principle” accessible to introspection, a dogma—largely
incoherent in my opinion—that excludes almost everything. There are other variants, among
them the insistence, again by prominent philosophers and others, that language must be regarded
as a socio-political entity of some kind, hence dependent on continuity of empires and literary
cultures, national myths, military forces, and so on.
The symposium topic—Language and Other Cognitive Systems—not only presupposes the
existence of language but also a modular approach to the mind, taking it to be much like the
rest of the organism, a complex of subsystems, often informally called “organs,” with enough
internal integrity so that it makes good sense to study each in abstraction from the others with
which it is integrated in the life of the organism; for example, the visual, immune, digestive, and
other organs “below the neck” metaphorically speaking, and the various mental organs: language,
planning, the various structures of memory, organization of action, and so on—whatever the
right analysis turns out to be. Randy Gallistel (2000) has observed that the biological norm is
modular systems with special growth/learning mechanisms in different domains and in different
species. There is every reason to expect human language to keep to the biological norm in this
respect. There are in fact crucial features of human language that appear to have no significant
analogue in the biological world. They also seem to have emerged very recently in evolutionary
time, many millions of years after the separation of modern humans from any other surviving
species.
My own assumption is that language does exist as a module of the mind/body, mostly the
brain, but that the nonexistence approach in its contemporary form in the cognitive sciences is
actually raising the right questions—though pursuing them in a way that is very likely to fail, at
least as failure and success have been understood for centuries in the sciences.
The study of evolution of language is a hot topic these days, judging by the number of publica-
tions that pour out with such titles. That is rather odd in many respects. Much simpler questions
are scarcely investigated: the evolution of communication of the hundreds of species of bees, for
example, plainly a far simpler question, but recognized to be too hard to say much about. Very
little is known about evolution of cognition generally. Furthermore, it is quite possible that noth-
ing much can be learned by currently available methods, as the prominent evolutionary biologist
Richard Lewontin (1998) has argued in unfortunately neglected essays. A look at the literature on
evolution of language reveals that most of it scarcely even addresses the topic. Instead, it largely
offers speculations about the evolution of communication, a very different matter. It is also often
based on very strange beliefs about evolution, to some of which I will briefly return.
Let me illustrate with a recent essay that encapsulates clearly many of the assumptions of
the nonexistence approach to language and its evolution. In a recent issue of Science magazine,
there is a review-article discussing books on evolution of language by N.J. Enfield (2010) of the
Max Planck Institute. He finds essentially nothing of value in the books reviewed, apart from
some discoveries about the lowered larynx in mammals, which have at best a remote relation to
language and its evolution. Most of the rest of the contents of the books, Enfield argues, is lethally
tainted by the existence assumption, the belief that there are rule systems that determine form-
meaning relations and conditions of language use; that includes phonology, formal semantics
and pragmatics, and narrow syntax—all of which falls within what he calls “syntax,” plausibly
because it all involves internal mental computations, and hence technically is syntax in the broad
sense.
To illustrate the fallacy of the existence approach, the article is accompanied by a photograph
of three infants, suitably interracial, apparently noticing one another. The caption reads: “com-
munication without syntax.” The point apparently is to show that rule systems of the kind studied
under the existence assumption are not necessary for communication. The photo could have been
replaced by a picture of three bacteria, making the same point.
The title of Enfield’s (2010) article captures another of the major criticisms of the existence
approach: it ignores social context when it seeks to determine the properties of language.
To make the matter concrete, take the sentence (1) and two corresponding interrogative forms,
(2) and (3):
Sentences (2) and (3) clearly differ in status: unlike (2), (3) is severely deviant, a violation of
the Empty Category Principle (ECP), in technical terms.1 To investigate such questions as these,
according to Enfield (2010) and apparently the editors of Science, we have to consider the social
context of actual normal use of these expressions (there is effectively none). On this view, it is a mistake to raise the question anyway, because the sentences are constructed as an experiment and not drawn from a massive corpus.
1 The ECP, which states that an overt subject is required in this position, is a descriptive principle holding under
various conditions that have been extensively studied. There have been efforts to explain it, some I think promising,
but that would carry us too far afield here. It suffices here to recognize that the phenomenon illustrated falls under a
descriptive principle of broad scope, and when properly formulated, should turn out to be universal for human language.
2 Statistically speaking, language use is overwhelmingly internal: “speaking to oneself.” If one chooses to call this
“communication,” thus depriving the term of much significance, then imagined social context is relevant.
we move beyond language that no one would dare to propose it, even for systems as closely
related to language as arithmetic. We do not study arithmetical capacity by constructing mod-
els based on statistical analysis of masses of observations of what happens when people try to
multiply numbers in their heads, without external memory. At least I hope no one does.
Enfield (2010) also puts forth a far-reaching thesis, quite standard today in the cognitive sci-
ences, and a clear expression of the nonexistence thesis: “language is entirely grounded in a
constellation of cognitive capacities that each—taken separately—has other functions as well.”
This would mean that language exists only in the sense that there is such a thing as today’s weather: a
constellation of many factors that operate independently. Take the phenomena (1)–(3), for exam-
ple. Enfield cites a source to justify his conclusion, but it has little relation to the thesis, not
because he has chosen badly, but because there are none that do better. But he is correct, I think,
in saying that this is what many cognitive scientists believe.
Another influential version of the idea that language does not really exist is illustrated in a
contribution to a handbook of child development by Michael Tomasello (2006). According to
the conception he proposes, there are no linguistic rules and little to say about apparent descrip-
tive regularities—say the ECP, as in (1)–(3). Rather, there is nothing but
“a structured inventory of meaningful linguistic constructions—including both the more regular
and the more idiomatic structures in a given language (and all structures in between).” All of these
are “meaningful linguistic symbols [that] are used in communication,” his topic, there being no
language apart from this inventory. The inventory is structured only in that its elements—words,
idioms, sentences like this one, etc.—are acquired by processes of pattern-finding, schematiza-
tion, and abstraction common to primates, and a few other processes, all left obscure. So there
are individual instances of an “interrogative construction,” a “passive construction,” and so on,
but we must not analyze them into more fundamental processes that hold also for a variety of
other constructions, for example, displacement principles that apply quite broadly.
To take a standard example from decades ago, we can assign (4) to the passive construction
and (5) to the raising construction, illustrated by such expressions as “John seems to be
intelligent” (where John is interpreted as in “it seems that John is intelligent,” and presumably
is derived by raising from its infinitival counterpart “seems John to be intelligent,” John being
barred in that position by standard case principles). But we must assign (6) ambiguously to
both:
We are forbidden to notice that a very general rule of displacement applies—in fact, the
elementary rule Internal Merge (IM) discussed below—to all three cases (and many others), one
step in dissolving the constructions, now recognized to be descriptive artifacts, into elementary
components of great generality. We are, in short, forbidden to pursue the path of normal rational
inquiry.
The same must be true of examples (1)–(3): (1) and (2) are learned just the way the child
learns “river,” “how do you do,” “kick the bucket” (meaning “die”), and so forth. The child
somehow learns, in the same way, that (3) is not usable for communication as (1) and (2) are. I
leave it to others to try to give some rational interpretation of such conclusions, which abound.
Presumably expressions could have virtually any other properties in the next language we look
at. The inventory is, in effect, an arbitrary collection of unanalyzed “linguistic symbols,” and also
is finite, apart from some hand-waving (see, e.g., Tomasello, 2006).
These influential approaches and others like them ignore even the most elementary properties
of language, or sometimes propose alleged solutions that simply beg the question. A harsh judg-
ment, but easy to support, I think. They are also typically concerned not with language but with
communication, and hence, sensibly, insist that social context is essential, virtually by definition,
as in communication among organisms generally, from bacteria to humans.
Enfield (2010) expresses a closely related belief that is also widely held: “There are well-
developed gradualist evolutionary arguments” to support the conclusion that there is no such
thing as language, except as a complex of independent cognitive processes. Again, no relevant
source is cited, nor does one exist. He invokes these gradualist evolutionary claims in a critique
of what he calls the “saltationist argument” that the transition from finite to unbounded was not
gradualist. He attributes the saltationist argument to me, but that is like attributing to me the claim
that 2 + 2 = 4. It is a logical truth that the transition was saltationist—a dirty word in many cir-
cles on the basis of a curious but widespread misunderstanding of evolutionary biology, worth
exploring, perhaps, but I will put it aside here apart from noting that the “saltationist” heresy
is considered unproblematic within actual evolutionary biology. Thus, one recent book by two
prominent evolutionary biologists (Kirschner & Gerhart, 2005) takes as its “central problem”
the question “how can small random genetic changes be converted into complex useful inno-
vations,” giving many examples. A leading paleoanthropologist (Tattersall, 1998) concludes that
the innovation “that set the stage for language acquisition . . . depended on the phenomenon of
emergence, whereby a chance combination of preexisting elements results in something totally
unexpected,” a “sudden and emergent event,” presumably “a neural change . . . in some popu-
lation of the human lineage, . . . rather minor in genetic terms, [which] probably had nothing
whatever to do with adaptation.”3 All “saltationism” with a vengeance, at least if the term is
supposed to have any meaning.
Though we know very little about the evolution of language, there are a few fairly clear con-
clusions, and they are suggestive. It is perhaps worth mentioning that the phrase “evolution of
language” can be misleading. Languages do not evolve, organisms do—at least in the biological
sense of the term “evolution.” Languages are constantly changing, but that is not evolution. When
we speak of evolution of language, then, we mean evolution of the species.
There is good evidence that language capacity is the same for all human groups. If an infant
from a remote tribe in the Amazon is raised in Boston, its language will be that of my grandchil-
dren, and conversely. There are individual differences, but no known group differences. It follows
that there has been no meaningful evolutionary change with regard to language since the time our
common ancestors, perhaps a very small group, left Africa and spread around the world, about
50,000 years ago, it is commonly assumed. If we go back roughly 50,000–100,000 years before
that, there is no evidence in the archaeological record for the existence of language. Somewhere
in that narrow window, there seems to have been a sudden explosion of creative activity, com-
plex social organization, symbolic behavior of various kinds, records of astronomical events, and
so on—a “great leap forward” as Jared Diamond (cited in Carroll, 2005) called it—generally
3 Elsewhere Tattersall (2005) suggests that human intelligence more generally is an “emergent quality, the result of a
chance combination of factors, rather than a product of Nature’s patient and gradual engineering over the eons.”
4 The suggestion is borrowed from Zellig Harris (1951), who proposed it for identification of morphemes, keeping
to the procedural assumptions of the day on largely “level-by-level” analysis. But morphemes are much more abstract
elements, lacking the beads-on-a-string property necessary for the statistical analysis.
transition probabilities are aligned with phrasal prosodic constituents (Shukla, White, & Aslin,
in press).5
Though real results remain sparse, except in the novel sense of “success” adopted by recent
computational cognitive science, the role of statistical reasoning and other cognitive processes
in language acquisition is potentially a significant area of research, something that has never
been in doubt since the early origins of current work, contrary to many misperceptions. There
are also presumably conditions imposed on language by the structure of the brain, though too
little is known to draw conclusions as far as I am aware, despite interesting recent progress
in neurolinguistics. It may be, as Randy Gallistel (Gallistel & King, 2009) has argued, that a
fundamental reorientation of centuries of study of the brain will be necessary to discover the
neural roots of the computational capacities not only of human language but also even of insects,
where they are indeed astonishing.
Let’s turn to UG. The question of whether language exists is, basically, the question of whether
UG exists. Though, as already noted, this is commonly denied, I know of no coherent alterna-
tive. In the early work in the 1950s, it appeared as though UG must be extremely rich to achieve
a degree of descriptive adequacy. One major goal of theoretical linguistic research since that
time has been to reduce the postulated complexity of UG in accounting for phenomena of lan-
guage. The reasons are straightforward. The first is standard rational inquiry, seeking to achieve
greater explanatory depth, for example, to analyze (4)–(6) into simple and general components,
overcoming redundancy, eliminating the stipulated artifacts (“constructions”), and deepening
explanation, relying on the third factor principle of minimal computation. Another reason is
the hope for an eventual serious study of evolution of language. Evidently, this task, to the extent
that it is feasible at all, is rendered more difficult to the extent that the postulated target, UG, is
more complex.
The nonexistence approach I referred to shares the same goal: to reduce UG, to zero in this
conception. There are several salient differences between these distinct approaches. The first is
with regard to results. I think it is fair to say that there are virtually none in the nonexistence
literature, except in terms of the curious notion of “success” that has been contrived, departing
from all of science. In contrast, there are quite substantial results in the existence literature.
In part these derive from investigating interaction between UG and other cognitive systems,
as in cases mentioned earlier. But, overwhelmingly, they result from investigating third factor
considerations of computational complexity, even though the inquiry has often not been
phrased in these terms. These steps include, for example, dissolving “constructions” into more
general components, eliminating phrase structure grammar with its rich stipulations, radically
reducing the complexity of the transformational grammars that were designed to accommodate
noncontiguous relations such as the ubiquitous phenomena of displacement and morphological
discontinuity, and finally unifying these two generative systems under the simplest computa-
tional operation, which functions in some manner in any generative system. Recent inquiry
into these topics is often called “the minimalist program,” but the term is misleading: it is just
ordinary science, extending the main thrust of theoretical linguistics since the early days of the
biolinguistic approach in the 1950s.
5 The Shukla et al. (in press) paper expands on results of Charles Yang (2002). For more on such interactions, see Yang
(2004), and for a lucid introduction, see Yang (2006). This material is rarely if ever cited in the triumphalist literature on
successes of computational cognitive science, perhaps because it commits the heresy of assuming that language exists.
Consider the simple interrogative (7):

(7) can eagles that fly swim?

We understand that the question is whether eagles can swim, not whether they can fly: the Aux
element can is associated with swim, not fly. That is obvious from the interpretation, and also
from morphology, as in “are eagles that fly swimming,” “have eagles that fly been swimming.”
But why should that be the case? Ease of computation would suggest that the fronted Aux should
be associated with the closest verb, hence fly, not swim. Putting the problem differently, two
concepts of minimal distance conflict: minimal linear distance would relate (7) to “eagles that
can fly swim,” while minimal structural distance relates it to “[eagles that fly] can swim.” The
question, then, reduces to why the language learner reflexively minimizes the property of struc-
tural distance (the principle informally called structure-dependence) rather than adopting the
computationally far simpler property of linear distance.
Note that while there is a good answer to the What question—namely, minimal structural
distance—the How and Why questions remain. We can clarify what is at stake by spelling out
the structure of (7) a little more carefully. It has been recognized for a long time that clauses
Minimal linear distance relates can and v∗; minimal structural distance relates can and v.
Interpretation keeps to the latter, reflexively. While v here is only a notational device indicating
the position of interpretation, it is actually more than that, as we see directly.
Aux-inversion has been the topic of a considerable industry in computational cognitive sci-
ence, seeking to show that the child acquires this knowledge on the basis of statistical analysis
of a corpus of data, in accord with the nonexistence thesis. New papers come out regularly, some
now in press in major cognitive science journals (see Berwick, Pietroski, & Chomsky, 2010, for
a review). They have curious properties. One is that each fails, dramatically so, though they are
regularly cited as successes in the literature, as they are, in terms of the novel conception of
“success” mentioned earlier: roughly approximating unanalyzed data. Another is that each effort
ignores the simple explanation, which in fact generalizes to many other constructions in all lan-
guages: structural distance is minimized. A third is that it would hardly matter if the approaches
succeeded, since they would leave the basic question untouched: why is structural rather than
linear distance minimized universally, in all languages and constructions in which the question
arises? For the most part, the methods, or very similar ones, would work just as well in a pseudo-
language that used linear rather than structural distance for interpretation. A background question
is how the child even knows what the intended interpretation is in such cases as (7), unless it is
already relying on the structure-dependence principle without any data at all.
Another case discussed in the nonexistence literature has to do with binding theory, as
illustrated in (9):
(9) they expect (John) to see each other

If “John” is missing, then they is the antecedent of each other, but not if it appears. This
again is one of the rare cases of any significance discussed in the nonexistence literature (Chater
& Christiansen, 2010).6 The authors propose that the binding theory relation between they and
each other is simply “an instance of a general cognitive tendency to resolve ambiguities rapidly
in linguistic and perceptual input,” specifically, to establish the antecedent-anaphor relation as
6 Chater and Christiansen (2010) also pose what they take to be a lethal dilemma for the minimalist program (con-
cerning poverty of stimulus arguments), but their concerns are based on a complete misunderstanding of both the program and these arguments, and a failure to understand why for half a century theoretical linguistics has sought to overcome these arguments, adopting the minimalist program, even if not the name. For a sophisticated study of binding
theory, within the minimalist program as it is actually understood, see Reuland (2011).
quickly as possible in comprehension.7 Hence, the facts might rely on an innate constraint but
not one that is domain specific.
The conclusion might turn out to be correct, but it is hard to see how their proposal supports
it. Thus, if John is present in (9), then the quickest way to resolve the anaphor is to take “they” to be
its antecedent, since John cannot be. Even if there is some way around this apparent refutation
of their proposal, it also fails even if John does not appear, as in (10):
(10) Who do they expect to see each other next week?
The reason is intuitively clear: there is an antecedent for each other closer to it than
they, namely the unpronounced element in the position of the variable in the interpretation
of (10) as (11):
(11) For which persons x, they expect persons x to see each other next week
As soon as we pay attention to the most elementary facts, it appears that we have to reintroduce
rule systems of the kind that Chater and Christiansen (2010) are trying to avoid, and much else
if we go beyond these. Or, preferably, we should follow the course of theoretical linguistics
for the past half century and try to determine to what extent the apparent complexities can be
eliminated in favor of other considerations, general cognitive processes if this proves possible
(though not what they propose) or third factor considerations of computational complexity, which
have typically succeeded in the past.
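The point lends itself to a small illustration; the token lists and miniature “lexicon” below are invented for the purpose. A resolver that takes the closest preceding plural nominal, the quickest resolution available, picks the wrong antecedent on the pronounced string of (10); it succeeds only once the unpronounced copy, a posit of precisely the rule systems at issue, is part of the representation:

```python
# Toy anaphor resolver, illustrative only: (10) with and without the
# unpronounced copy of "who" in the representation.
surface   = ["who", "do", "they", "expect",
             "to", "see", "each_other", "next", "week"]
with_copy = ["who", "do", "they", "expect", "WHO_COPY",
             "to", "see", "each_other", "next", "week"]

PLURAL_NOMINALS = {"who", "they", "WHO_COPY"}   # miniature invented lexicon

def nearest_antecedent(tokens):
    """Take the closest preceding plural nominal to the anaphor:
    the 'resolve as rapidly as possible' strategy on this representation."""
    anaphor = tokens.index("each_other")
    for j in range(anaphor - 1, -1, -1):
        if tokens[j] in PLURAL_NOMINALS:
            return tokens[j]

print(nearest_antecedent(surface))    # they     -- not the actual antecedent
print(nearest_antecedent(with_copy))  # WHO_COPY -- the binding-theory result
```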
Note that these are not exotic examples, constructed to test the Chater-Christiansen proposal.
Rather, they are among the earliest cases that binding theory sought to address.
Despite their widespread endorsement, it is hard to find evidence or argument to support
the nonexistence approaches. I will therefore continue to assume that there is indeed something
special about language, and that this symposium accordingly has a topic.
If so, then the first question is to determine the nature of the generative procedure (GP) that yields structured expres-
sions over an infinite range and their interpretations at the interfaces. Embedded somehow in any
such process is a combinatorial operation, call it Merge, which takes objects already generated
and forms from them a new one. In the simplest case, then, Merge(X, Y) = Z. Uncontroversially,
we should seek to adhere to the overriding condition of Minimal Computation (MC), unless the
evidence requires further complication. We therefore take Merge(X,Y) = {X,Y}, leaving X and
Y unaltered and unordered. X and Y in Merge(X,Y) are either distinct or they are not; in the
latter case, unless further complications are introduced, one is contained within the other, say Y
is contained in X (technically, is a term of X, an object already generated at an earlier stage of the
computation). Call the case of (X, Y) distinct External Merge (EM), and the case of containment
Internal Merge (IM). Barring some stipulation—hence complication of UG—both are freely
available. Skipping details, GP will generate S = “John read that book” by EM, forming succes-
sively {that, book}, {read, {that, book}}, {John, {read, {that, book}}}. These generated objects
are submitted to the conceptual-intentional (CI) interface (roughly, systems of thought and orga-
nization of action), for determining semantic relations on the basis of the structure provided. S so
far is unordered, but order appears to be irrelevant for this interpretation. The sensory-motor (SM)
7 Putting aside the irrelevant matter of use of language, the basic observation has an important element of truth, and
in fact restates the formulation of binding theory principles in terms of minimal search (with the crucial qualifications
investigated long ago).
interface of course requires some kind of ordering (depending on the modality, e.g., different for
speech vs. sign). Hence, the externalization of S imposes linearization.
Suppose it can be shown that linearization is never required for interpretation at CI
(conceptual-intentional). Then we would expect it to be introduced solely as a reflex of SM
(sensory-motor), where it is plainly needed. That would carry us a step farther towards answer-
ing the How and Why questions that remain for Aux-inversion: minimal structural distance
(structure-dependence) is the only option (given the third factor consideration MC, Minimal
Computation): linear order is simply not available to the computational system at the point where
the C-inflection relation is established. Though the conceptual arguments supporting this move
are clear, there are many empirical difficulties to face. There also remain interesting additional
puzzles, not recognized in the published literature, but I will put them aside here.
Suppose we proceed further, generating S’ = “that book, John read.” The semantic properties
of “that book” include its properties in S, along with the extra property of topicalization. Hence,
S’ will be generated from S by IM, Internal Merge, yielding (12):
(12) S’ = {{that, book}, S} = {{that, book}, {John, {read, {that, book}}}}
In (12) there are two copies of the object {that, book}. S’, with the two copies, has the right
form for CI (conceptual-intentional) but not of course for SM (sensory-motor)—first because
linearization is required and second because the hierarchically less prominent copy is not pro-
nounced. The latter property follows at once from MC: at least the most prominent copy must be
pronounced, or there will be no evidence that topicalization took place at all, but computation is
reduced if at most one copy is pronounced—massively reduced in nontrivial cases.8
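Continuing the sketch, with the same caveat that every convention here (the copy-first ordering, the gap marker) is invented for illustration, Internal Merge yields the two copies of (12), and externalization pronounces only the most prominent one:

```python
# Self-contained toy sketch of Internal Merge and copy deletion.
def merge(x, y):
    return frozenset([x, y])                 # Merge(X, Y) = {X, Y}

def leaves(obj):
    return [obj] if isinstance(obj, str) else [w for p in obj for w in leaves(p)]

def terms(obj):
    """obj itself plus, recursively, the terms of its parts."""
    out = [obj]
    if not isinstance(obj, str):
        for p in obj:
            out += terms(p)
    return out

def internal_merge(x, y):
    assert y in terms(x), "IM: Y must already be contained in X"
    return merge(x, y)                       # same operation; Y now occurs twice

precedence = {"John": 0, "read": 1, "that": 2, "book": 3}

def ordered(obj):
    """Invented convention: a re-merged copy precedes the object it was
    extracted from; otherwise fall back on the precedence table."""
    a, b = sorted(obj, key=lambda p: min(precedence[w] for w in leaves(p)))
    return [b, a] if b in terms(a) else [a, b]

def externalize(obj, seen=None):
    """Pronounce only the highest copy of each term; lower copies surface
    as gaps, as Minimal Computation dictates. (The toy treats repeated
    word tokens as copies too; harmless in this example.)"""
    seen = set() if seen is None else seen
    if isinstance(obj, str):
        return [obj]
    out = []
    for part in ordered(obj):
        if part in seen:
            out.append("__")                 # silent lower copy
        else:
            seen.add(part)
            out += externalize(part, seen)
    return out

dp = merge("that", "book")
s = merge("John", merge("read", dp))         # John read that book
s_prime = internal_merge(s, dp)              # (12): two copies of {that, book}

print(externalize(s_prime))                  # ['that', 'book', 'John', 'read', '__']
```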
The same considerations hold for the earlier examples discussed. In (8), v is actually can. The
expression with two copies has the right form for CI: the hierarchically lower copy enters into
the appropriate semantic relations, as in (7); the higher one, in the position of C, indicates that
this is an interrogative construction. The lower copy is not pronounced. The basic facts adhere
closely to the overriding third factor consideration MC.
The same is true of (10), generated as (13), with two copies of who:
(13) Who do they expect who to see each other next week?
The interpretation (11) follows directly: deletion of the lower copy yields (10).
These and many other examples suggest consequences for cognitive architecture. Keeping
to the principle of MC, the GP yields forms that are appropriate for semantic interpretation at
CI but not for production and perception at sensory-motor (SM), though this SM inadequacy
also follows from MC. The inadequacy is severe. Anyone who has worked on parsing programs
knows that a major difficulty is posed by “filler-gap” problems: given who in (10), the problem
for parsing/perception is to locate the gap where it receives its interpretation in the argument
structure of the sentence, not a trivial matter in general. These problems would largely be obvi-
ated if all copies were pronounced, violating MC. There are many similar cases. Garden path
sentences and ambiguities, for example, raise difficulties for perception, but they appear to be
produced by allowing GP to function without stipulation or constraint. To the extent that they
8 There are interesting cases where some residue of the lower copy is pronounced, or where some other modification marks the position of copy erasure, lending further support to the general approach outlined here.
are understood, the same seems to be true of “islands,” such as (3): an expression that can be
“thought” but not articulated, without a complex paraphrase.
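A toy count conveys the scale of the filler-gap problem (illustrative only): on the pronounced string of (10), every position is in principle a candidate gap site until grammatical knowledge prunes them, whereas pronouncing the lower copy, as in (13), would mark the site directly:

```python
# Illustrative only: candidate gap sites for a parser that sees just
# the pronounced string of (10).
pronounced = ["do", "they", "expect", "to", "see",
              "each", "other", "next", "week"]

# With the filler "who" fronted, each position in the remainder is a
# logically possible gap site before grammatical knowledge applies:
candidates = [pronounced[:i] + ["__"] + pronounced[i:]
              for i in range(len(pronounced) + 1)]
print(len(candidates), "candidate gap sites")  # 10

# If all copies were pronounced, the site would be given directly:
with_copy = ["who", "do", "they", "expect", "who",
             "to", "see", "each", "other", "next", "week"]
print("gap at position:", with_copy.index("who", 1))  # 4: the lower copy
```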
In brief, where there is a conflict between communicative and computational efficiency, the
latter seems to win, hands down. It appears that Aristotle’s dictum should be reversed: language
is not sound with meaning but meaning with sound, a very different matter. Externalization by the
SM system appears to be a secondary property of language. Externalization is also in part inde-
pendent of modality, as work of the past few decades on sign language has revealed. Sometimes
externalization is employed for communication—by no means always, at least if we invest the
term “communication” with some significance. Hence, communication, a fortiori, is a still more
ancillary property of language, contrary to much conventional doctrine—and of course language
use is only one of many forms of communication.
These conclusions seem exotic to many commentators, possibly, in some cases at least,
because of “gradualist” mythologies of the kind already mentioned. In contrast, they seem not
only natural, but almost obvious to leading evolutionary biologists and paleoanthropologists.9
The general approach just informally outlined (in one of several variants) is the only one
I know of that offers any hope of answering What-questions: what are properties of human
language, and what is special about them? Every approach to the How-questions is based, neces-
sarily, on some assumed answer to the What-questions. The more we understand about these, the
more seriously we can address questions of language acquisition. And the relation is reciprocal:
what is learned about language acquisition can in principle serve, and often in practice has served, as a guide to investigating the What-questions.
As for Why-questions, in particular why UG has this rather than some other form, the best
answer would be that a sudden and very slight evolutionary event yielded Merge, and that the
rest follows from natural law. That thesis—the Strong Minimalist Thesis (SMT)—would fit
well with the little about evolution of language that seems reasonably well confirmed. We are far
from reaching that goal, and do not know whether the huge gaps can be overcome or whether
much richer assumptions are required about UG (or, perhaps, about second-factor considerations
involving cognition and the brain, now unknown). But, though of course remote, the goal seems
a good deal more realistic than it did not very long ago.
Suppose that the SMT is even approximated, that the core system of human language is
something like a snowflake. That possibility, if even close to accurate, yields some suggestions
about the How-questions and other matters. Externalization is a hard problem. It requires relat-
ing two systems that arose quite independently, so it appears. Current evidence indicates that
the sensory-motor system was present hundreds of thousands of years before the emergence
of human language and the “great leap forward,” and evidence about possible adaptations for
language is thin. In learning a language, first or later, one attends almost entirely to the external-
ization: phonetics and phonology, morphology, ordering, and so forth. No one is taught the ECP, structure-dependence, the binding properties illustrated in (10)–(11),
(13), or any nontrivial properties of syntax, nor can they be learned from unanalyzed data, so it
appears, despite many claims of the kind mentioned earlier. For semantic properties, evidence is
virtually non-existent, and it is generally assumed, quite realistically, that the principles and prop-
erties are universal. Externalization is also easily subject to change, sometimes radical change
9 For quotes from Ian Tattersall and evolutionary biologists (Nobel laureates) Salvador Luria and François Jacob, see
Chomsky (2010).
(e.g., the Norman invasion). That does not appear to be true for syntax-semantics. It is hard to
make any sense of the invention of sign languages (sometimes recently invented, as in the case
of Nicaraguan sign) except on the assumption that the language was already basically present,
internalized, as is almost certain anyway given the very strong evidence that there has been little
if any evolution of the language faculty at least since the trek from Africa. What was missing
was a form of externalization.
With the emergence of the GP for language (in the simplest case, just binary Merge satisfying MC), primitive elements of an already existing conceptual system would enter for the first
time into a “language of thought,” for a single individual. Transmission to offspring would yield
a community sharing this capacity, so that the secondary process of externalization might be
usefully undertaken—a hard cognitive problem, as mentioned, and one that might be solved in
various ways, of course subject to overriding constraints, at least in some cases traceable to UG,
so it appears. It is sometimes claimed that there must have been a prior “language of thought,” but
that speculation adds nothing, merely transferring the problem of its origin one step back. The
same is true of the belief that there must have been “protolanguages,” simplified forms of exter-
nalization (or maybe of language itself). There is, of course, no empirical evidence for that, and
no conceptual argument either. Transition from protolanguage to full language, like transition
from seven-word sentences to the unbounded character of human language, is no simpler than “one
fell swoop.” Similar questions arise about acquisition and have been investigated with interesting
results that should be familiar in work by Lila Gleitman and others (Shipley, Smith, & Gleitman,
1969; Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005).
I cannot end without at least mentioning another extremely serious problem, which has been
barely addressed. A computational procedure requires certain atoms of computation, in our case,
a lexicon of minimal elements. But even the simplest of these pose fundamental problems: how
do they relate to the mind-external world?
There are two aspects to the question: meaning and sound, the latter ancillary, if the reasoning
above proves accurate. For sound, the answers lie in articulatory and acoustic phonetics. The
problems are difficult. They have been studied intensively for many years, yielding some answers
but leaving many outstanding problems. What about meaning? A standard answer for the core
cases is provided by referentialist doctrine: the word cow picks out cows, maybe by a causal
relation, and so forth. Something like that seems to be true for animal communication. Symbols
appear to relate to physically identifiable external or internal states: motion of leaves elicits a
warning cry (maybe an eagle is coming); “I’m hungry”; etc. Nothing remotely like that is true for
even the simplest elements of human language: cow, river, person, tree—pick any one you want.
There are inklings of that understanding in classical philosophy, in Aristotle’s Metaphysics,
particularly. It was considerably enriched, with a shift from metaphysics to epistemology and
cognition, in the 17th and 18th centuries, in the work of British neo-Platonists and classical
empiricists. They recognized that there is no direct link between the elementary elements of
language and thought and some mind-independent external entity. Rather, these elements provide
rich perspectives for interpreting and referring to the mind-independent world, involving Gestalt
properties, cause-and-effect, “sympathy of parts,” concerns directed to a “common end,” psychic
continuity, and other such mentally-imposed properties. In this respect, meaning is rather similar
to sound: every act of articulating some item, say the internal syllable [ta], yields a physical
event, but no one seeks some category of physical events associated with [ta]. Similarly, some
(but by no means all) uses of the word river relate to physically identifiable entities, but there
10 For some discussion, see Chomsky (1966), the first two chapters of Chomsky (1996), and particularly McGilvray (2005).
REFERENCES
Baillargeon, R., Spelke, E. S., & Wasserman, S. (1985). Object permanence in five-month-old infants. Cognition, 20,
191–208.
Berwick, R., Pietroski, P., & Chomsky, N. (2010). Poverty of the stimulus revisited. Unpublished manuscript.
Carroll, S. B. (2005). Endless forms most beautiful: The new science of Evo Devo and the making of the animal kingdom.
New York, NY: W. W. Norton & Co.
Chater, N., & Christiansen, M. H. (2010). Language acquisition meets language evolution. Cognitive Science, 34,
1131–1157.
Chomsky, N. (1966). Cartesian linguistics: A chapter in the history of rationalist thought. New York, NY: Harper & Row.
Chomsky, N. (1975). The logical structure of linguistic theory. New York, NY: Plenum Press. Excerpted from 1956 manuscript.
Chomsky, N. (1996). Powers and prospects: Reflections on human nature and the social order. Boston, MA: South End Press.
Chomsky, N. (2010). Some simple evo devo theses: How true might they be for language? In R. Larson, V. Deprez,
& H. Yamakido (Eds.), The evolution of human language: Biolinguistic perspectives (pp. 45–62). Cambridge,
England: Cambridge University Press.
Enfield, N. J. (2010). Without social context? Science, 329, 1600–1601.
Gallistel, C. R. (1996). Neurons and memory. In M. S. Gazzaniga (Ed.), Conversations in the cognitive neurosciences
(pp. 71–89). Cambridge, MA: MIT Press.
Gallistel, C. R. (2000). The replacement of general-purpose learning models with adaptively specialized learning mod-
ules. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 1179–1191). Cambridge, MA: MIT
Press.
Gallistel, C. R. (2011, this issue). Prelinguistic thought. Language Learning and Development, 7, 253–262.
Gallistel, C. R., & King, A. P. (2009). Memory and the computational brain: Why cognitive science will transform
neuroscience. Chichester, England: Wiley-Blackwell.
Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., & Trueswell, J. C. (2005). Hard words. Language Learning
and Development, 1, 23–64.
Harris, Z. S. (1951). Methods in structural linguistics. Chicago, IL: University of Chicago Press.
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat’s striate cortex. Journal of Physiology,
148, 574–591.
Kirschner, M. W., & Gerhart, J. C. (2005). The plausibility of life: Resolving Darwin’s dilemma. New Haven, CT: Yale
University Press.
Lewontin, R. (1998). The evolution of cognition: Questions we will never answer. In D. Scarborough & S. Sternberg
(Eds.), An invitation to cognitive science, vol. 4: Methods, models, and conceptual issues (2nd ed., pp. 107–132).
Cambridge, MA: MIT Press.
McGilvray, J. (2005). Introduction. In J. McGilvray (Ed.), The Cambridge companion to Chomsky (pp. 1–18).
Cambridge, England: Cambridge University Press.
Reuland, E. (2011). Anaphora and language design. Cambridge, MA: MIT Press.
Shipley, E., Smith, C., & Gleitman, L. R. (1969). A study in the acquisition of language. Language, 45, 322–342.
Spelke, E. S. (1985). Perception of unity, persistence, and identity: Thoughts on infants’ conceptions of objects.
In J. Mehler & R. Fox (Eds.), Neonate cognition: Beyond the blooming buzzing confusion (pp. 89–113). Hillsdale,
NJ: L. Erlbaum Associates.
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14, 29–56.
Tattersall, I. (1998). The origin of the human capacity (James Arthur Lecture Series, pp. 1–27). New York, NY: American Museum of Natural History.
Tattersall, I. (2005). Patterns of innovation in human evolution. Evolution und Menschwerdung, Nova Acta Leopoldina, 345(93), 145–157.
Tomasello, M. (2006). Acquiring linguistic constructions. In W. Damon, R. Lerner, D. Kuhn, & R. Siegler (Eds.),
Handbook of child psychology (6th ed.), Vol. 2: Cognition, perception, and language (pp. 255–298). New York,
NY: John Wiley & Sons, Inc.
Ullman, S. (1979a). The interpretation of visual motion. Cambridge, MA: MIT Press.
Ullman, S. (1979b). The interpretation of structure from motion. Proceedings of the Royal Society of London Series B,
203(1153), 405–426.
Yang, C. D. (2002). Knowledge and learning in natural language. Oxford, England: Oxford University Press.
Yang, C. D. (2004). Universal grammar, statistics, or both? Trends in Cognitive Sciences, 8, 451–456.
Yang, C. D. (2006). The infinite gift: How children learn and unlearn the languages of the world. New York, NY:
Scribner.