Lila Gleitman and Barbara Landau (Editors) - The Acquisition of The Lexicon - The MIT Press (1994)
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
Contents
Preface (L. Gleitman and B. Landau)
Section 1: Nature of the mental lexicon
E. Williams, Remarks on lexical knowledge
B. Levin and M. Rappaport Hovav, A preliminary analysis of causative verbs in English
Section 2: Discovering the word units
A. Cutler, Segmentation problems, rhythmic solutions
M.H. Kelly and S. Martin, Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language
Section 3: Categorizing the world
S. Carey, Does learning a language require the child to reconceptualize the world?
F.C. Keil, Explanation, association, and the acquisition of word meaning
Section 4: Categories, words, and language
E.M. Markman, Constraints on word meaning in early language acquisition
S.R. Waxman, The development of an appreciation of specific linkages between linguistic and conceptual organization
Preface
For many years, the topic of lexical acquisition was a stepchild in linguistic inquiry. While the acquisition of syntax was
acknowledged to be organized according to a set of deep and highly structured innate principles, word meanings were assumed
to be acquired by a simple and particularistic associative procedure that mapped perceptual experience onto phonetic entities.
We believe this stance was anomalous from the start, as had been pointed out eloquently by Plato and, in modern times, by
Quine. Not all words are perceptually based; even those that are map quite abstractly from perceptual categories; and neither
perception nor learning is 'simple'. During the past decade, it has become clear from linguistic inquiry that the lexicon is more
highly structured than heretofore thought; moreover, that much of grammar turns on critical and universal links between
syntactic and lexical-semantic phenomena.
Hence the present volume. The papers grew out of a workshop held at the University of Pennsylvania. Its aim was to bring
together psychologists, computer scientists, and linguists whose joint concern was the lexicon and its acquisition; these
researchers have typically worked quite separately and on problems thought to be disparate. The volume is organized in six
sections:
1. Nature of the mental lexicon: The two essays that open the collection are from
linguists gleaning general perspectives on lexical learning from the linguistic facts
themselves. The first, by Edwin Williams, argues from the real complexity and
idiosyncratic nature of the mental lexicon (including lexical phrases) that innate
linguistic and categorization principles are not enough, that real children require a
learning theory more sophisticated than usually has been supposed. A detailed
analysis of English causative alternation verbs by Levin and Rappaport Hovav eloquently
supports Williams on the complexity of lexical structure.
2. Discovering the word units: To acquire a vocabulary, the learner requires some
procedures for segmenting the continuously varying sound wave into word-sized
pieces. Cutler discusses this problem, presenting evidence that a bias toward
rhythmic alternation serves as a powerful bootstrap for infants (and adults, whatever
their specific language background) solving this problem. Kelly and Martin then
show that, armed with such biases, learners
word's meaning; Pinker showed further that such a procedure would be marvelously
useful for the construction of phrase structure ('semantic bootstrapping'). Landau and
Gleitman (1985) emphasized that learning might sometimes, even usually, be the
other way round; specifically, that syntactic structure plays a necessary role in
narrowing the search space for verb meanings ('syntactic bootstrapping'). The three
articles in this section hotly debate how the semantics-syntax links (specifically, the
relation between verb argument structures and subcategorization structures) are
implicated in the learning process for verbs. Fisher, Hall, Rakowitz, and Gleitman
emphasize the role of syntax in narrowing the range of interpretations made
available by experience; Pinker argues that logically the child must work from
experience to determine the syntax in the first place; and Grimshaw stalwartly
proposes a reconciliation of these two views.
6. Procedures for verb learning: The positions just described, even if correct, can
suggest only some boundary conditions on a learning theory for verbs. Fisher et al.,
as just described, assert that some rudiments of syntax are necessary for learning the
verb meanings. But as Pinker points out, the question then would be: How did the
child acquire the syntax if not by exploiting the word meanings themselves? Brent,
in the best tradition of computer science, offers a discovery procedure for verb
subcategorization that uses string-local surface cues only, and does surprisingly well.
The final article in the volume, from Steedman, was designed as a specific
commentary on Brent's work. But this short article is much more general than that. It
strikes a number of sane cautionary notes about current theories of verb learning, of
which the most important is that real acquisition is bound to be a messy business,
with syntactic, semantic, and prosodic cues recruited by the child more or less catch-
as-catch-can. Steedman also points out, in what we find an appropriately laudatory
comment, that it is only in the presence of explicit computational models such as
Brent's that the bootstrapping theories can ever be developed and seriously
evaluated.
It is the hope of the editors that this compendium of topics and views on lexical learning will be of particular use to the linguists
who constitute the Lingua audience, in two ways. First, we suppose it will be useful to see the kinds of methodological and
substantive contributions that scientific psychology can make to the question of language learning and hence to the theory of
language. Second, as we have emphasized earlier, because syntactic and lexical structure are so closely entwined, we expect that
the various articles will be informative as to how these linkages enter into the learning procedure for vocabulary.
We thank Lingua for the opportunity to air these works in this Special Issue, and we particularly thank the Lingua Chief Editor,
Teun Hoekstra, for his continuing aid and support of this project. We thank also the National Science Foundation which,
through an STC grant to the University of Pennsylvania Institute for Research in Cognitive Science, made the Workshop
possible. In that regard, we are especially grateful to the staff of IRCS (Trisha Yanuzzi and Chris Sandy) and to Carol Miller,
Kimberly Cassidy, and Sally Davis (graduate students in the Department of Psychology) who ran the Workshop with verve,
efficiency, and considerable good humor. Finally, we thank Steven and Marcia Roth for a grant to Lila Gleitman which aided us
in the preparation of the Volume.
Philadelphia, June 1993
Lila Gleitman
Barbara Landau
Section 1
Nature of the mental lexicon
1. Introduction
I will survey two aspects of lexical knowledge which I feel pose special problems for learning. In both cases the remarks will
tend to emphasize the extensiveness, abstractness, and, at the same time, the language-particularity of lexical knowledge, and will
consequently magnify the learning problem. One conclusion that could be drawn from these observations is that intricate
structures can be learned, and this learning is not adequately modelled by either parameter setting or list learning.
It is useful to distinguish two notions of lexicon, one Bloomfieldian and the other grammatical. The Bloomfieldian lexicon is the
repository of all of a language's idiosyncrasies. The grammatical lexicon is the linguist's theory of the category of linguistic
object we call a 'word'. These are quite different conceptions, but have come to be identified in modern times. The result is a
picture of grammar in which we have a clean, streamlined syntax, where syntax is the theory of the phrase, and a notion of word that
entails that words are idiosyncratic and irregular at heart, with partial regularities expressed as 'redundancy rules', whose name
implies low-level generalizations where exception is the rule.
I think that this picture is wrong. A more correct picture is, I think, given in Di Sciullo and Williams (1986). In that view, both the
word formation and the
syntactic system are 'clean' streamlined systems, independent of the lexicon. The lexicon is the repository of forms about which
something special must be learned. This certainly includes, for example, all the monomorphemic words, but it also includes
composed units of both syntax and word formation, composed units whose properties do not all follow from the rules which
compose them. This includes a great many words, but also a great many phrases. In fact, I now think there are more
lexical phrases than there are lexical words, but this remains a speculation.
In addition, I think that lexical knowledge includes knowledge of complex abstract structures that cannot be arrived at through
parameter setting, and which must be learned from the data in a strong sense. I will discuss two of these: 'abstract idioms' in
section 2, and paradigms in section 3. If my view of these is correct, the rich structure each exhibits results not from a rich
innate predefined linguistic structure, but rather from a richly structured learning strategy.
If nothing else, then, the remarks that follow draw attention to the learning problems that might arise if first, more of linguistic
knowledge is lexical than has been thought, and second, if acquired lexical knowledge is more abstract and structured in
language-particular ways than has been thought. I think that the sort of structure that is found in each case reveals the hand of
the learning strategy.
2. Idioms
The phrases in the lexicon are called 'idioms'. We generally think of idioms as 'frozen' expressions. I will use the term idiom to
refer to any defined unit whose definition does not predict all of its properties. This will include the phrase kick the bucket,
whose idiomatic meaning is 'die', but also transmission, which unpredictably means 'such and such car part'. We don't
think of transmission as an idiom, but it will be useful and I believe correct to include it.
It has been commonly assumed that idioms are well-formed structures, but I think lately this view has been questioned. To say
that an idiom is well formed is to say simply that it conforms to the rules of formation for the type that it is. Of course, the
idiom does not conform to all of the rules, else it would not be an idiom. What rules do idioms obey? It is useful to consider the
rules of 'form' apart from the rules of interpretation. We find idioms violating both types of rules.
The idioms that do not obey the rules of interpretation are deviant in meaning. The rules of argument structure are not obeyed in
some cases, and in others, the rules of reference are not obeyed. We might refer to both these types of idioms as 'semantic'
idioms.
It is perhaps misleading for me to say that there are idioms which do not obey the rules of form, for it is an overwhelming fact
that idioms obey the basic rules of form in a language. For example, all idioms in English obey the 'head-initial' setting of the
head position parameter. On the other hand, there are unpredictable, language-particular exploitations of the formal possibilities
in a language; we might call these idioms 'formal' idioms.
A widely shared set of assumptions about idioms is the following:
(1) Usual assumptions:
    idioms are listed
    idioms are well-formed phrases
    idioms have empty parts: The cat has got X's tongue
    relation of idioms to syntax: insert the idiom, then fill in the parts
So, for example, The cat has got X's tongue is an idiomatic phrase, fully consistent with the laws of English syntax, which is
inserted in a phrase marker for an S position; further substitution of a referential NP for the position of X will yield an English
sentence.
These assumptions, though common, lead to some surprising conclusions, or at least some surprising questions. In particular,
they lead to the view that a great deal of what we have called 'rule' in syntax might
really be 'idiom'. To give an example, consider the fact that embedded questions in English must begin with a WH word. It is
well known that this is not to be described by making WH movement obligatory in English. What kind of information is it then
that [Wh-phrase S] is a (potential) embedded question in English? It COULD be that this form is an idiom:
(2) [[Wh-phrase] S]s: 'embedded question'
Under this conception, (2) could be listed in the lexicon, awaiting insertion into some embedded S position, where it
is an embedded question (as the slogan after the colon above indicates). Further substitution of some wh-
phrase for the wh-phrase position in (2) and of an S for the S position in (2) will complete the sentence, exactly parallel to The
cat has got X's tongue.
One does not ordinarily think of this as idiomatic information, even though it is language particular; rather, one thinks of this
feature of English as one of the 'parametric' possibilities. (2) is not the ordinary way to encode this information.
Example (2) may seem sufficiently different from The cat has got your tongue that it would never be misidentified as an idiom
in the same sense. However, I intend to supply enough examples of cases intermediate between the two that it becomes hard
not to ask whether (2) is the correct description or not.
2.1. Idioms are instances of well-formed structures
The view that I am presenting here would have large numbers of objects of a sometimes quite abstract character stored in the
idiom lexicon of language. This view might then be seen as converging with several recent proposals that there are
'constructions' in language, a backlash against the principles-and-parameters model. There is, though, a fundamental difference
between the proposal here and those proposals: specifically, I hold that idioms are well formed, and that the rules of well-
formedness are simple 'parameterized' rules.
2.1.1. Fillmore's examples
For example: C. Fillmore (p.c.) has suggested to me that while most prepositions in English are pre-positions, some are not, and
consequently the notion that there is a uniform head position in English is not correct; the most one can say is that there are a
number of constructions in which the head is leftmost, and some others in which it is not. His candidates for postposition in
English are notwithstanding and ago:
(3a) John notwithstanding, we will go there tomorrow.
(3b) John left 3 days ago.
I believe that neither of these is a preposition, and that the notion that English is head initial can be maintained in a strong form.
Notwithstanding can be assimilated to the following constructions:
(4a) John aside, ...
(4b) That noted, ...
which are clearly small clause constructions of some kind. The difference between notwithstanding and aside is that aside can
appear in other small clause constructions, whereas notwithstanding is restricted to the context indicated:
(5a) put that aside,
(5b) *notwithstanding
We may express this restriction on notwithstanding by not listing it in the lexicon on its own, but only as a part of the following
'idiom':
(6) [[NP notwithstanding]sc S]s: 'even with NP, S'
Importantly, (6) is an instance of a structure which is well formed in English independently, namely the structure of
(4):
(7) [NP AP] S >>
    [NP notwithstanding] S: 'even'
I will use the double carat sign '>>' to mean 'has a well-formed instance'.
I think ago as well has a better analysis than the postpositional one. Consider:
(8a) long ago
(8b) 5 minutes in the past
(8c) a few days before the party
It appears that time prepositions in general can take some sort of extent specification; however, this specification in
general precedes the time preposition, as we might expect specifiers to. (8b, c) show clearly that the extent
specification is not a part of the complement structure, as the prepositions in and before have complements to their right.
Now, ago differs from these in two ways. First, it cannot have a complement to the right. This however shows nothing except
that ago is intransitive. Secondly, in the case of ago, the extent specification is obligatory. Here, if we think that extent
specification is ordinarily outside of the subcategorizational reach of a head, we might appeal to an idiom to capture this
exceptional feature of ago:
And the fact that (a) is not passivizable is no surprise either, since there is no theta relation between the direct object and the rest of
the sentence.
If this account is correct, then theta structure is not necessarily respected in idioms, though it may be. A child learning an idiom
will aggressively assign it as much structure as possible according to its rules. If it can assign it a semantic argument structure
based on its syntactic argument structure, it will, but if not, then the idiom will lack argument structure.
2.2. Abstract idioms
2.2.1. Idioms with instances
If we begin with the idea that an idiom is a phrase, and that it can contain empty parts, then we immediately face the question,
how empty can an idiom be? Can it, for example, be mostly empty? The question arises sharply for an example like the English
noun pants. This noun, as is well known, is an 'arbitrary' plural; a shirt, for example, of the same genus topologically (at least
when the fly of the pants is unzipped) is singular. This is a trivial piece of idiomatic, that is, unpredictable, information about
this noun, that it must be plural. Strikingly, though, this is not true of this noun alone, but in fact of every word that has the
meaning that pants has: something worn on the legs in such and such a way:
(11) pants, jeans, shorts, cutoffs, culottes, bermudas
New, made-up terms for lower trunk wear must conform as well. One exception, bathing-suit, is an exception only in
that it does not specifically refer to lower-trunk wear, but rather means whatever one wears to bathe in; it is an
accidental fact of current fashion that this refers to lower-trunk wear.
Now, what sort of information is this? It is information about a general restriction on form that follows from meaning;
specifically, if a noun is going to have such and such a meaning, then it must be plural;
(12) Ns ← 'lower trunk wear'
I have drawn the arrow from right to left to mean, if an item is going to have the meaning on the right, then it
MUST have the form on the left. This is a different sort of idiomatic information from knowing that kick the bucket
CAN mean die, and so the different notation.
Another example like pants is fish. Fish names, with some exceptions, are all unmarked plurals or have that as an option: trout,
bass, perch, bream, yellowtail, mahimahi. The exceptions are not really fish, by and large: whale, guppy, minnow. Other animal
families are untouched by this idiosyncrasy: bee, wasp, ant. As far as I can tell, this is an unpredictable, but very general fact
about English, and counts as 'idiomatic' information about the language.
A further case of the same sort is the language-particular patterns of lexicalization identified by Talmy (1985). He found that
languages systematically differed in the kinds of verb meanings they allowed. For example, English allows verbs of motion to
indicate a means of motion. Float can be used as a directional verb, but at the same time, it indicates the manner of motion:
John floated under the bridge can mean that John moved under the bridge by floating. Spanish and French lack entirely verbs of
this kind. Flotter in French and flotar in Spanish (float) can mean only to float stationarily, and the restriction is apparently a
hard and fast one. Similarly, verbs of posture (sit, kneel, lie, etc.) differ systematically from language to language, in whether
the stative, inchoative, or causative is the basic or underived form; English, Japanese, and Spanish systematically differ in this
choice; see Talmy (1985) for details.
Again, we have language-particular variation of a quite general sort. Again, the information is 'idiomatic', but the question
remains, how to represent it. We might again represent it as an idiom 'with a hole in it':
(13a) [inchoative]V ← 'posture verb' (Japanese)
      where V is atomic
(13b) [motion verb]V ← 'manner' (English)
Now, the representation here I think is not so important as the question of what this sort of information is, and
especially how it is acquired. I think that language-particular patterns of the kind that have just been discussed (pants
in English, the motion verbs in Romance, the posture verbs, etc.) fall outside of the 'parametric' core, and yet, they are quite
general, and basically exceptionless. This means some sort of general induction, the kind that is meant to extract 'lexical
redundancy rules', must be capable of acquiring from the data the 'exceptionlessness' of the rule. Some substitute for negative
evidence, such as counting and statisticking, is required. And some limitation on the space of searches must hold, in order for
the induction to remain in the realm of possibility.
The most surprising cases are the cases that achieve exceptionlessness. The 'pants' idiom is exceptionless, in that any noun that
means the right thing must participate in the idiom. The present participle is another case: there are no present participles that do
not end in -ing, whereas past tense forms, as is well known, are quite varied. The learner learns more than that there are no
exceptions; he learns that there can be no exceptions. We will discuss such cases further in section 4. For the moment, we note
that the formalism proposed suffices to express exceptionlessness, though it at the same time hides the learning problem implied
by them.
2.2.2. Formal idioms: Exploited and unexploited avenues
The pants phenomenon just examined is I think more widespread. Formally, the grammar permits singular nouns to have the
meaning 'such and such a type of legwear', but this avenue is unexploited in English, thanks to (12). By 'unexploited avenues' I
mean possibilities that the formal system would seem to allow, but which it does not use. In identifying such cases one always
risks missing the formal explanation for the missing possibility, though a case like pants I think clearly shows that there will not
always be one.
As one example, consider compound terms in English and French. Both languages have means of putting together words from
further words, or kind-denoting terms. In French, the syntactic system is exploited; so, one has compound terms of the following
kind:
(14) VP >> V NP >> V N >> essuie-glace 'wipe window', 'windshield-wipers'
     VP >> V PP >> V P N >> laissé-pour-compte 'left for count', 'abandoned one'
Here, the double carat means 'has as an instance'. So, compound terms in French are instances of syntactic
constructions. As word level items they have their own limitations (e.g., no referential material may occur in them,
and so, for example, no determiners are allowed), but they are nevertheless well-formed syntactic objects.
English, on the other hand, exploits a different system to form its compound terms:
(15) [X Y]Y > > [N N]N
The system exploited here is the affixation system in the lexicon, which is right-headed. Ordinarily, Y is a suffix, forming the
head of a word. English lets Y be a full noun, giving us compounds.
Importantly, both languages have both resources: English has the same (left-headed) syntax as French, and French has the same
right-headed affixation system as English; however, they each exploit a different one of these for their compound terms. I
assume that this is 'idiomatic', that is, language-particular, but perhaps not 'parametric'.
2.3. Syntactic idioms
English embedded questions must begin with a wh-phrase. This is not to say that Wh-movement is obligatory in English, as it
clearly is not; not only do matrix questions not necessarily undergo Wh-movement, but even a wh-word in an embedded
question need not move, when, for example, another has moved, or the complementizer is already wh, like whether:
(16) Who wonders whether George saw who?
The correct generalization is as stated: a Wh-word must appear at the beginning of an embedded question. What
sort of information is that? We might describe it as an idiom, in the sense developed herein:
(17) [wh-phrase] S ← 'embedded question'
That is, a sentence with a Wh-phrase at the beginning is a question, and nothing else is. The arrow goes backwards,
as any embedded question must have this form.
Idiom (17) is a good candidate for a 'parameter', in that there is a small number of ways that question words can be dealt with:
(1) moved to front; (2) moved to verb (as in Hungarian); or (3) left in situ. However, I think there is good evidence that idioms
just like (17) must be countenanced, ones that are not reducible to parameters of variation.
One case is the 'amount' relatives. These have the following form:
(18a) [wh-phrase S] >>
      [what N S]: 'little' (amount relatives)
(18b) I gave him what food I had.
(19) I gave him what I had.
(18b) has an implication that (19) does not have, namely, that there was little food in question.
Now, where does this implication come from? It does not come from what, which does not have this implication in general, not
even in free relatives, except in the context in (18b). Furthermore, it does not inhere in free relatives in general. In fact, it occurs
only in the structure in (18); it is idiosyncratic to that structure. Assuming that there is no parameter to set here, this is a
learned fact about this structure.
What is interesting is how formally similar (18) is to (17). The only difference is that the semantics of (18) is very particular,
and therefore plausibly idiomatic, whereas the semantics of (17) is very general. But the formal means may be the same in the
two cases: a feature of meaning and form are connected in an idiomatic entry in the lexicon. If a learner can induce (18), it
would seem that (17) would be accessible to the same mechanism.
A related sort of case arises from Subject Aux Inversion in English; the following is an idiom of English:
(20) [V S]S ← 'matrix yes/no question'
This is comparable to (17): an obligatory idiom (that is, of the pants variety) has the effect of forcing syntactic rules to
apply. More interesting are the cases of inversion which receive a conditional interpretation:
(21a) Had I been there, this would not have happened
(21b) S >> [V S]S: 'conditional'
         >> [had ...]S
         >> [were ...]VP
The rule of inversion gives a large number of forms which are ungrammatical in this context:
(22) *Could I write poetry, I would not be a linguist.
In fact, inversion in the conditional context works only for the auxiliaries had and were. What sort of information is this?
Importantly, the cases allowed in the construction are a subset of the cases allowed in general; hence, what is learned is that not
all the formally allowed possibilities are realized. We will adopt the following convention for representing this situation:
(23) Instance principle:
If a form to which a meaning is assigned has listed subinstances, then those
subinstances are exhaustive.
2.4. Idiom families
At one end of the language-particular information that a learner must acquire are the completely fixed expressions; at the other
end are the broad typological parameters. I have suggested that there are intermediate 'abstract' idioms (pants and amount
relatives, for example) which link these endpoints with a graded continuum. In compensation for this more complicated
situation, and the more complex learning problem that it poses, I have suggested that each level of 'abstractness' must conform to
the level above it; thus we do not have a wholesale theory of 'constructions', but still a broadly parametric model. This says that
a construction (say, passive) must conform to the typological pattern that is determined by some parameter settings, but leaves
open the possibility that not all features of the passive construction will be determined by this conformation.
As further evidence of this view, I will discuss here some idiom families, that is, language-particular idiom patterns. Each idiom
pattern has a number of idioms as instances. The principal reason for recognizing the existence of the idiom family is that some
languages will have idioms of a certain sort, and others will lack them altogether, apparently in a way not related to the
parametric typology of the languages, though of course one could always be wrong about that for any particular case.
The most interesting sort of case I know of was pointed out to me by Martin Everaert (p.c.); the idioms are of the form:
(24) Nx P Nx
side by side
The two Ns are meant to be identical tokens of the same noun, as in side by side. French, English, and Dutch have these
idioms; Japanese lacks them entirely. For this reason, we would want to call (24) itself an 'abstract idiom' of the sort discussed in
previous sections.
Idiom (24) probably has as its most immediate instantiations not actual idioms, but further, more concrete, idiom families, one
for each P that participates:
the other hand, both prepositions must be present: de part en part ('limb from limb') (V. Deprez, p.c.; T. Hoekstra has pointed
out to me the existence of heure par heure and côte à côte).
The various subcases of (24) do not have a common element of meaning. So, for example, cheek to cheek refers to the pressing
together of two cheeks, as in dancing; but minute to minute and day by day refer to a series of days in sequence. Even the
instances sharing a common preposition do not have a completely shared element of meaning, as the two instances with to just
cited show. Therefore, these forms are not compositional, despite clear patterns in the meaning.
On the other hand, from the fact that they are so prevalent in one language, and absent altogether in another, we know that they
are present as a group in some sense. It seems unlikely that there is a parameter for this property alone; perhaps it follows from
some other parameters, though it is hard to see how.
An alternative is that the structure in (25)-(27) is induced from the data of the language. How could this happen? Suppose that at
a certain point in the course of acquisition, some number of forms with the shape Nx P Nx have been learned. As a class, they
conform to the shape of left-headed compound prepositional phrases, and so do not fall outside of the language altogether. On
the other hand, their properties are not entirely explicable in terms of the general principles of the grammar; in particular, the
use of bare singular count Ns as the objects of prepositions is not a general feature of prepositional phrases in English. So these
remain idiomatic; however, they are idiomatic as a class, not as individuals.
Another family of idioms is illustrated in the following:
(29) (a) John hunts bear
         John snares rabbit
         John traps monkey
     (b) *John hunts book
         *I am going to grade paper
     (c) *John counts monkey
     (d) *hunt sleepy elephant
What is special about this case is the use of the bare singular as object. Normally, this is not allowed for English
count nouns, but is allowed here. So, we have an abstract idiom, of the following form:
(30) [V N]VP
     where V is +HUNT and N is +ANIMAL
The form in (30) is a special case of the general form of VP (and so, for example, is 'V N' and not 'N V'), and thus
conforms to our overall claim that abstract idioms are always instances of more general patterns of the language.
The limits of this idiom are somewhat roughly indicated by (b) and (c): the verb must be a verb of hunting, and the
object must be an animal. (d) shows a further restriction: not only are determiners excluded, but adjectives as well
(unless 'sleepy' is a kind of elephant).
One might conclude from these observations that this construction is in fact a lexically compound verb; 'hunt monkey' would then
be syntactically intransitive. I doubt this, since English in general disallows compound verbs, and particularly disallows
left-headed compound verbs; but even if the conjecture were correct, the problem posed by these examples would not be solved,
but simply delivered to the lexicon, with the fee still unpaid.
I assume that it is not at all predictable that English would have this pattern; and in fact, inspection of (b) and (c) might lead one
not to expect it. I conclude, therefore, that learning (30) entails generalizing over examples like those in (29), and that the
limits of the generalization must follow in some way from the actual mechanism of generalization.
The existence of these idioms of intermediate abstractness argues that learning language does not reduce to (a) learning
parameter settings, and (b) learning the properties of particular lexical items. Rather, there are structures between these two
extremes, what I have called abstract idioms, which can only be learned as language-particular generalizations of particular
forms.
3. Paradigms
A paradigm is a multidimensional array of linguistic forms: for example, a verb conjugation, or a Latin noun declension. A
paradigm is not just a convenient way to display linguistic information; rather, it is a basic form of linguistic knowledge. It
is of interest here because it is highly language-particular and, at the same time, quite abstract in structure. Paradigmatic
structure is also pervasive.
Example (31) below gives a slice of the Latin verbal conjugation:
(31) Latin:
(a) ± finite            (b) amo     amamus
    ± indicative            amas    amatis
    ± passive               amat    amant
    ± perfective
    pres/imperf/fut
    ± plural
    1/2/3 person
The Latin verbal conjugation is 8-dimensional, with the dimensions listed on the left.
It is not possible to say in a general way how many dimensions a paradigm will have, nor how many points there are on a given
dimension, nor what the dimensions will 'mean', that is, what syntactic or semantic categories the points on a dimension will be
taken as signifying.
There are several levels of abstraction involved in paradigms. At the lowest level, we have word-paradigms, such as in (31b). At
a slightly higher level of abstraction, we have paradigms in the traditional sense, roughly speaking, sets of endings:
(32) -o -mus
-as -atis
-t -ant
At a slightly greater level of abstraction, one might regard a paradigm as a set of rules which, when applied to a
stem, derive a word paradigm, by, for example, adding endings. So there is a rule for forming the past tense, a rule
for forming the third person present tense, and so on. In fact, though, I believe that a paradigm is even more abstract than
that: a paradigm is a patterning which is more abstract than any set of forms, any set of endings, or any set of rules
for filling the slots in a paradigm.
That the paradigm is a real object, and not the epiphenomenal product of various rules, is shown by the phenomena of blocking,
syncretism, suppletion, and paradigm defectiveness, as argued in Halle (1973).
To consider only the first of these: if there are two rules for filling a slot in a paradigm, only one may be used; thus, we have
bit, not bited, and in general, only a single past tense form for a given verb, despite multiple ways to form past tenses. This
reveals that there is a target slot to fill, which is independent of the rules for filling it, and that slot is given by the paradigm.
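The logic of blocking can be sketched procedurally. The following is a minimal illustration under the assumption of a toy lexicon of listed irregular forms; the data and function names are ours, not the text's:

```python
# A minimal sketch of blocking, assuming a toy lexicon of listed
# (irregular) past-tense forms. The paradigm slot "past" is a target
# independent of the rules that fill it: a listed form pre-empts
# (blocks) the regular -ed rule, so each verb gets exactly one past.
IRREGULAR_PAST = {"bite": "bit", "go": "went", "see": "saw"}

def past_tense(verb: str) -> str:
    # A listed form wins; the regular rule is blocked.
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Regular rule: suffix -ed (spelling adjustments ignored).
    return verb + "ed"

print(past_tense("bite"))  # bit, never "bited"
print(past_tense("walk"))  # walked
```

The point of the sketch is that the cell itself, not either rule, is the unit: both rules compete for one and the same slot.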
3.1. Extensiveness
We tend to think of paradigms as a means of displaying inflectional information about the parts of speech. But in fact, the
notion of paradigm is much broader than this. To begin with, paradigms must include syntactic items, or phrases, as well as
words. To see this, we need only examine a slice of the Latin verb paradigm:
(33)               active     passive
     present       amo        amor
     perfective    amavi      amatus sum

Here the forms are all 1st singular, with present-perfective crossed with active-passive. One corner of this square contains
a phrase, amatus sum, while the other three corners contain words. This shows that phrases form an inextricable part of
paradigmatic information: removing the perfective passive form would destroy the otherwise perfect symmetry of the paradigm.
We can see the same thing in an English paradigm, the comparative paradigm:
(34) A COMP SUPER
long longer longest
compact more compact most compact
good better best
The rule is: if an adjective is monosyllabic (or nearly so), form the comparative with -er; if not, the
comparative and superlative are formed phrasally. The existence of this paradigm is what permits us to speak of 'the
comparative' of an adjective, even though there are two ways of forming comparatives. Many languages lack any way at all to
form the comparative; English has two ways, one morphological, the other syntactic.
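The two-way rule just stated can be sketched as a small decision procedure. The syllable counter below is a crude vowel-group heuristic, and the word lists are illustrative, not from the text:

```python
import re

# Sketch of the two comparative "declensions": a rough syllable count
# decides between morphological -er and phrasal "more", with listed
# irregulars (good/better) blocking both. Purely illustrative.
IRREGULAR_COMP = {"good": "better", "bad": "worse"}

def syllables(adj: str) -> int:
    # Count maximal vowel groups as a crude proxy for syllables.
    return len(re.findall(r"[aeiouy]+", adj))

def comparative(adj: str) -> str:
    if adj in IRREGULAR_COMP:
        return IRREGULAR_COMP[adj]   # suppletive form blocks both rules
    if syllables(adj) <= 1:
        return adj + "er"            # morphological declension
    return "more " + adj             # phrasal declension

print(comparative("long"))     # longer
print(comparative("compact"))  # more compact
print(comparative("good"))     # better
```

Note that the same criterion would select the superlative (-est versus "most"), which is the point made later in the chapter about declensional splits.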
Paradigms include not just inflectional dimensions, but what have been called 'derivational' processes as well. I am sure that
there is no distinction between derivational and inflectional morphology, but if there is, then paradigms are found in both
morphologies.
The terminal nodes are the actual cells of the paradigm. The starred nodes are the nodes to which actual forms are assigned. By
convention, a cell is filled by the nearest specified form above it. The identity of the forms and the points in the tree at which
they are mapped are given in (35). The assignment shown is the most economical, as each form is assigned once.
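The filling convention, under which a cell inherits the nearest specified form above it, can be sketched on a hypothetical tree. The node labels and forms below are our own illustration, not the tree in (35):

```python
# Sketch of the "nearest entry point" convention: cells are leaves of a
# tree, forms are specified only at starred (entry-point) nodes, and a
# cell is filled by the closest specified form above it. The tree and
# forms are illustrative, loosely modeled on English "walk".
PARENT = {
    "present": "verb", "past": "verb",
    "pres-sg": "present", "pres-pl": "present",
    "past-sg": "past", "past-pl": "past",
}
ENTRY_POINTS = {"verb": "walk", "past": "walked"}  # the starred nodes

def fill(cell: str) -> str:
    node = cell
    while node not in ENTRY_POINTS:  # walk up toward the root
        node = PARENT[node]
    return ENTRY_POINTS[node]

print(fill("pres-pl"))  # walk   (inherited from the root entry point)
print(fill("past-sg"))  # walked (inherited from the "past" entry point)
```

On this picture, the pattern of syncretism is just the tree plus the set of starred nodes, independently of what forms happen to sit at them.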
We might call the starred nodes 'entry points': these are the points at which concrete forms are specified. The tree along with the
starred nodes we
might call the 'pattern of syncretism'. This is a pattern which is independent of the rules for creating the forms in the pattern. It
is a part of the formal structure of the paradigm. Strikingly, the pattern of syncretism holds generally in a category, not just for
particular verbs, as we will see.
If we look at the pattern of syncretism for a variety of English verbs, a startling pattern emerges:
Here, marked on the same tree, are the entry points for several categories of verbs, including irregular verbs. As can be seen, the
sets of entry points form a nested set; the verb be shows the most distinctions, and consequently has the most entry points, but
all other verbs, including all irregulars, have some subset of the entry points of be. It is far from obvious that such a relation
should exist: if a verb is going to be irregular, why should it not be irregular in having a different pattern of syncretism, a
different set of entry points? But this does not happen: even irregular verbs respect the pattern of the language as a whole. In
fact, even suppletive verbs, the limiting case of irregularity, respect the pattern of syncretism; the verb go has went as its past
tense form. Things could have been different: went could have been only the third person plural past form, with goed (or
something else) for all the other forms; but then go-went would have violated the language-wide pattern of syncretism.
Hence, the pattern of syncretism is a quite abstract structure, standing above particular words, particular rules, particular
suppletive relationships.
We can see this further in the noun declensions of Latin. Latin has 5 declensions, each with its own set of endings (we ignore
here the genitive and the vocative):
Latin declension structure: principal syncretisms:

1st decl.: pl. indirect (-is)
2nd decl.: pl. indirect (-is), sg. indirect (-o)
    neuter: above + direct sg. (-um), pl. direct (-a)
3rd decl.: pl. indirect (-ibus)
4th decl.: pl. direct (-es), pl. indirect (-ibus)
    neuter: sg. (direct = indirect) (-ø), pl. indirect, pl. direct
Here, the nominative and accusative have been grouped together as 'direct', and the ablative and dative as 'indirect'. The reason
is that this grouping reflects the patterns of syncretism: nominative and accusative fall together sometimes, and dative and
ablative do as well.
A striking thread that runs through the entire set of declensions is the indirect plural syncretism. In the singular, there is an
indirect entry point for the 2nd and 4th neuter; the plural syncretism holds across the board. Importantly, this generalization is
independent of the rules for forming the indirect forms, for in fact there are two different rules for that: in the first and second,
-is is affixed, whereas in the 3rd and 5th, -ibus is affixed. Hence, the pattern is more abstract than the rules or affixes. I would
suggest that a pattern is abstracted that applies to all the declensions, essentially the tree structure in 1. Other declensions will
make further syncretisms, but this one will hold for all.
A general conclusion we may draw is that when there are multiple related paradigms, there will be one instantiated paradigm,
and all others will have its syncretic structure, and perhaps some more. But no other related paradigm will have a contrary
syncretic structure, making distinctions where that one does not. We will call that one paradigm the basic paradigm.
For the Latin nominal declension, the first declension is the basic paradigm. For English verbs, the verb to be is the basic
paradigm.
Let us now consider Latin verbs. Every finite Latin verb form is distinct, so there is no syncretism at this level. But at
the level of stem, there is a paradigm structure with a limited number of entry points. Below is a chart of the stem forms for
various classes of Latin verbs:
The asterisked positions represent the entry points for the regular verbs of all four conjugations. This alone is striking, for again,
there are different rules in the different conjugations for yielding the forms at these entry points. The first and second
conjugation form the future by suffixing '-b-', whereas the third suffixes nothing, but switches the conjugational class of the
basic stem. Nevertheless, that the future is an entry point is common to all the conjugations.
The verb esse has the most entry points, and every other verb uses some subset of those entry points. Thus the conjugation of
esse is the basic conjugation, to which all the others are related.
As a final example, we consider Anderson's (1984) description of Georgian verb conjugation. The system is quite complex;
Anderson's account uses blocks of rules, both conjunctively and disjunctively ordered. It is my contention that such a system
will fail to capture the most abstract patterning of a paradigm, which, as we have seen, is generally independent of affixes or
rules. Some indication that this is so can be derived from the following table of affixes (or, to use Anderson's account more
directly, of rules for adding affixes) for present and past:
(38) present past
-en 3pl subj -es 3pl subj
-t pl -t pl
-s 3rd subject -s 3rd subject
The rules are quite similar, and Anderson's remark that 'This -es rule has the same status as the -en rule in this block, and also
takes precedence of the -s and the -t rules' (1984: 8) shows, I believe, that an abstract structure is being replicated in different
parts of the paradigm, thus underscoring the independence of that structure from actual rules or affixes.
3.3. Learning paradigm structure
We ask at this point: why are there these patterns of syncretism in language? Why is a pattern of syncretism replicated across
different modes of realizing paradigm cells?
I speculate that it is the acquisition of paradigm structure that is responsible for this arrangement.
Pinker has demonstrated how the blocking principle will give rise to the development of paradigm structure in the language
learner. The basic idea is that whenever the language learner has been forced to posit two items to fill a single cell, he is then
motivated to split the paradigm (really, to double it) so as to avoid violating the blocking principle:
The paradigm now has a new dimension, and a whole new set of cells; the language learner now must learn the 'significance' of
the dimension, and fill in the rest of the cells.
I think that this algorithm for building paradigms, combined with the notion of paradigm structure I have just outlined, will
predict some of the patterns we have observed.
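The splitting step can be sketched as follows, in the spirit of the blocking-driven algorithm the text attributes to Pinker. The representation (cells as tuples of dimension values) and all names are our own illustration:

```python
# Sketch of paradigm splitting driven by blocking. When a second form
# is found for an already-filled cell, the learner may not keep both in
# one cell; instead the paradigm is doubled along a new dimension whose
# 'significance' must then be learned, and whose new cells start empty.
def add_form(paradigm, cell, form, new_dimension):
    """paradigm: dict mapping cell tuples to forms (or None if unfilled)."""
    if cell in paradigm and paradigm[cell] != form:
        # Blocking violation: split (double) the paradigm.
        doubled = {}
        for c, f in paradigm.items():
            doubled[c + (new_dimension[0],)] = f     # old plane keeps its forms
            doubled[c + (new_dimension[1],)] = None  # new plane, to be filled
        doubled[cell + (new_dimension[1],)] = form   # locate the intruder
        return doubled
    paradigm[cell] = form
    return paradigm

p = {("1sg",): "see"}
p = add_form(p, ("1sg",), "saw", ("present", "past"))
print(p[("1sg", "present")])  # see
print(p[("1sg", "past")])     # saw
```

Here the recognition that 'saw', like 'see', can be 1st singular forces a past-tense plane into existence, the intra-paradigm case described below.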
There are two sorts of splitting that can take place: intra- and inter-paradigm splitting. Given a present tense paradigm for some
verb, 'see,' for example, the recognition that 'saw', like 'see', can be used for the 1st person singular, leads to a postulation of a
past tense plane, in which to locate 'saw'. This is intra-paradigm splitting. The other kind of splitting might be called
'declensional' splitting: given a set of endings, say the endings for the 1st declension in Latin (-a, -am, -ae, -a, etc.), the
recognition that the ending -us can signify nominative singular, just as -a does, triggers the splitting of the nouns into (at least)
two declensions. The difference between this and the first case is that in the first case, a given word will have forms in every
cell of the new and old plane, but in the second, a word will have forms in only a single declension. We may nevertheless
consider the splitting to be formally the same in the two cases, and this is supported by the observation that syncretism patterns
the same in the two cases.
An example of a 'declensional' split is the comparative paradigm in English; the paradigm is a linear three-point paradigm:
adjective, comparative, superlative. But there are two modes of forming members, as we have seen: one for monosyllabics and
simple disyllabics, and another for everything else. A telling point which shows that we are dealing with a declensional split
here is that the same criterion that is used to determine whether the comparative is A-er or 'more A' is used to determine whether
the superlative is A-est or 'most A'. This criterion is not therefore a part of the rules themselves, but is rather a general criterion
of membership in the two declensions; much as 'feminine' is a criterion for membership in the Latin 1st declension.
Now, suppose that a learner has learned a piece of paradigm structure, and has learned not only the labels for the dimensions,
but has also learned the positioning of the entry points, which I have called the pattern of syncretism. Suppose further that when
the paradigm is split, this abstract pattern of syncretism is replicated along with the cells themselves:
If this is done, then we will expect to see patterns of syncretism recurring. The following prediction is made: whatever paradigm
is learned first will embody
the most distinctions. This is because cells which are designated identical by the pattern of syncretism of the first pattern will
remain identical in later versions. There may be fewer distinctions in later folds, but not more.
Thus for example when the learner learns the first declension, with its indirect plural syncretism, and then learns that there is a
second declension, that same indirect plural syncretism will show up in the second declension as well, copied as a part of the
abstract structure of the first declension.
4. Learning words
4.1. Learning morphemes
The notion of an abstract paradigm, the blocking principle, and paradigm splitting may account for how a paradigm is
elaborated, but what accounts for the identification of a (potential) paradigm in the first place? One ordinarily thinks of verbal
paradigms, realizing person, number, tense reference, etc., as the typical paradigm, but in fact languages have novel paradigms
that are unlikely to have been specifically anticipated in UG.
Suppose that one component of the learner is a device that uses extra cycles in the child's computational life to track down
statistical correlations among various properties of its thus far stored linguistic units. What the set of properties is will not detain
us here. The child learning English, for example, will discover a correlation between words ending in -y and adjectivehood:
(41a) fishy, lumpy, lucky, speedy, etc.
(41b) dainty, pretty, happy, etc.
Many adjectives do not end in -y, and many words ending in -y are not adjectives, but the probability that a word is an
adjective increases once one knows that it ends in -y.
Note that this is true for a large class of adjectives where the -y does not serve as a suffix (the second group). Even excluding
the cases where -y is an affix, there is a correlation.
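The correlation-tracking idea can be sketched on a toy word list. The list below is illustrative (it is not a corpus), but it shows the intended effect: even counting words where -y is no suffix, knowing that a word ends in -y raises the probability that it is an adjective:

```python
# Toy lexicon of (word, category) pairs: A = adjective, N = noun,
# V = verb. Includes -y adjectives where -y is a suffix (fishy) and
# where it is not (dainty, happy), plus non-adjective -y words (city).
WORDS = [
    ("fishy", "A"), ("lumpy", "A"), ("lucky", "A"), ("speedy", "A"),
    ("dainty", "A"), ("pretty", "A"), ("happy", "A"),
    ("city", "N"), ("candy", "N"),
    ("dog", "N"), ("run", "V"), ("table", "N"), ("green", "A"),
]

def p_adjective(words):
    # Proportion of adjectives in a list of (word, category) pairs.
    return sum(1 for _, cat in words if cat == "A") / len(words)

ends_in_y = [(w, c) for w, c in WORDS if w.endswith("y")]
print(p_adjective(WORDS))      # baseline P(adjective)
print(p_adjective(ends_in_y))  # P(adjective | ends in -y): higher
```

A learner tracking many such conditional probabilities would have exactly the battery of correlations the next paragraph describes.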
A battery of such correlations could serve the next step: to identify morphemes and assign them properties. In the case of -y ->
A, there are sufficient cases to warrant postulating a suffix with the category adjective, and assigning it wherever possible, that
is, wherever an independent stem exists.
Even where the analysis into morphemes does not hold, the correlation itself has been noted; it seems unlikely that
this information is forgotten once the morpheme has been established.
Another example in English of a no-go morpheme is the suffix -tude; most of the 50 or so members do not admit an analysis
into morphemes:
(42) altitude, attitude, platitude, etc.
And yet, this suffix so strongly marks nounhood that there is not a single verb or adjective with this ending.
The number of examples it takes to establish a correlation between two properties is quite small. For example, the English
noun-deriving suffix -al has fewer than 20 instances; and yet, the correlation between suffix and category seems firmly
established for all speakers, as does the restriction that the stem must be stress-final, a separately learned correlation:
(43) betrothal, avowal, approval, removal
The confidence of these identifications is surprising given the small number of cases involved.
So, there are two levels of analysis so far: first, the identification of correlating properties, and second, the postulation of an
analysis that arises from this, with units postulated to account for the correlation.
There is a further level of analysis, I believe. In some cases, it is determined by the learner not just that two properties correlate,
but that one of the properties is uniquely correlated with the other property. We might suppose that children are built to look for
this especially.
Several examples have been examined in this paper. One is the idiom Ns meaning 'wear on the legs': the property of meaning
'wear them on your legs' is uniquely correlated with nouns that end in -s. Another is the progressive, which is uniquely
correlated with V + -ing.
I believe that uniqueness has special salience. It is what we might consider the 'ideal' case, and so the first sought and the most
readily accepted. We know, for example, that some past tense forms of verbs in English are irregular, and some are in fact
suppletive. Given this, we are not much surprised to learn new verbs with irregular past tenses. However, I think we would
resist entirely learning a new verb with a suppletive progressive, a progressive that did not end in -ing.
productive over the entire set of stems in English; it is of course only in the adjectives that it shines. So -ness and -ion are
equally productive, we might say, over their own classes: the class for -ness is adjectives, and the class for -ion is Latinate verbs.
Given this, we might wonder, why isn't every affix completely productive within some arbitrarily drawn subclass of the lexicon,
say the class of things to which it does attach? I think the answer is that subclasses cannot be arbitrary. The Latinate class is
productive for -ion precisely because it can be identified independently of the occurrences of -ion: it can be identified as the class
to which -ive attaches, or perhaps it can be identified in some way along the lines of (46). In either case, we must attribute to the
language learner the ability and the inclination to compare subclasses, and look for high matches. When a high match is found,
then a dimension of a paradigm has been identified.
References
Anderson, S., 1984. On representation in morphology. NLLT 2, 157-218.
Di Sciullo, A.M. and E. Williams, 1987. On the definition of word. Cambridge, MA: MIT Press.
Fiengo, R., 1974. Semantic conditions on surface structure. MIT dissertation.
Halle, M., 1973. Prolegomena to a theory of word formation. Linguistic Inquiry 4, 3-16.
Talmy, L., 1985. Lexicalization patterns. In: T. Shopen (ed.), Language typology and syntactic description. Cambridge:
Cambridge University Press.
Williams, E., 1981. On the notions 'lexically related' and 'head of a word'. Linguistic Inquiry 12, 245-274.
1. Introduction
English is particularly rich in verbs with both transitive and intransitive uses where the meaning of the transitive use of a verb V
can be roughly paraphrased as 'cause to V-intransitive'. Such verbs are illustrated in (1) and (2), where the transitive (a)
sentences might be paraphrased in terms of the intransitive (b) sentences; that is, as 'Antonia caused the vase to break' and 'Pat
caused the door to open'.
* This work was presented at the Workshop on the Acquisition of the Lexicon at the University of Pennsylvania in
January, 1992. We would like to thank the other workshop participants and particularly Tony Kroch, the paper's
discussant, for their comments. This work was also presented at Hebrew University in March, 1992; we are grateful to the
audience for their comments. This paper has also benefited from the comments of Mary Laughren, Steve Pinker, Betsy
Ritter, and an anonymous reviewer. We would like to thank John Wickberg for helping us find relevant examples in
on-line texts. This research was supported in part by NSF Grant BNS-8919884.
For this reason, before presenting our analysis of the causative alternation, we provide a survey of certain properties of the
causative alternation, as a contribution towards filling this gap in our understanding. This facet of our
investigation focuses on two related questions: (i) Is it possible to delimit semantically the class of verbs which participate in the
alternation? and (ii) Do all examples of the causative alternation as defined above represent instances of a single phenomenon?
Answers to these questions will not only help us understand the causative alternation itself, but they should also deepen our
understanding of the nature of lexical representation and its relation to syntactic structure.
In this paper, we hope to show that the phenomena that fall under the label 'causative alternation' are on the one hand less
idiosyncratic and on the other hand less uniform than is typically believed. We suggest that much of the data we investigate is
explained once we distinguish two processes that give rise to transitive and intransitive verb pairs.4 The first, and by far the
more pervasive, process is the one which forms lexical 'detransitive' verbs from some transitive causative verbs. The second,
which is more restricted in its scope, forms causative verbs from some intransitive verbs. With respect to intransitivity, we hope to
provide further insight into the semantic underpinnings of the Unaccusativity Hypothesis, the hypothesis proposed by Perlmutter
(1978) that the class of intransitive verbs consists of two subclasses, each associated with a distinct syntactic configuration.
Finally, as in our previous work, we hope to show that if the relevant aspects of meaning of a verb (or class of verbs) are
properly identified, many of the apparent idiosyncratic properties of that verb (or verb class) fall into place.
There are many verbs in English which occur in the transitive/intransitive pairs characteristic of this alternation. A preliminary
list of such verbs is given below.
(3) bake, bounce, blacken, break, close, cook, cool, dry, freeze, melt, move,
open, roll, rotate, shatter, spin, thaw, thicken, whiten, widen,
Furthermore, the counterparts of these verbs in other languages occur in transitive/intransitive pairs characterized by
the same semantic relationship. In some languages, as in English, the relation is not morphologically mediated; see
the Basque example in (4).5 In other languages, the relation is morphologically mediated in some way, as in the French
example in (5), where the reflexive clitic se is associated with the intransitive member of the pair.6
(4a) Mirenek atea ireki du.
     Miren-NORK door-NOR open 3sNOR-have-3sNORK
     'Miren opened the door.'
(4b) Atea ireki da.
     door-NOR open 3sNOR-be
     'The door opened.'
(5a) Marie a ouvert la porte.
     'Marie opened the door.'
(5b) La porte s'est ouverte.
     'The door opened.'
The existence of this phenomenon in a wide range of languages suggests that the causative alternation is not
idiosyncratic to English.
Studies of the causative alternation going at least as far back as Jespersen (1927) have suggested that this alternation is found
with a semantically
5 In Basque the change in transitivity is accompanied by a change in the auxiliary accompanying the verb. Simplifying
somewhat, the transitive use selects the transitive auxiliary ukan 'have', while the intransitive use selects the intransitive
auxiliary izan 'be'. Thus the difference in auxiliary reflects general properties of Basque and not properties of the
alternation. The labels 'NOR' and 'NORK' are the traditional names for the cases associated with the noun phrases in the
examples. See Levin (1989) for more discussion.
6 For more on the morphological relationships between the verb forms in the transitive and intransitive variants of the
causative alternation, see the discussion of Nedjalkov (1969) and Haspelmath (1993) at the end of section 4.
coherent class of verbs. In order to determine whether this suggestion receives support, we can ask the following rather
simplistic questions: (i) Do all intransitive verbs have transitive counterparts with the appropriate paraphrase? and (ii) Do all
transitive verbs with a causative meaning have intransitive counterparts with the appropriate meaning? We begin with a
discussion of the first question.
The following examples show that there are undoubtedly intransitive verbs which do not have transitive causative counterparts.7
(6a) The children played.
(6b) *The parents played the children.
(cf. The parents made the children play.)
(7a) The actor spoke.
(7b) *The director spoke the actor.
(cf. The director made the actor speak.)
(8a) The audience laughed.
(8b) *The comedian laughed the audience.
(cf. The comedian made the audience laugh.)
These examples might suggest that agentivity is the crucial factor and that agentive verbs do not participate in the
alternation, while non-agentive verbs do. As it happens, both suggestions are wrong. There are agentive verbs which
do show the causative alternation, as in (9) and (10), and non-agentive verbs which do not, as in (11)-(14).
(9a) The soldiers marched to the tents.
(9b) The general marched the soldiers to the tents.
(10a) The horse jumped over the fence.
(10b) The rider jumped the horse over the fence.
(11a) The cactus bloomed/blossomed/flowered early.
(11b) *The warm weather bloomed/blossomed/flowered the cactus early.
(12a) The neglected wound festered.
(12b) *The heat and dirt festered the neglected wound.
7 Some English intransitive verbs without transitive causative counterparts are used transitively in the resultative
construction, but in this construction such verbs do not have the transitive causative meaning which the alternating verbs
have. Consider the verb laugh in the resultative construction The crowd laughed the actor off the stage. This construction
does not mean that the crowd made the actor laugh, which would be the interpretation that would parallel the intended
interpretation of (8b), but rather that the crowd laughed.
The examples in (15) and (16) illustrate a further complication involving the transitive use of agentive verbs of manner of
motion: the directional phrases which are optional in the intransitive use of these verbs are obligatory in their transitive use.8
(15a) The soldiers marched (to the tents).
(15b) The general marched the soldiers to the tents.
(15c) ??The general marched the soldiers.
(16a) The horse jumped (over the fence).
(16b) The rider jumped the horse over the fence.
(16c) ?The rider jumped the horse.
The behavior of the agentive verbs of manner of motion contrasts with that of non-agentive verbs of manner of motion, which,
as shown in (17), do not require a directional phrase in either their transitive or intransitive use.
(17a) The ball bounced/rolled (into the room).
(17b) The boys bounced/rolled the ball (into the room).
Although various researchers have commented that the alternation as manifested by agentive verbs of manner of motion is
qualitatively different
8 There may be some disagreement about whether the directional phrases are absolutely necessary in the transitive
causative uses of these verbs, particularly with a verb like jump. But even if these phrases need not be expressed in certain
circumstances, they are always understood in the transitive causative use. A speaker who accepts (16c) still cannot give
this sentence the interpretation that the rider made the horse jump in place; rather this sentence receives the interpretation
involving the directional phrase: the rider made the horse jump over something. We look at this issue in more detail in
section 8, where we also discuss some verbs of manner of motion that do not have causative forms even in the presence of
directional phrases.
Verbs of manner of motion are not unique in imposing the directional phrase requirement. The behavior of agentive verbs
of position parallels that of agentive verbs of manner of motion in that they can have a causative variant only in the
presence of a directional phrase, which gives them an 'assume position' reading: *Maude stood the baby versus Maude
stood the baby on the table. We do not discuss this data here because this class of verbs presents a number of
complications. See Levin and Rappaport Hovav (to appear) for more discussion of verbs of position, as well as a
discussion of a directional phrase requirement that surfaces in certain circumstances with verbs of emission.
from that shown by verbs such as break (Cruse 1972, Hale and Keyser 1987, among others), we include this alternation among
the data that needs to be accounted for since the general form of the alternation is the same: the transitive and intransitive uses
of these verbs differ with respect to the notion of 'cause'. Aside from Pinker (1989), previous researchers have taken the central
property of these verbs to be that when intransitive they require agentive subjects, noting that this property appears to be carried
over to the object of their transitive causative use. This work disregards the change in status of the directional phrase. In
contrast, we believe that the directional phrase is the key to explaining why these verbs show the alternation. On the other hand,
the contrast between (15)-(16) and (17) suggests that, although there are agentive verbs which participate in the alternation as we
have initially defined it, this alternation may be an instance of a different phenomenon, as we propose in section 8.
Jespersen (1927) calls the class of causative alternation verbs the 'move and change' verbs, because it includes a variety of verbs
of motion and verbs of change of state. The list of alternating verbs presented in (3) can easily be divided into two subclasses
along these lines:
(18a) bake, blacken, break, close, cook, cool, dry, freeze, melt, open, shatter,
thaw, thicken, whiten, widen,
(18b) bounce, move, roll, rotate, spin,
To the extent that verbs of motion involve a change of position (though not necessarily a translation through space), the set of
'move and change' verbs might be given the unified characterization 'verbs of change'.
This semantic characterization, although on the right track, is nevertheless inadequate. As we will see, change of state verbs do
constitute the core of the class of intransitive verbs which alternate. However, to the extent that verbs of manner of motion like
run are verbs of motion, it remains to be explained why they cannot appear in this alternation without directional phrases (in
contrast to non-agentive manner of motion verbs like roll). There are also verbs manifesting the causative alternation which
cannot be readily characterized as verbs of change. These include verbs of sound and light emission and verbs of position.
(19a) The bell buzzed/rang.
(19b) The postman buzzed/rang the bell.
(20a) The flashlight beamed/shone.
Furthermore, different classes of verbs participate in the alternation to varying degrees, a fact which itself is in need of an
explanation. Verbs of change figure most prominently and most regularly in the alternation. Some, though by no means all,
verbs of emission, whether they describe the emission of sound, light, smell, or substance, can alternate. We have presented
examples that show that among the verbs of light emission, the verbs beam and shine alternate, but the verbs glitter and sparkle
do not. Similarly, among verbs of sound emission, the verbs buzz and ring can alternate, but the verbs burble and roar do not.
Verbs of position allow the alternation rather freely. Not only hang, but also the verbs lean, sit, and stand allow the alternation,
although a few verbs of position, including slouch and loom, do not. The behavior of slouch is particularly interesting since this
verb is rather close in meaning to lean.
(22a) The ladder leaned against the wall.
(22b) I was leaning the ladder against the wall.
(23a) The surly youth slouched against the wall.
(23b) *I slouched the surly youth against the wall.
(24a) The bear loomed over the sleeping child.
(24b) *The giant loomed the bear over the sleeping child.
To summarize, our discussion so far has focused on the first question: whether all intransitive verbs have transitive counterparts
with the paraphrase appropriate to the causative alternation. We have seen that the intransitivity of a verb is not sufficient to
ensure its participation in the alternation. Nor is the semantic notion 'change' sufficient, since although verbs of change are
generally found in this alternation, intransitive verbs of other types differ in their behavior with respect to the alternation, even
when they are members of the same semantic class. Some other properties besides intransitivity and 'change' must be found, and
presumably the properties isolated will help to explain the behavior of the verbs in the different classes.
We turn now to the second question: whether all transitive verbs whose meaning involves a notion of 'cause' have related
intransitive uses that lack this notion. Again, the answer is 'no'. There are verbs which meet the semantic criterion, but which do
not have related intransitive uses. Examples
include the verb cut, which Hale and Keyser (1987) define as in (25), or kill, which has been defined, albeit controversially, as
'cause to die' (Lakoff 1970, McCawley 1968, among others).
(25) cut: [x cause [y develop linear separation in material integrity], by sharp
edge coming into contact with latter]
(Hale and Keyser 1987: (10))
(26a) The baker cut the bread.
(26b) *The bread cut. (on the interpretation 'The bread came to be cut')
(27a) The terrorist killed the politician.
(27b) *The politician killed.
Verbs close in meaning to cut, such as slice or carve, do not show the alternation; neither do verbs related to kill, such as murder
and assassinate.
(28a) The chef sliced/carved the turkey.
(28b) *The turkey sliced/carved.
(29a) The terrorist assassinated/murdered the politician.
(29b) *The politician assassinated/murdered.
Moving to other domains, verbs of creation also do not participate in the alternation, although creation is sometimes described
as 'cause to exist' or 'cause to come to be' (e.g., Dowty 1979: 91).
(30a) Anita Brookner just wrote a new novel.
(30b) *A new novel wrote.
(31a) The contractor built another house.
(31b) *Another house built.
Even more interesting is the fact that many morphologically complex English verbs formed with the suffixes -ize and -ify lack
intransitive counterparts,9 although these suffixes can be considered to be 'causative' affixes. (In fact, -ify comes from the Latin
word for 'make/do'.) Consider the examples below:
(32a) The farmer homogenized/pasteurized the milk.
(32b) *The milk homogenized/pasteurized.
However, some of these morphologically complex verbs have intransitive counterparts of the appropriate type:
(34a) I solidified the mixture.
(34b) The mixture solidified.
(35a) The cook caramelized the sugar.
(35b) The sugar caramelized.
The behavior of -ify and -ize verbs contrasts strikingly with that of English verbs formed with the suffix -en. The suffix -en is
also arguably a causative suffix, but verbs with this suffix appear to show the causative alternation rather more freely.
(36a) I ripened the bananas./The bananas ripened.
(36b) I loosened the rope./The rope loosened.
(36c) John thickened the sauce./The sauce thickened. (Lakoff 1968: (37a), (4a))
As part of a study that attempted to identify causative alternation verbs automatically in a machine-readable version of the
Longman Dictionary of Contemporary English (Procter et al. 1978), Fontenelle and Vanandroye (1989) found that only 14 out
of the 82 -ify verbs in that dictionary participated in the alternation, contrasting with 46 out of the 84 -en verbs. Unfortunately,
they did not provide figures for -ize verbs, but an examination of the machine-readable version of a comparable dictionary, the
Oxford Advanced Learner's Dictionary (Hornby 1974), suggests that 14 out of the 78 -ize verbs listed as headwords in this
dictionary participate in such pairs.10 The contrasting behavior of these morphologically complex verbs formed with 'causative'
suffixes again calls into question the existence of a correlation between the presence of a notion of 'cause' in a verb's meaning
and a verb's ability to show the alternation. It appears that neither intransitivity nor a meaning involving 'cause' is sufficient to
ensure participation in the alternation.
10 The small number of -ify and -ize verbs listed in these dictionaries can be attributed to their intended function: these
dictionaries are relatively small dictionaries designed for learners of English. However, a preliminary examination of a
more extensive list of such verbs suggests that the number of alternating verbs really is not that high.
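The dictionary counts cited above can be restated as rough alternation rates. The figures below are those reported in the studies (Fontenelle and Vanandroye 1989 for -ify and -en; the OALD headword examination for -ize); only the percentage computation is ours:

```python
# Alternation rates for 'causative' suffixes, using the counts cited above.
# Percentages are our computation, not part of the original studies.
counts = {
    "-ify": (14, 82),  # Fontenelle and Vanandroye (1989), LDOCE
    "-en":  (46, 84),  # Fontenelle and Vanandroye (1989), LDOCE
    "-ize": (14, 78),  # OALD headword examination
}
for suffix, (alternating, total) in counts.items():
    print(f"{suffix}: {alternating}/{total} = {100 * alternating / total:.0f}%")
```

The rates make the contrast concrete: -en verbs alternate at roughly three times the rate of -ify and -ize verbs.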
Before presenting our own account of the alternation, we turn to an examination of an additional factor that intervenes in
determining participation: selectional restrictions. The shared semantic relation between the transitive and intransitive variants of
causative alternation verbs has sometimes been demonstrated via the existence of selectional restrictions that are shared by the
subject of the intransitive use and the object of the transitive use (Fillmore 1967, among others). For example, only physical
objects with certain characteristics can break, a property reflected in the set of possible objects of transitive break and possible
subjects of intransitive break.
(37a) Antonia broke the vase/the glass/the dish/the radio.
(37b) The vase/the glass/the dish/the radio broke.
(38a) *Antonia broke the cloth/the paper/the innocence.
(38b) *The cloth/the paper/the innocence broke.
Assuming that selectional restrictions reflect the meaning of a verb, this pattern of selectional restrictions reflects the fact
that both variants share a common core of meaning.
However, the extent to which selectional restrictions are shared across such pairs is not as great as is often thought. Smith
(1970), whose study of the factors that determine participation in this alternation we come back to in section 3, points out that
some intransitive verbs that typically do not enter into such alternations may enter into them for certain specific choices of
subjects of the intransitive use, as shown in the following examples.
(39a) The baby burped.
(39b) The nurse burped the baby. (Smith 1970: (36a))
(40a) The doctor burped.
(40b) *The nurse burped the doctor. (Smith 1970: (36c))
(41a) The bell buzzed.
(41b) The postman buzzed the bell.
(42a) The bees buzzed.
(42b) *The postman buzzed the bees.
The examples with the verbs burp and buzz show that selectional restrictions need not be identical for the corresponding
arguments in the transitive and intransitive uses. In these examples, the set of possible objects of the transitive use is a subset
of the set of possible subjects of the intransitive use.
The lack of common selectional restrictions is even more pervasive. There are also instances of the reverse phenomenon: a verb
which when used transitively is found with a set of objects that is larger than the set of subjects the same verb allows when used
intransitively. To take one example, consider the verb clear, a deadjectival verb that presumably means 'cause to become clear'.
This verb is found in causative pairs as in (43), yet, although one can clear a table or a sidewalk, the table and sidewalk can't
'clear', as shown in (44).
(43a) The wind cleared (up) the sky.
(43b) The sky cleared (up).
(44a) The men cleared the table/the sidewalk.
(44b) *The table/the sidewalk cleared.
A similar example involves the verb peel. This verb does not alternate at all in its most literal sense 'remove peel from a fruit or
a vegetable', although it can be used intransitively to describe the removal of skin, a 'peel'-like covering, from a body part. The
intransitive use of peel even seems to be preferred in this use, as in (46).11
(45a) I peeled the orange.
(45b) *The orange peeled.
(46a) ?I peeled my nose.
(46b) My nose was peeling.
The examples in (43)-(46) show that for some causative alternation verbs the selectional restrictions on the object of the
transitive and the subject of the intransitive do not always coincide exactly.12 The transitive object or the intransitive subject
may show narrower restrictions. Presumably, for those choices of arguments for which these verbs lack transitive or intransitive
uses, they lack them for the same reason that some verbs never have them.
To summarize, an account of the causative alternation as defined in the broadest sense must explain why some verbs show this
alternation freely, why
11 This example was inspired by a similar example in Rothemberg (1974), a study of a comparable phenomenon in
French, which includes many examples of diverging selectional restrictions.
12 It is possible that a closer examination of a wide range of verbs may show that the selectional restrictions do not
coincide for any verb. For instance, as pointed out by Brousseau and Ritter (1991), there are even senses of the verb break
where the overlap is not complete: He broke his promise but *His promise broke.
some verbs do not show it at all, and why some verbs show it under restricted circumstances. Finally, such an account must
grapple with the issue of whether the data discussed in this section represent a unified phenomenon or not.
intransitive use that is basic, but for the verb peel it is the transitive use that is basic. The question to be asked in such instances
is what aspect of verb meaning determines that peel is basically transitive, while buzz is basically intransitive.
The selectional restriction criterion still leaves open the issue of those verbs that appear to have similar selectional restrictions
for both the transitive and intransitive uses, such as break or open. (Although given the comment in footnote 12, it is possible
that for all verbs the selectional restrictions in one variant are looser than those in the other.) In order to isolate the meaning
components which determine the (in)transitivity of a verb, we compare verbs like break that permit transitive and intransitive
uses to verbs such as laugh, cry, or glitter that permit only intransitive uses (except perhaps under very special circumstances).
(In section 6 we will address the issue of what distinguishes the break verbs from transitive verbs like cut and write, which have
only transitive, but not intransitive, uses.) The question is what makes verbs like break, in their intransitive use, different from
these other verbs. Here we draw on Smith's (1970) insightful discussion of the semantic factors that play a part in determining
which verbs that can be used intransitively have transitive causative uses.
Smith characterizes the difference between those intransitive verbs which do and do not have transitive causative uses by means
of a notion of 'external control'. Verbs like break, Smith proposes, denote eventualities that are under the control of some
external cause which typically brings such an eventuality about. Such intransitive verbs have transitive uses in which the
external cause is expressed as subject. Verbs like laugh and cry do not have this property: the eventualities each one denotes
'cannot be externally controlled' but 'can be controlled only by the person engaging in it'; that is, control 'cannot be relinquished'
(1970: 107). Smith takes the lack of a causative transitive use for these verbs and other verbs such as shudder, blush, tremble,
malinger, and hesitate, to be a reflection of the presence of internal control; we return in section 4 to the question of why verbs
of internal control should have this property.
(47a) Mary shuddered.
(47b) *The green monster shuddered Mary.
(47c) The green monster made Mary shudder. (Smith 1970: (35a-c))
Similar distinctions have been recognized in other work on English (e.g., Hale and Keyser 1987) and other languages (e.g.,
Guerssel 1986 on Berber).
For reasons which we explain below, we will not use Smith's notion of 'control' for distinguishing among intransitive verbs
which do and do not have causative uses. Rather, we use a related notion, distinguishing between 'internally' and 'externally
caused' eventualities. With an intransitive verb denoting an internally caused eventuality, some property inherent to the argument
of the verb is 'responsible' for bringing about the eventuality. On this approach, the concept of internal cause subsumes agency.
For agentive verbs such as play, speak, or work, the inherent property responsible for the eventuality is the will or volition of
the agent who performs the activity. However, an internally caused eventuality need not be agentive. For example, the verbs
blush and tremble are not agentive, but they, nevertheless, can be considered to denote internally caused eventualities, because
these eventualities arise from internal properties of the arguments, typically an emotional reaction.13
Verbs with an inanimate, clearly non-agentive subject may also denote internally caused eventualities in the sense that these
eventualities are possible because of inherent properties of their subjects. In particular, the notion of internal cause can be
straightforwardly extended to encompass verbs of emission. It is an internal physical property of the argument of such a verb
which brings about the eventuality denoted by the verb. This property is reflected in the strong restrictions that these verbs
impose on possible subjects. For example, only very few things have the properties that are necessary to sparkle, and the same
holds for other verbs of emission. Consistent with the classification of these verbs as internally caused is the fact that, as
mentioned in section 2, verbs of emission generally do not have causative counterparts, as illustrated in (48). (We return in
section 7 to cases in which they do.)
(48a) *The jeweller sparkled the diamond.
(48b) *Max glowed Jenny's face with excitement.
(48c) *We buzzed the bee when we frightened it.
(48d) *The cook bubbled the stew.
13 The verbs shudder and shake, which at first glance appear to have the same meaning, present an interesting minimal
pair. Only shake, and not shudder, shows a transitive causative use. Our account would suggest that shaking is externally
caused and shuddering is internally caused. This proposal receives support from an examination of the things that can
shake and shudder. The two sets are not co-extensive; the set of things that shudder is to a large extent a subset of the set
of things that shake. Things that shudder usually can be thought of as having a 'self-controlled' body; they include people,
animals, and, perhaps by forced extension, the earth or a car. In contrast, leaves, teacups, or furniture can only shake. This
difference, like the internal versus external cause distinction, reflects the way we conceptualize the world.
Since verbs of emission pattern with other verbs without causative counterparts, we use the notion internal versus external cause
rather than the notion of control. It seems inappropriate to attribute control to the inanimate emitter argument of a verb of
emission.
In contrast to internally caused verbs, verbs which are externally caused inherently imply the existence of an external cause with
immediate control over bringing about the eventuality denoted by the verb: an agent, an instrument, a natural force, or a
circumstance. Thus something breaks because of the existence of some external cause; something does not break solely because
of its own properties. Some of these verbs can be used intransitively without the expression of an external cause, but, even when
no cause is specified, our knowledge of the world tells us that the eventuality these verbs denote could not have happened
without an external cause.
(49a) The vase broke./Antonia broke the vase.
(49b) The door opened./Pat opened the door.
We thus assume that the intransitive verbs which have transitive uses are externally caused, while those intransitive verbs which
do not are internally caused. A closer look at the class of alternating verbs will bear out this suggestion.
The change of state verbs that figure prominently among the alternating verbs describe changes in the physical shape or
appearance of some entity that can be brought about by an external cause, be it an agent, a natural force, or an instrument. Many
of these verbs are deadjectival; they are based on stage-level adjectives which describe properties of entities that can be caused
to change, such as their physical characteristics, color, and temperature (Dixon 1982). Some examples of such deadjectival verbs
taken from Levin (1993) are given below in (50); these verbs fall into two major groups, one in which the verbs are zero-related
to adjectives, as in (a), and the second in which the verbs are formed from adjectives through the use of the affix -en, as in (b).
(50a) brown, clear, clean, cool, crisp, dim, dirty, dry, dull, empty, even, firm, level, loose, mellow, muddy, narrow, open, pale, quiet, round, shut, slack, slim, slow, smooth, sober, sour, steady, tame, tan, tense, thin, warm, yellow, ...
(50b) awaken, blacken, brighten, broaden, cheapen, coarsen, dampen, darken, deepen, fatten, flatten, freshen, gladden, harden, hasten, ...
The verb smarten provides a particularly interesting illustration of the constraints on the adjectives that can serve as the base for
verbs. Although the adjective smart has two senses, 'intelligent' and 'well and fashionably dressed', the verb smarten is related to
the second adjectival sense, reflecting the fact that it is typically only in this sense that the adjective denotes a stage-level
property, and, hence, a property that might be caused to change.14 That is, individual-level properties are typically not acquired
as a result of an external cause, whereas stage-level properties are.
The distinction between internally versus externally caused eventualities is not relevant only to verbs of change.15 It also
explains the behavior of verbs of position with respect to the causative alternation. As noted above, verbs like hang, lean, sit,
and stand have causative uses, but verbs like loom and slouch do not. It seems to us that the difference between internal and
external cause is the key to their differing behavior. Looming and slouching are postures that are necessarily internally caused,
unlike hanging, leaning, sitting, or standing, which are postures that can be brought about by an external cause.
Many studies assume that the intransitive variant of a causative alternation verb is basic and the transitive variant derived. This
assumption probably seems justified because the meaning of the transitive verb includes that of the
14 Betsy Ritter has pointed out to us the expression Smarten up! Here the verb is related to the adjectival sense
'intelligent', but interestingly the verb is related to a stage-level use of the adjective. It appears that this adjective, like
many other basically individual-level adjectives, can sometimes be used as a stage-level predicate.
Dowty (1979: 129, fn. 4) discusses other instances in which deadjectival verbs lose some of the senses of their base
adjective. For example, he notes that although the adjective tough can mean either 'difficult' or 'resistant to tearing', the
verb toughen cannot mean 'make difficult'. We think that the stage-level versus individual-level distinction could be
responsible for at least some of the differences in available senses that Dowty cites including the toughen example.
15 There seems to be a gap in the English verb inventory: there appear to be no agentive internally caused verbs of change
of state. We do not have an explanation for their absence. In fact, we are aware of very few internally caused verbs of
change of state at all, and those we have found, such as flower and blossom, and, in some languages, blush, are
non-agentive. We discuss
this type of verb in Levin and Rappaport Hovav (to appear).
intransitive verb. For example, while transitive break means 'cause to become broken', intransitive break means 'become broken'.
We suggest that this is not the case. A scrutiny of the range of verb classes in Levin (1993) reveals that there are no externally
caused verbs without a transitive variant. That is, all externally caused verbs have a transitive causative use, but not all of them
need have an intransitive use in which the external cause is unspecified (e.g., write or murder). Given this generalization, we
offer the following analysis: internally caused verbs are inherently monadic predicates, and externally caused verbs are
inherently dyadic predicates, taking as arguments both the external cause and the passive participant, which is often referred to
as the patient or theme. The adicity of the predicate is then a direct reflection of a semantic property of the verb. Externally
caused verbs only detransitivize under specific circumstances; we discuss the circumstances that license the non-expression of
the cause argument of externally caused verbs in section 6. But it is important to stress that on our analysis externally caused
verbs do not undergo a process of causativization (they are inherently causative) but rather a process of detransitivization. Since
the majority of causative alternation verbs are externally caused, it is the process of detransitivization that is most pervasive in
English.
The following lexical semantic representations for the two types of verbs reflect the type of distinction we suggest.
(51) break-transitive: [x cause [y become BROKEN]]
(52) laugh: [x LAUGH]
The representation for a verb like break is a complex lexical semantic representation involving the predicate CAUSE; it
represents the meaning of such verbs as involving two subevents, with each of the arguments of the verb associated with a
distinct subevent. The representation for an internally caused verb such as laugh does not involve the predicate CAUSE; such
verbs have only one subevent and are taken to be basically monadic. We discuss the rules that determine the syntactic
expression of the arguments in these lexical semantic representations in the next section. However, it is clear that the intransitive
form of break involves an operation which prevents the external cause from being projected to the lexical syntactic
representation (the argument structure). We do not discuss this operation in this paper, but see Levin and Rappaport Hovav (to
appear) for discussion.
In light of the discussion above, certain facts about the formation of causatives across languages cited by Nedjalkov (1969) are
not surprising. In
this study, which is based on a survey of 60 languages, Nedjalkov looks at the morphological relation between the causative and
non-causative uses of the verbs break and laugh (as well as two other verbs) in each of these languages. Nedjalkov points out
that in the majority of his sample, the transitive causative form of the verb break is morphologically unmarked, with the
intransitive form being identical to the transitive form (19 out of 60 languages) or derived from this form (22 out of 60
languages). If verbs such as break are appropriately characterized as denoting externally caused eventualities, then the monadic
use is in some sense derived, and indeed morphological marking has a function: it is needed to indicate the non-expression of the
external cause.16
Nedjalkov also considers the verb laugh. As a monadic verb which is internally caused, the verb laugh does not denote an
eventuality that involves an external cause and can, therefore, be assumed to be basically a single argument verb. In fact,
Nedjalkov does not cite any languages in which this verb has a transitive counterpart which is identical in form to or
morphologically less complex than the intransitive and which receives a causative interpretation.17 Nedjalkov reports that in 54
of the 60 languages surveyed, the causative form is morphologically more complex than the non-causative form; see also Hale
and Keyser (1987) for discussion of some similar data.
Haspelmath (1993) follows up on Nedjalkov's study and discusses verbs which tend not to show consistent patterns cross-
linguistically. For example, verbs corresponding to English melt tend to be basically transitive in most languages, with the
intransitive form being the derived form, but the opposite pattern is found in a few languages. It is likely that this variability
arises because the meaning of a verb such as melt is consistent with classification as either internally or externally caused.18
Pinker (1989) also points out that
16 Of course, there are some languages where the reverse type of morphology is used to create a dyadic causative
predicate from the monadic predicate. Nine of the 60 languages in Nedjalkov's sample show this property. However, it is
difficult to tell from Nedjalkov's paper whether the morpheme used to form transitive break is that used for the derivation
of causatives in general in the languages concerned, although the data Nedjalkov cites in the appendix to his paper
suggests that in the majority of the languages it is at least not the morpheme used to form the causative of laugh.
17 Nedjalkov (1969) notes that in those languages where the verb laugh has both transitive and intransitive uses, this verb
is likely to mean 'laugh at' rather than 'make laugh' when used transitively.
18 Nedjalkov (1969) also looks at two other verbs, burn and boil, finding that their behavior with respect to causative
formation across languages was much more variable than that of break and laugh. This variation, like the variation that
Haspelmath observes with the verb melt, could also be attributed to the variable classification of these verbs.
there are certain classes of verbs which denote eventualities which can be construed on cognitive grounds to be either internally
or externally caused. It is precisely with respect to these kinds of verbs that cross-linguistic variation is expected. In fact,
appropriately formulated linking rules should predict which kinds of verbs are most likely to exhibit cross-linguistic variation.
The distinction between internal and external causation seems to do just this, and we take it to corroborate our approach.
4. Formulating the linking rules
Although the number of arguments that a verb requires in its lexical semantic representation is determined by whether it
describes an internally or an externally caused eventuality, we must also posit linking rules that ensure that these arguments have
the appropriate syntactic expression. As we describe in Rappaport et al. (1988), we see linking rules as creating the lexical
syntactic representation or argument structure of a verb from its lexical semantic representation. As we also outline in that paper,
a verb's argument structure in turn relatively straightforwardly determines the d-structure syntactic configuration that the verb is
found in. We propose that the following linking rules are among those that determine the lexical syntactic representation of a
verb:
(53) Immediate Cause Linking Rule:
The argument of a verb that denotes the immediate cause of the eventuality denoted by that verb is its external argument.
(54) Directed Change Linking Rule:
The argument of a verb that denotes an entity undergoing a directed change denoted by the verb is its direct internal argument.
We have stated these linking rules in terms of the argument structure notions 'external argument' and 'direct internal argument';
these argument structure positions are then 'projected' into syntax as the d-structure grammatical relations of subject and object,
respectively. In the next section we explain why we have stated these rules in terms of argument structure notions that
correspond most closely to d-structure grammatical relations rather than to s-structure grammatical relations. In this section we
discuss the linking rules and their application to the data we have discussed.
The Immediate Cause Linking Rule is intended to apply to the argument that causes the eventuality denoted by both internally
and externally caused verbs. First, we consider internally caused verbs such as laugh or play. The verb laugh's single argument
is the cause of the eventuality that the verb denotes and will be expressed as an external argument as a consequence of the
Immediate Cause Linking Rule. This rule will also explain why laugh and other internally caused verbs do not have a simple
transitive causative use. Such a use would involve the introduction of an additional cause, external to the eventuality denoted by
the verb. Such an external cause would have to be expressed as the external argument due to the Immediate Cause Linking Rule.
The external cause would thus compete with the verb's own argument for the position of external argument. As a verb has only a single
external argument, such causative uses would be ruled out. On this account, the lack of a causative variant for an internally
caused verb receives an explanation in terms of the properties of argument structure; this explanation only indirectly appeals to
the semantics of the verbs involved.19
The only way to introduce an external cause is to express the causative use of internally caused verbs periphrastically. And
across languages, verbs like laugh, cry, speak, or play are causativized through the use of a causative affix or verb.
(55a) *The clown laughed me.
(55b) The clown made me laugh.
(56a) *The bad news cried me.
(56b) The bad news made me cry.
(57a) *The director spoke the actor.
(57b) The director made the actor speak.
(58a) *The parents played the children.
(58b) The parents made the children play.
Following Baker (1988), Marantz (1984), S. Rosen (1989), and others, we assume that the causative morpheme or verb comes
with its own argument structure, so that the Immediate Cause Linking Rule does not have to
associate two arguments from a single argument structure with the same argument structure position. General principles will determine that in languages with causative affixes or verbs the introduced cause will be first in line for being chosen as the external argument in its clause.
19 Pinker (1989) points out that internally caused verbs are not expected to have causative uses because the eventuality they denote cannot have an external cause which is at the same time an immediate cause; that is, such eventualities cannot be construed as being directly caused. Although this property is probably implicated in the non-causativizability of such verbs, the existence of internally caused verbs which do causativize under certain syntactic conditions, such as those discussed in section 8, suggests that syntactic factors enter into the explanation as well.
The Directed Change Linking Rule is similar in spirit to familiar linking rules which associate a patient or a theme (or an
equivalent notion) with the direct object grammatical function (Anderson 1977, Fillmore 1968, Marantz 1984, among others).
Our formulation is meant to give specific semantic content to the notions 'patient' and 'theme'. The Directed Change Linking
Rule is meant to apply to verbs of change of state and verbs of change of location. This second class includes verbs of directed
motion such as come, go, rise, and fall but NOT verbs of manner of motion such as roll, run, jog, and bounce. This difference
follows because, although the action denoted by a verb of manner of motion inherently involves a kind of change, it is not a
directed change. Tenny suggests that there are certain kinds of changes which can be characterized 'as a change in a single parameter or a change on a scale' (1987: 189). We call such changes 'directed changes'. Tenny argues that an argument denoting
an entity which is specified to undergo such a change is realized in the syntax as a direct object. This property distinguishes a
change of state verb like dry from both agentive and non-agentive verbs of manner of motion like walk and roll. The verb dry
specifies a change characterizable in terms of a single parameter, dryness, whereas walk and roll do not specify such a change.
In contrast, for verbs of directed motion there IS a directed change: a movement in a particular direction.20 The argument of a
non-agentive manner of motion verb such as roll will be a direct internal argument, as we will see, but this linking will be
effected by another linking rule. The justification for this will be given in section 7.
The linking rules we have formulated also ensure that when a verb like break is used transitively, the external cause will be the
external argument, and the patient, since it undergoes a specified change, will be the direct internal argument. When a verb like
break is used intransitively with only the patient argument, the Directed Change Linking Rule will apply, and this
argument will be the direct internal argument. Since these verbs have s-structure subjects when intransitive, this argument must assume the subject grammatical relation at s-structure, presumably as a consequence of independent syntactic principles. The typical GB-framework account of the expression of the arguments of such verbs makes reference to the Case Filter, Burzio's Generalization, and the Extended Projection Principle (e.g., Burzio 1986); we do not go into details here.21
20 As formulated here, the Directed Change Linking Rule, unlike some other proposed linking rules that are similar in scope, will apply to certain atelic verbs of change, such as widen or cool. We argue that this property is desirable in Levin and Rappaport Hovav (to appear), where we provide a more detailed comparison of the Directed Change Linking Rule with other linking rules, especially those which make reference to concepts such as telicity. We also compare our approach with one such as Dowty's (1991), which makes use of the rather similar notion of 'incremental theme'.
Together the Immediate Cause and Directed Change Linking Rules can be used to predict whether the members of the verb
classes that we discussed in section 2 will have causative uses or not. Verbs of change of state are inherently dyadic verbs, so
they will always have causative uses, although not as a result of causativization; in section 6 we elaborate on the circumstances
in which these verbs can have monadic 'detransitive' uses. Internally caused verbs are not expected to have causative uses,
explaining the behavior we observed for verbs of emission; we discuss in section 7 why some verbs of emission nevertheless do
have causatives. Agentive verbs of manner of motion, as internally caused verbs, are also not expected to have causative uses.
As seen in section 2 these verbs do not typically have causative uses in isolation; we discuss in section 8 why these verbs may
have causative uses in the presence of a directional phrase. We attribute the mixed behavior of verbs of position to a split in the
class: some of these verbs are internally caused and others are not, and the internally caused verbs are not expected to have a
causative use.
These linking rules leave open the question of what happens with an argument that falls under neither of the linking rules
introduced in this section. Here we make the assumption, which we justify in Levin and Rappaport Hovav (to appear), that an
argument that is not linked by one of these two linking rules will be a direct internal argument rather than an external
argument.22
(59) Default Linking Rule:
An argument of a verb that does not fall under the scope of the other
linking rules is its direct internal argument.
The Default Linking Rule will apply to the theme (located) argument of transitive sit, stand and other externally caused verbs of
position, since this argument neither causes the eventuality denoted by the verb nor does it undergo a specified change.23 We
return to the Default Linking Rule in sections 7 and 8, where we illustrate its applicability more fully.
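The division of labor among the three linking rules can be sketched in code. This is our own simplification for exposition only: the feature names (`immediate_cause`, `directed_change`) and the ordering of checks are assumptions made for illustration, not the authors' formalism.

```python
# A sketch of how the Immediate Cause, Directed Change, and Default
# Linking Rules might divide the work of argument linking. Each argument
# is represented as a dict of semantic properties; the first applicable
# rule determines the argument's syntactic position.

def link(argument):
    """Return 'external' or 'direct internal' for a single argument."""
    # Immediate Cause Linking Rule: the immediate cause of the
    # eventuality is linked as the external argument.
    if argument.get("immediate_cause"):
        return "external"
    # Directed Change Linking Rule: an argument undergoing a directed
    # change (change of state or directed motion) is a direct internal
    # argument.
    if argument.get("directed_change"):
        return "direct internal"
    # Default Linking Rule (59): any remaining argument is a direct
    # internal argument.
    return "direct internal"

# Transitive break: external cause plus patient undergoing a directed change.
cause = {"immediate_cause": True}      # -> external argument
patient = {"directed_change": True}    # -> direct internal argument

# Internally caused laugh: its single argument is the immediate cause,
# so the verb comes out unergative.
laugher = {"immediate_cause": True}

# Theme (located) argument of transitive sit or stand: neither a cause
# nor an undergoer of directed change, so the Default Linking Rule applies.
located = {}
```

Under these assumptions, intransitive break (patient only) comes out unaccusative and laugh unergative, matching the classifications argued for in the text.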
Given the definitions of unaccusative verbs as verbs taking a single direct internal argument and unergative verbs as verbs
taking a single external argument, the linking rules proposed in section 4 will receive support if there is evidence that internally
caused verbs are unergative and externally caused verbs, when monadic, are unaccusative. We review two unaccusative
diagnostics that can be used to support this claim; for further discussion see Levin and Rappaport Hovav (to appear).
23 We do not discuss these verbs further in this paper since a full account of the application of the linking rules to these
verbs would require us to introduce certain complications in their behavior. We discuss these complexities in Levin and
Rappaport Hovav (to appear). However, we would like to point out that our account suggests that the externally caused
verbs of position should be basically transitive.
Work on the Unaccusative Hypothesis has established that the resultative construction can be used as an unaccusative diagnostic
(Hoekstra 1984, Levin and Rappaport Hovav (to appear), Simpson 1983, among others). Although both unaccusative and
unergative verbs are found in this construction, they pattern differently due to an interaction of verb type with a syntactic
constraint requiring the resultative phrase to be predicated of a d-structure object. What matters for our purposes is that when an
unaccusative verb is found in the resultative construction, the resultative phrase is predicated directly of its surface subject, as in
(61), but a resultative phrase cannot be predicated directly of the surface subject of an unergative verb, as in (62a). A resultative
phrase may only be predicated of the subject of an unergative verb through the mediation of what Simpson (1983) calls a 'fake'
reflexive object, as in (62b). Alternatively, a resultative phrase may be predicated of a non-subcategorized object found with an
unergative verb, as in (63), an option not available to unaccusative verbs, as shown in (64).24
(61) The bag broke open.
(62a) *We yelled hoarse.
(62b) We yelled ourselves hoarse.
(63) The dog barked them awake.
(64) *The bag broke the groceries all over the floor.
Thus the different patterns of the resultative construction correlate with the status of a verb as unaccusative or unergative: a
monadic verb which allows a resultative phrase to be predicated directly of its subject is unaccusative, while a monadic verb
which allows such a phrase to be predicated of an object, either a 'fake' reflexive or a non-subcategorized object, is unergative.
The closely related X's way construction is also an unaccusative diagnostic. This construction, in which a resultative phrase is
predicated of the subject of a verb through the use of the phrase 'X's way' in object position, is found with unergative verbs, but
not with unaccusative verbs (Jackendoff 1990, Marantz 1992).
(65) They worked their way to the top.
(66) *The Arctic explorers froze their way to fame.
24 See Levin and Rappaport Hovav (to appear) for an explanation of the differential behavior of the two classes of verbs
in the resultative construction, and Hoekstra (1992) for an alternative account.
The resultative and X's way constructions distinguish internally caused verbs from externally caused verbs as predicted. An
examination of the set of tokens of these constructions we have collected over the last few years shows that internally caused
verbs like laugh, play, and work are regularly found in the X's way construction and the unergative resultative pattern, while
instances of monadic externally caused verbs are attested only in the unaccusative resultative pattern.
The behavior of verbs of emission in the resultative construction is of particular interest since the classification of these verbs
has been the subject of controversy. Perlmutter (1978) originally classified these verbs as unaccusative, but this classification
has been challenged (see for example Zaenen 1993). We have classified these verbs as internally caused verbs, and hence, we
predict that they will pattern with unergative verbs in general, and in the resultative and X's way constructions in particular. The
examples below verify this prediction.25
(67a) The beacons flared the news through the land. (Henderson I 92; cited in
K.-G. Lindkvist, A Comprehensive Study of Conceptions of Locality,
Almqvist & Wiksell, Stockholm, Sweden, 1976, p. 89, sec. 233, 4)
(67b) you can't just let the thing ring itself to death, can you? (Observer; Trace
That Call No More!, New York Times, March 8, 1989)
(67c) The very word was like a bell that tolled me back to childhood summers
(Hers; Child's Play, Women's Sway, New York Times, July 17, 1988)
(67d) Then he watched as it gurgled its way into a whiskey tumbler. (M. Grimes,
The Five Bells and Bladestone, Little, Brown, Boston, 1987, p. 200)
(67e) To counter the unease that was oozing its way between them. (P. Chute,
Castine, Doubleday, Garden City, NY, 1987, p. 214)
In Levin and Rappaport Hovav (to appear) we look at a wide range of tests and find that they corroborate the results of the two
tests that we have discussed in this section, further supporting the linking rules formulated in
section 4. In that work, we also show that there are some verbs which are compatible with both internal and external causation. These verbs include the non-agentive verbs of manner of motion such as roll and bounce and the verbs of position. As we show in that work, with such verbs external causation is correlated with unaccusative status, while internal causation is correlated with unergative status.
25 Given their unergative classification, we would not expect these verbs to pattern as unaccusative verbs with respect to the resultative construction. In actual fact, some of these verbs are found in the unaccusative resultative construction, but as we discuss in Levin and Rappaport Hovav (to appear) their unaccusative behavior correlates with a shift in meaning, with the additional meaning being one that is typically associated with an unaccusative classification.
6. When can an externally caused verb detransitivize?
The next question we address is the following: if externally caused eventualities are basically dyadic, when can verbs denoting
such eventualities turn up as intransitive, and why is this possibility open to some verbs only for certain choices of arguments?
Again we draw on the insights in Smith's (1970) paper to reach an understanding of this phenomenon.
In trying to identify the factors that permit detransitivization (that is, the non-expression of the external cause), it is useful to
look at the characteristics of the subjects of externally caused verbs. Among the verbs that never detransitivize are verbs that
require an animate intentional and volitional agent as subject, such as the verbs murder and assassinate or the verbs of creation
write and build.
(68) The terrorist assassinated/murdered the candidate.
(69a) Tony wrote a letter to the editor of the local newspaper.
(69b) That architect also built the new high school.
Smith proposes that the verbs of change that may be used intransitively are precisely those in which the change can come about
independently 'in the sense that it can occur without an external agent' (1970: 102). She identifies independence and external control (the notion which we have subsumed under our notion of external cause) as the two features which characterize verbs of
change. Independence allows for the possibility of intransitive counterparts, and external control or causation allows for the
possibility of a transitive causative use. Smith's observation can also be recast as follows: the transitive verbs that detransitivize
are those in which the eventuality can happen spontaneously without the volitional intervention of an agent. We believe that this
property is reflected in the ability of such verbs to allow natural forces or causes, as well as agents or instruments, as external
causes, and, hence, as subjects, as illustrated with the alternating verb break.
Verbs such as break contrast with verbs such as murder, assassinate, write, and build. These four verbs, as well as any other
verbs which, like them, denote eventualities that require the participation of a volitional agent and do not admit natural force
subjects, will not detransitivize, despite the fact that their meanings involve a notion of 'cause'.
(71a) *The candidate assassinated/murdered.
(71b) *The letter wrote.
(71c) *The house built.
In fact, these four verbs are among those that require an agent in the strongest sense: they do not even allow an instrument as
subject.
(72a) *The knife assassinated/murdered the candidate.
(72b) *The pen wrote the letter.
(72c) ??The crane built the house.
A verb like cut shows that the set of verbs that do not detransitivize is not limited to verbs which restrict their subjects to
volitional agents. Although this verb does not typically allow natural force subjects, it does allow instruments in addition to
agents as subjects.26
(73) The baker/that knife cut the bread.
Sentence (74), however, cannot be used to describe the bringing about of a separation in the material integrity of some object.
(74) *The bread cut. (on the interpretation 'The bread came to be cut')
The behavior of a verb like cut can receive an explanation. Its meaning includes a specification of the means involved in
bringing the action it denotes about, which in turn implies the existence of a volitional agent. Specifically, the very meaning of
the verb cut implies the existence of a sharp instrument that must be used by a volitional agent to bring about the change
of state denoted by the verb. If the same change of state were to come about without the use of a sharp instrument, then it could not be said to have come about through cutting, showing that the choice of instrument makes cutting cutting.
26 See Brousseau and Ritter (1991) for further discussion of the circumstances that allow verbs to take both instruments and agents as subjects.
Perhaps the same considerations can explain the behavior of the verb remove, which does not have an intransitive form. Its non-existence might seem somewhat surprising since at a first approximation this verb's meaning might be paraphrased as 'cause to
become not at some location'. A closer look at the verb remove's meaning reveals that the eventuality it denotes is brought about
by a volitional agent, as shown by the oddness of the examples in (75), which have inanimate non-volitional subjects.
(75a) ??The wind removed the clouds from the sky. (cf. The wind cleared the
clouds from the sky.)
(75b)??The water removed the sand from the rocks. (cf. The water washed the
sand from the rocks.)
We assume that the same factors explain why most morphologically complex verbs formed with the suffixes -ize and -ify cannot
typically detransitivize, as the data repeated here illustrate.
(76a) The farmer homogenized/pasteurized the milk.
(76b) *The milk homogenized/pasteurized.
(77a) Carla humidified her apartment.
(77b) *Her apartment humidified.
Most of these verbs cannot detransitivize, we propose, because they describe eventualities such as being pasteurized or
homogenized that cannot come about spontaneously without the external intervention of an agent. It appears to be precisely
those -ify and -ize verbs which allow for this possibility that do detransitivize.
(78a) I solidified the mixture./The mixture solidified.
(78b) The cook caramelized the sugar./The sugar caramelized.
Again, the -ify and -ize verbs that do and do not permit intransitive uses contrast with respect to the range of subjects they
permit when transitive. The verbs that resist detransitivization show a narrower range of subjects when transitive; specifically,
they appear to exclude natural force subjects.
If we look more closely at some of the alternating verbs in -ify and -ize listed in (80), we see that many of these verbs, such as
intensify or equalize, are deadjectival and are very similar in meaning to the previously mentioned alternating deadjectival verbs
in (50).
(80a) acetify, acidify, alkalify, calcify, carbonify, emulsify, gasify, intensify, lignify, liquefy, nitrify, ossify, petrify, putrefy, silicify, solidify, stratify, vitrify
(80b) caramelize, carbonize, crystallize, decentralize, demagnetize, depressurize, destabilize, equalize, fossilize, gelatinize, glutenize, harmonize, ionize, magnetize, neutralize, oxidize, polarize, pulverize, regularize, stabilize, vaporize
Other alternating -ify and -ize verbs are denominal; their meaning may be paraphrased roughly as 'cause to turn into the
substance named by the noun that the verb is based on': caramel for caramelize, powder for pulverize, gas for gasify, and so on.
The non-alternating -ify and -ize verbs also include some denominal verbs whose stems are nouns that name substances: zincify,
carbonize, and iodize. But what is interesting is that the meaning of these non-alternating verbs is different from that of the
alternating verbs: it could be paraphrased as 'process or treat using the substance' rather than 'cause to turn into the substance'.
We suggest that due to this difference in meaning, these verbs require an agent and hence do not detransitivize. In fact, if zincify
meant 'turn to zinc' rather than 'process with zinc', we would predict that the verb could alternate, and our own intuitions, as well
as those of others we have consulted, are that it would. A preliminary examination of a wider range of non-alternating -ify and
-ize verbs suggests that many describe changes that involve a particular type of processing or treatment, as with the previously
cited verbs homogenize and pasteurize or as with the verbs sterilize or vulcanize. Other non-alternating verbs involve changes of
state that only come about through the active intervention of an agent, such as legalize or sanctify.
The constraint on detransitivization also explains why some verbs have intransitive uses only for certain choices of patient: it is
only for these choices of patient that the change can come about without the intervention of an agent. For instance, in section 2
we noted the following contrasts involving the verb clear:
Our knowledge of the world tells us that tables and sidewalks are things that are cleared (of dishes and snow, respectively)
through the intervention of an animate agent. The sky, however, can clear through the intervention of natural forces, such as the
wind. Thus the difference in the possibility of intransitive counterparts.
Similarly, peeling (causing an entity to lose an outer layer) is typically brought about through the actions of a volitional agent,
particularly if a fruit or vegetable is involved. However, there are certain entities that lose their outer layers due to natural causes
rather than through the action of an agent, and in these instances the verb peel can be used intransitively, as in the case of the
loss of skin from a person, as illustrated in (84).
(83a) I peeled the orange.
(83b) *The orange peeled.
(84a) ?I peeled my nose.
(84b) My nose was peeling.
The verb lengthen can be used to present another contrast of the same type:
(85a) The dressmaker lengthened the skirt.
(85b) *The skirt lengthened.
(86a) The mad scientist lengthened the days.
(86b) The days lengthened.
Typically skirts are only lengthened through the intervention of an agent, and hence the verb lengthen as applied to skirts is not
found intransitively.27 Days, on the other hand, become longer as the earth progresses through a certain part of its orbit around
the sun, something that happens without the intervention of an outside agent. And lengthen as applied to days is typically used
intransitively, although in a science fiction context where artificial
manipulation of the length of days is possible, transitive uses are also found, as in (86a). These examples show yet again that detransitivization is possible precisely where an externally caused eventuality can come about without the intervention of an agent. In this sense, detransitivization is a productive process, since it appears to be possible wherever this condition is met.
27 Of course, it is possible to construct contexts in which a skirt might be lengthened by being washed. As Mary Laughren has pointed out to us, the intransitive use should be possible in such circumstances.
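The detransitivization condition, together with the natural-force-subject diagnostic used throughout this section, can be given as a rough sketch. This is our own simplification for illustration; the subject-type labels are assumptions, not part of the authors' analysis.

```python
# Rough sketch of the detransitivization condition: an externally caused
# verb detransitivizes just in case the eventuality it denotes can come
# about spontaneously, without the volitional intervention of an agent.
# The diagnostic from the text: such verbs also allow natural forces
# (not just agents and instruments) as subjects of the transitive use.

def detransitivizes(allowed_subjects):
    """allowed_subjects: set of subject types the transitive verb permits."""
    # Allowing a natural force as subject signals that the eventuality
    # can occur without a volitional agent.
    return "natural force" in allowed_subjects

# 'break': agents, instruments, and natural forces -> detransitivizes.
break_subjects = {"agent", "instrument", "natural force"}
# 'murder', 'write', 'build': volitional agents only -> no intransitive use.
murder_subjects = {"agent"}
# 'cut': agents and instruments but no natural forces -> *The bread cut.
cut_subjects = {"agent", "instrument"}
```

On this sketch the cut facts fall out as in the text: allowing instrument subjects is not enough; only verbs whose eventualities can occur spontaneously detransitivize.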
In trying to pin down a verb's transitivity, we have suggested that verbs can be categorized according to whether or not they
denote an eventuality with an external cause and according to whether or not they denote an eventuality which can occur
spontaneously. Since these two distinctions are rather similar, we might ask whether there is any need to distinguish between
them. In fact, Haspelmath (1993) has independently developed an analysis similar to the one we present here, except that he
does not make a clear distinction between the two notions. Although Haspelmath is not explicit about this, it appears that he
takes the likelihood of spontaneous occurrence for an event to be the opposite of external causation for that event. It seems to us
that there is evidence favoring our approach, which takes the two notions to be distinct. Haspelmath links the likelihood of
spontaneous occurrence to intransitivity, without distinguishing between unaccusative and unergative intransitive verbs as we
do. For Haspelmath, those verbs which denote events which are likely to occur spontaneously will have an intransitive form,
while those which are not likely to occur spontaneously will have only a transitive form. However, Haspelmath does note that
across languages certain intransitive verbs like break tend to be the morphologically marked member of a causative alternation
pair of verbs, while others like laugh tend to be the morphologically unmarked member. It turns out, as he notes, that those
verbs which, like break, are both spontaneously occurring and externally caused, are the ones which tend to have the intransitive
form as the morphologically marked one. Those which, like laugh, are spontaneously occurring and internally caused tend to
have the transitive member of a causative alternation pair morphologically marked. This difference justifies the retention of both
notions. In some sense, Haspelmath's study provides cross-linguistic corroboration of the results we obtained from the in-depth
study of a single language.
The verbs burp and buzz, which we have seen are internally caused, can be used transitively for certain types of arguments, as in
the examples below.
(87a) The nurse burped the baby.
(87b) *The nurse burped the doctor. (Smith 1970: (36a,c))
(88a) The postman buzzed the bell.
(88b) *The postman buzzed the bees.
This phenomenon is sparsely and unevenly distributed across the English verb inventory. For instance, the verb burp may be the
only bodily process verb with a causative transitive use. The existence of causative transitive uses is somewhat more widely
attested with verbs of emission, particularly verbs of sound emission. This property might be attributable to the fact that, unlike
verbs of bodily process, verbs of emission are typically predicated of inanimates; therefore, some verbs of emission can describe
either internally or externally caused eventualities. Among the verbs of emission that can be used transitively are a few verbs of
light emission, including beam and shine, and a somewhat larger number of verbs of sound emission, including buzz, jingle,
ring, and rustle. The verb buzz describes a type of sound that is emitted by certain animals (bees) or by certain types of devices (bells and buzzers). This verb can only be used transitively when the emitter of the sound is a device, and only if the device can
be caused to emit the sound through direct manipulation by an external cause. Similarly, the verb of light emission beam may be
used transitively when the object of the verb is a flashlight, again a manipulatable device, but not a person's face.28
(89a) He beamed the flashlight in the dark.
(89b) *He beamed her face with satisfaction.
The following generalization appears to hold of all the verbs of emission with causative transitive uses: they can be used
transitively only with an emitter that is directly manipulated by an external cause, and when used in this way, the interpretation
must be one in which the emission is directly brought about by an external cause. There are fewer verbs of light emission
with transitive causative uses than there are verbs of sound emission, since in most instances the entities of which verbs of light emission are predicated emit light without the intervention of an external cause, unless these entities are devices. More verbs of sound emission than verbs of light emission are predicated of entities which emit a sound only under manipulation by an external cause. Some verbs of emission, such as sparkle and burble, cited in section 2, never have causative transitive uses. It is unclear to us at this point whether some verbs of emission lack causative uses because they denote eventualities in which causation simply cannot be assumed by an external cause (that is, they are necessarily internally caused) or because, even though external causation may be possible, the set of verbs denoting eventualities compatible with both internal and external causation is explicitly learned from examples.
28 Steve Pinker has pointed out to us that the transitive use of verbs of light emission generally has a meaning which includes 'aiming in a particular direction', rendering a directional phrase either obligatorily present or at least understood. He suggests that perhaps the analysis of these verbs should be similar to the one we propose for the causative forms of the agentive verbs of manner of motion in the next section.
We can now propose an explanation for why burp is apparently the only verb denoting a bodily process with a transitive
causative use. One of the few feasible instances of external causation of a bodily process is burping as it applies to babies.
Babies are incapable of burping by themselves, so that the person caring for the baby must assume control of the burping. Thus
the verb burp can be used transitively only when babies are involved.
We propose then that the eventualities denoted by a small number of English verbs are compatible with either internal or
external causation, giving rise to both an intransitive use and a transitive causative use of these verbs. Since the causative use,
when available, is associated with direct manipulation of the emitter by an external cause, we assume that in such instances the
emitter is no longer viewed as the cause of the eventuality, and that the only cause is the external cause which manipulates the
emitter. The Immediate Cause Linking Rule will apply to the external cause, so that it will be the external argument. The
Default Linking Rule will apply to the emitter, since it does not meet the conditions on the other linking rules, and it will be the
direct internal argument.
As mentioned earlier, certain verbs of manner of motion have meanings compatible with either internal or external causation.
These verbs include the set of verbs of manner of motion which are not necessarily agentive, such as swing, bounce, or roll. In
Levin and Rappaport Hovav (to appear) we provide evidence that a verb like roll is in fact unaccusative when predicated of an
inanimate entity, as in The ball rolled (on the floor), but unergative when used agentively, as in The dog rolled (on the floor).
This behavior is just what our analysis predicts. When internal causation is involved, the Immediate Cause Linking Rule will
ensure that the single argument, as the internal
cause, will be the external argument, and the verb will be unergative. When external causation by an agent or a force, such as a
push or gravity, is involved but no overt cause is expressed, the single argument will be the direct internal argument due to the
Default Linking Rule, and the verb will be unaccusative. (The Directed Change Linking Rule does not apply since there is no
directed change; the verb roll is atelic in the absence of a directional phrase.)
8. The interaction of directional phrases and transitivity
In this section we return to the last type of causatives mentioned in the survey in section 2: the causative uses of agentive verbs
of manner of motion such as march and jump, illustrated in examples (15b) and (16b), which are repeated below.
(90a) The general marched the soldiers to the tents.
(90b) The rider jumped the horse over the fence.
These verbs are internally caused monadic predicates. By the linking rules, their single argument should be an external
argument; therefore, contrary to fact, these verbs are not expected on our analysis to have the transitive causative uses which
some of them do manifest. In this section we provide an account of why some members of this set of internally caused verbs
have causative uses.
In the process, we will also provide an answer to another question that is posed by the linking rules that figure in our account of
causatives. We have formulated two linking rules which associate arguments with the notion of direct internal argument, and
one which associates arguments with the notion of external argument. Since one of the rules linking arguments to direct internal
argument is a default rule, a natural question to ask is why we need the other rule that links arguments to direct internal
argument, the Directed Change Linking Rule, at all. Couldn't the Default Linking Rule alone yield the same results? For
example, if we dispensed with the Directed Change Linking Rule, the Default Linking Rule could be applied to the verb break
with the desired result. This section explains why both linking rules are needed. We illustrate the necessity of the Directed
Change Linking Rule using the behavior of agentive verbs of manner of motion with respect to causativization.
We propose that the key to understanding the unexpected behavior of the agentive verbs of manner of motion is the fact that in
English, such verbs can be used as verbs of directed motion in the presence of a directional phrase (Talmy 1975, 1985, among
others).
(91a) The soldiers marched to the tents.
(91b) The horse jumped over the fence.
When an agentive verb of manner of motion is used in a directed motion sense, then both the Immediate Cause Linking Rule
and the Directed Change Linking Rule are applicable to the agentive argument. If we assume that the Directed Change Linking
Rule takes precedence over the Immediate Cause Linking Rule (something that a default linking rule could by definition not do),
then the single argument of a verb like run would be a direct internal argument when the verb is used in a directed motion sense.
And indeed many studies of unaccusativity have established that English agentive verbs of manner of motion are unaccusative in
the presence of a directional phrase.29
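The precedence relations among the three linking rules can be glossed procedurally. The sketch below is our illustrative reading of the text, not the authors' formalism; the boolean feature encoding and the function name are assumptions made for exposition.

```python
# Illustrative sketch (an assumption, not the authors' formalism) of the
# precedence among the three linking rules discussed in the text.

def link_single_argument(internally_caused, directed_change):
    """Classify the single argument of a monadic verb.

    Precedence assumed from the text: the Directed Change Linking Rule
    outranks the Immediate Cause Linking Rule, and the Default Linking
    Rule applies only when neither of the others does.
    """
    if directed_change:
        # Directed Change Linking Rule: an argument undergoing a directed
        # change links to direct internal argument -> unaccusative.
        return "internal"
    if internally_caused:
        # Immediate Cause Linking Rule: the immediate cause links to
        # external argument -> unergative.
        return "external"
    # Default Linking Rule: remaining arguments link to direct internal
    # argument -> unaccusative.
    return "internal"

# 'march' alone: internally caused, no directed change -> unergative.
assert link_single_argument(True, False) == "external"
# 'march to the tents': the directional phrase supplies a directed change;
# precedence yields an internal argument, so the verb is unaccusative and
# an external cause can be introduced ('The general marched the soldiers
# to the tents').
assert link_single_argument(True, True) == "internal"
# inanimate 'roll': externally caused, atelic -> Default rule, unaccusative.
assert link_single_argument(False, False) == "internal"
```

The sketch also makes visible why a pure default rule could not do the work of the Directed Change Linking Rule: only a rule that outranks the Immediate Cause Linking Rule can divert an internally caused argument to internal-argument status.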
Given the unaccusativity of these verbs with directional phrases, it is possible to give an explanation for why agentive verbs of
manner of motion may have a transitive causative use when they are unaccusative: there is no external argument, so that the
external cause can be linked to external argument. Since this alternative linking, which allows us to explain the existence of the
causative use of these verbs, cannot be accomplished by a default rule, we do not dispense with the Directed Change Linking
Rule. This account also explains why a directional phrase is needed, or at the very least understood, when agentive verbs of
manner of motion are used causatively. The presence of such a phrase sanctions the alternative linking of the theme argument
that permits the introduction of an external cause, explaining the contrasts below.
(92a) The general marched the soldiers to the tents.
(92b) ??The general marched the soldiers.
(93a) The rider jumped the horse over the fence.
(93b) *The rider jumped the horse.
The example in (94) shows that a phrase with a directional interpretation, and not any type of locative phrase, is needed for the
causative use.
29 We do not repeat this evidence here; see Hoekstra (1984), L. Levin (1986), Levin and Rappaport Hovav (1992), C.
Rosen (1984), among others.
This example is unacceptable on the locative interpretation, which would involve the mouse running aimlessly around inside the
box, but it improves on the directional interpretation where the mouse runs around the perimeter of the box. The constraint
against locative phrases reflects the fact that only directional phrases allow for a directed change.
The process which makes manner of motion verbs into verbs of directed motion is fully productive in English. Therefore, we
would expect the process which transitivizes these directed verbs of manner of motion to be fully productive as well, so that
all class members should have a transitive causative use in the presence of a directional phrase.30 In fact, a wide variety of
agentive verbs of manner of motion are attested in causative uses with directional phrases.31
(95a) several strong Teamsters shuffled Kit out of the room (L. Matera, A
Radical Departure, 1988; Ballantine, New York, 1991, p. 79)
(95b) "I promised Ms. Cain I would ride her around the ranch" (N. Pickard,
Bum Steer, Pocket Books, New York, 1990, p. 92)
30 This account leaves unexplained the fact, noted also in Pinker (1989), that verbs of directed motion which are not
verbs of manner of motion do not have causative counterparts: *She arrived the package (at the store). We believe that
the lack of causatives with these verbs may not be a problem for our account of causatives of verbs of motion. We
suspect that these verbs are best not characterized as verbs of motion for several reasons, but rather they should be
considered verbs of appearance. Interestingly, verbs of appearance, for reasons that we do not fully understand, do not
permit causative uses: *The magician appeared a dove (from his sleeve). Pinker suggests that the semantic conditions we
formulate here are only necessary conditions for participation in the alternations. He proposes that membership in
lexically specified semantic subclasses of verbs determines the sufficient conditions for participation in diathesis
alternations in general. These subclasses are implicated in what Pinker calls narrow-range lexical rules. It remains to be
seen whether the lack of causative uses for certain classes needs to be stipulated lexically as Pinker suggests or can be
shown to follow from more general principles.
31 It is clear from the context that in (95b) the riders are actually on separate horses; that is, the example does not have
the accompaniment interpretation found in sentences such as I walked my dog, which might be argued to instantiate a
distinct phenomenon from the phenomenon being discussed here.
The unavailability of certain causatives can be attributed to the Immediate Cause Linking Rule itself, which is formulated in
terms of immediate causation. All of the sentences with transitive causative uses of agentive manner of motion verbs imply
some sort of coercion (a fact also noted in Pinker 1989). In fact, in the absence of a particular context, these verbs sound best
when the subject is human and the object is an animal, or else when the subject is someone in a position of authority and the
object is under that authority. We attribute these preferences to a need to construe such examples in a way that the subject can
be interpreted as the immediate cause of the eventuality. Some verbs of manner of motion describe types of motion that do not
lend themselves to an interpretation involving coercion, and such verbs are unacceptable in the causative use. This additional
condition on causativization is illustrated by verbs that describe aimless motion, such as stroll, mosey, meander and wander.
Typically aimless motion cannot be brought about by coercion and, indeed, these verbs appear not to have a causative use.
(97) *We strolled/moseyed/meandered/wandered the visitors (to the museum).
However, a search of text corpora did yield the following example of a causative use of stroll, suggesting that in the right
circumstances even these verbs can causativize, although a reviewer found this example unacceptable, as our analysis would
suggest.
(98) Julie Smith will stroll you through the Garden District, in New Orleans
Mourning (New York Times)
On this account, agentive verbs of manner of motion enter into a real process of 'causativization', in the sense that the causative
form is the derived form. The account of the causative forms of these verbs contrasts with that of the causative forms of verbs
like break, which we have argued are basically dyadic and enter into a process of detransitivization. This analysis, as we
mentioned in section 3, is corroborated by the fact, noted in Hale and Keyser (1987), that cross-linguistically it is the causative
form of such verbs which tends to be morphologically marked.32
32 Another fact which suggests that the process involved with verbs of manner of motion is different from the one
involved with verbs of change of state is pointed out by Reinhart (1991). She notes that the introduced subject in a
transitive use of a verb of manner of motion must be an agent, not an instrument or natural force. Compare The rider
jumped the horse over the fence with *The whip/the lightning jumped the horse over the fence. This property is also noted
in Cruse (1972).
Our account of the transitive use of verbs of manner of motion is different from our account of the transitive use of verbs of
emission. We have claimed that certain verbs of emission can be construed as being externally caused; because of that, a
directional phrase is not required to effect a change from an unergative to an unaccusative verb. In contrast, verbs of manner of
motion are never really considered externally caused, so that a directional phrase must be introduced to effect the change in the
classification of the verb from unergative to unaccusative. The introduction of the new external cause is constrained in that the
external cause must somehow be construed as an immediate cause. In fact, as we discuss at greater length in Levin and
Rappaport Hovav (to appear), even with certain inanimate emitters, the emission of the sound does not come about by direct
manipulation of the emitter, so that causatives are only possible in the presence of a directional phrase (e.g., The driver
roared/screeched the car into the driveway). That is, the situation reduces to precisely the situation observed with agentive
verbs of manner of motion.
The crucial part that the directional phrase plays in sanctioning the causative use of agentive verbs of manner of motion is
brought out by comparing the behavior of the verbs run and roll. The verb roll, although a manner of motion verb, is not
necessarily agentive and falls rather into Jespersen's 'move' class. As discussed at the end of section 7, the type of motion that
roll denotes can be either internally caused or, when brought about by an agent or a force such as a push or gravity, externally
caused. Depending on whether the verb roll is understood as internally or externally caused, monadic roll would be predicted to
behave either as an unaccusative or as an unergative verb. When the verb takes an animate agentive argument, it would be
expected to show unergative behavior since the rolling would be internally caused. In fact, when the verb takes an animate
subject, it can be found in the prepositional passive construction, a construction that Perlmutter and Postal (1984) argue is only
possible with unergative verbs.
(99) This carpet has been rolled on by three generations of children.
When its argument is inanimate, the eventuality denoted by the verb would be externally caused. The argument would be an
internal argument by the Default Linking Rule, since neither of the other two linking rules would be applicable, and the verb
would be expected to show unaccusative behavior. In fact, the verb cannot be found in an unergative type resultative
construction with an inanimate subject, as shown in (100a), though it can be found in an unaccusative type resultative
construction, as shown in (100b).
(100a) *The bowling balls rolled the markings off the floor. (cf. The
basketball players dribbled the markings off the floor.)
(100b) The door rolled open/shut.
This account of why roll can be unaccusative does not make reference to a directional phrase, contrasting with the account of
why run can be unaccusative. However, like unaccusative run, unaccusative roll should allow for a causative counterpart,
though again without the necessary accompaniment of a directional phrase. As predicted, the verb roll can be used causatively
even in the absence of a directional phrase.
(101a) The bowling ball rolled (into the room).
(101b) The bowler rolled the bowling ball (into the room).
The contrasting behavior of the verbs roll and run supports our account of these two verbs.
9. Conclusion
In this paper we have unravelled some of the puzzles concerning the causative alternation. Central to our analysis is the
distinction between verbs which are inherently monadic and verbs which are inherently dyadic. This distinction is related to,
but not reducible to,33 the distinction between unaccusative and unergative verbs. With respect to the phenomena that have come
under the label 'causative alternation', we have suggested that the more productive process in English is one which forms
'detransitive' verbs from lexical transitive causative verbs, as in the case of the verb break. Some verbs, such as buzz, have both
transitive and intransitive uses since the meaning of the verb is compatible with both internal and external causation; this
phenomenon is restricted to those verbs which are indeed compatible with both interpretations. Transitivization of agentive
verbs of manner of motion involves the introduction of an agent to an inherently monadic verb when, due to the presence of a
directional phrase, the verb no longer takes an external argument. We hope that this study of causative verbs in English will help
to illuminate our understanding of the much discussed, though still elusive, notion of transitivity.
33 We argue in Levin and Rappaport Hovav (to appear) that verbs of inherently directed motion such as arrive are
unaccusative and monadic.
References
Anderson, S.R., 1977. Comments on the paper by Wasow. In: P. Culicover, A. Akmajian, T. Wasow (eds.), 1977, 361-377.
Baker, M., 1988. Incorporation: A theory of grammatical function changing. Chicago, IL: University of Chicago Press.
Borer, H., 1991. The Causative-inchoative alternation: A case study in parallel morphology. The Linguistic Review 8, 119-158.
Bresnan, J. and A. Zaenen, 1990. Deep unaccusativity in LFG. In: K. Dziwirek, P. Farrell, E. Mejías-Bikandi (eds.),
Grammatical relations: A cross-theoretical perspective, 45-57. Stanford, CA: CSLI, Stanford University.
Brousseau, A.-M. and E. Ritter, 1991. A non-unified analysis of agentive verbs. WCCFL 10.
Burzio, L., 1986. Italian syntax: A Government-Binding approach. Dordrecht: Reidel.
Carter, R., 1988. On linking: Papers by Richard Carter. (Lexicon Project Working Papers 25, edited by B. Levin and C. Tenny.)
Cambridge, MA: Center for Cognitive Science, MIT.
Comrie, B., 1981. Language universals and linguistic typology. Chicago, IL: University of Chicago Press.
Cruse, D.A., 1972. A note on English causatives. Linguistic Inquiry 3, 522-528.
Culicover, P., A. Akmajian and T. Wasow (eds.), 1977. Formal syntax. New York: Academic Press.
Dixon, R.M.W., 1982. Where have all the adjectives gone? In: R.M.W. Dixon, Where have all the adjectives gone? and other
essays in semantics and syntax, 1-62. Berlin: Mouton.
Dowty, D.R., 1979. Word meaning and Montague grammar. Dordrecht: Reidel.
Dowty, D.R., 1991. Thematic proto-roles and argument selection. Language 67, 547-619.
Fillmore, C.J., 1967. The grammar of hitting and breaking. In: R. Jacobs, P. Rosenbaum (eds.), Readings in English
transformational grammar, 120-133. Waltham, MA: Ginn.
Fillmore, C.J., 1968. The case for case. In: E. Bach, R.T. Harms (eds.), Universals in linguistic theory, 1-88. New York: Holt,
Rinehart and Winston.
Fontenelle, T. and J. Vanandroye, 1989. Retrieving ergative verbs from a lexical data base. Dictionaries 11, 1-39.
Guerssel, M., 1986. On Berber verbs of change: A study of transitivity alternations. (Lexicon Project Working Papers 9.)
Cambridge, MA: Center for Cognitive Science, MIT.
Hale, K.L. and S.J. Keyser, 1986. Some transitivity alternations in English. (Lexicon Project Working Papers 7.) Cambridge,
MA: Center for Cognitive Science, MIT.
Hale, K.L. and S.J. Keyser, 1987. A view from the middle. (Lexicon Project Working Papers 10.) Cambridge, MA: Center for
Cognitive Science, MIT.
Haspelmath, M., 1993. More on the typology of inchoative/causative verb alternations. In: B. Comrie (ed.), Causatives and
transitivity. Amsterdam: Benjamins.
Hoekstra, T., 1984. Transitivity. Dordrecht: Foris.
Hoekstra, T., 1988. Small clause results. Lingua 74, 101-139.
Hoekstra, T., 1992. Aspect and theta theory. In: I.M. Roca (ed.), 1992, 145-174.
Hornby, A.S. (ed.), 1974. Oxford advanced learner's dictionary of current English. Oxford: Oxford University Press.
Jackendoff, R.S., 1990. Semantic structures. Cambridge, MA: MIT Press.
Jespersen, O., 1927. A modern English grammar on historical principles, Volume 3. Heidelberg: Carl Winters.
Keyser, S.J. and T. Roeper, 1984. On the middle and ergative constructions in English. Linguistic Inquiry 15, 381-416.
Lakoff, G., 1968. Some verbs of change and causation. (Report NSF-20.) Cambridge, MA: Aiken Computation Laboratory,
Harvard University.
Lakoff, G., 1970. Irregularity in syntax. New York: Holt, Rinehart and Winston.
Levin, B., 1989. The Basque verbal inventory and configurationality. In: L. Maracz, P. Muysken (eds.), Configurationality: The
typology of asymmetries, 39-62. Dordrecht: Foris.
Levin, B., 1993. English verb classes and alternations: A preliminary investigation. Chicago, IL: University of Chicago Press.
Levin, B. and M. Rappaport Hovav, 1992. The lexical semantics of verbs of motion: The perspective from unaccusativity. In:
I.M. Roca (ed.), 1992, 247-269.
Levin, B. and M. Rappaport Hovav, to appear. Unaccusativity: At the syntax-lexical semantics interface. Cambridge, MA: MIT
Press.
Levin, L., 1986. Operations on lexical forms: Unaccusative rules in Germanic languages. Ph.D. thesis, MIT.
Marantz, A.P., 1984. On the nature of grammatical relations. Cambridge, MA: MIT Press.
Marantz, A.P., 1992. The way constructions and the semantics of direct arguments in English. In: T. Stowell, E. Wehrli (eds.),
Syntax and semantics, Vol. 26: Syntax and the lexicon, 179-188. New York: Academic Press.
McCawley, J.D., 1968. Lexical insertion in a transformational grammar without deep structure. CLS 4, 71-80. Reprinted with
notes in: J.D. McCawley, 1973, Grammar and meaning, 154-166. Tokyo: Taishukan.
Nedjalkov, V.P., 1969. Nekotorye verojatnostnye universalii v glagol'nom slovoobrazovanii. In: I.F. Vardul' (ed.), Jazykovye
universalii i lingvisticheskaja tipologija, 106-114. Moscow: Nauka.
Nedjalkov, V.P. and G.G. Silnitsky, 1973. The typology of morphological and lexical causatives. In: F. Kiefer (ed.), Trends in
Soviet theoretical linguistics, 1-32. Dordrecht: Reidel.
Perlmutter, D.M., 1978. Impersonal passives and the Unaccusative Hypothesis. BLS 4, 157-189.
Perlmutter, D.M. and P. Postal, 1984. The 1-Advancement Exclusiveness Law. In: D.M. Perlmutter, C. Rosen (eds.), 1984,
81-125.
Perlmutter, D.M. and C. Rosen (eds.), 1984. Studies in Relational Grammar 2. Chicago, IL: University of Chicago Press.
Pinker, S., 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Procter, P., et al. (eds.), 1978. Longman dictionary of contemporary English. London: Longman Group.
Pustejovsky, J. (ed.), 1993. Semantics and the lexicon. Dordrecht: Kluwer.
Rappaport, M. and B. Levin, 1988. What to do with theta-roles. In: W. Wilkins (ed.), Syntax and semantics, Vol. 21: Thematic
relations, 7-36. New York: Academic Press.
Rappaport, M., B. Levin and M. Laughren, 1988. Niveaux de représentation lexicale. Lexique 7, 13-32. Appears in English as:
Levels of lexical representation. In: J. Pustejovsky (ed.), 1993, 37-54.
Reinhart, T., 1991. Lexical properties of ergativity. Paper presented at the Workshop on lexical specification and lexical
insertion, December 9-11, 1991. Utrecht, The Netherlands: Research Institute for Language and Speech, University of Utrecht.
Roca, I.M. (ed.), 1992. Thematic structure: Its role in grammar. Berlin: Walter de Gruyter.
Rosen, C., 1984. The interface between semantic roles and initial grammatical relations. In: D.M. Perlmutter, C. Rosen (eds.),
1984, 38-77.
Rosen, S.T., 1989. Argument structure and complex predicates. Brandeis University. (Ph.D. thesis.)
Rothemberg, M., 1974. Les verbes à la fois transitifs et intransitifs en Français contemporain. The Hague: Mouton.
Ruwet, N., 1972. Théorie syntaxique et syntaxe du Français. Paris: Editions du Seuil.
Shibatani, M., 1976. The grammar of causative constructions: A conspectus. In: M. Shibatani (ed.), Syntax and semantics, Vol.
6: The grammar of causative constructions, 1-40. New York: Academic Press.
Simpson, J., 1983. Resultatives. In: L. Levin, M. Rappaport, A. Zaenen (eds.), Papers in Lexical-Functional Grammar, 143-157.
Bloomington, IN: Indiana University Linguistics Club.
Smith, C.S., 1970. Jespersen's 'move and change' class and causative verbs in English. In: M.A. Jazayery, E.C. Polomé, W.
Winter (eds.), Linguistic and literary studies in honor of Archibald A. Hill, Vol. 2: Descriptive linguistics, 101-109. The Hague:
Mouton.
Talmy, L., 1975. Semantics and syntax of motion. In: J.P. Kimball (ed.), Syntax and semantics, Vol. 4: 181-238. New York:
Academic Press.
Talmy, L., 1985. Lexicalization patterns: Semantic structure in lexical forms. In: T. Shopen (ed.), Language typology and
syntactic description, Vol. 3: Grammatical categories and the lexicon, 57-149. Cambridge: Cambridge University Press.
Tenny, C., 1987. Grammaticalizing aspect and affectedness. Ph.D. thesis, MIT.
Wasow, T., 1977. Transformations and the lexicon. In: P. Culicover, A. Akmajian, T. Wasow (eds.), 1977, 327360.
Zaenen, A., 1993. Unaccusativity in Dutch: An integrated approach. In: J. Pustejovsky (ed.), 1993, 129-161.
Section 2
Discovering the word units
assume that complete utterances are not what lexicons consist of. The number of utterances is potentially infinite, so that to store
all the utterances we might ever hear our lexicon would also have to have infinite capacity. Even if we set an arbitrary length
limit, the number of possible utterances is enormously large; for instance, Miller (1967) calculated that there must be at least
10^20 possible English sentences of twenty words or less, a total which, he drily added, would take considerably longer than the
estimated age of the earth to speak.
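Miller's remark is easy to verify with a back-of-the-envelope calculation; the five-second speaking time per sentence below is our illustrative assumption, not a figure from the text.

```python
# Rough check of Miller's (1967) observation, assuming (illustratively)
# about five seconds to speak each twenty-word sentence.

sentences = 10 ** 20
seconds_per_sentence = 5               # assumed speaking time per sentence
seconds_per_year = 60 * 60 * 24 * 365  # ignoring leap years

years_to_speak = sentences * seconds_per_sentence / seconds_per_year
age_of_earth_years = 4.5e9             # commonly cited estimate

print(f"{years_to_speak:.1e} years")   # on the order of 10^13 years

# The total dwarfs the age of the earth by more than three orders of magnitude.
assert years_to_speak > 1000 * age_of_earth_years
```

Even a much faster assumed speaking rate changes the result only by a small constant factor, so the conclusion that utterances cannot simply be stored does not depend on the assumption.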
Instead, we assume that the contents of a lexicon consist of sound-to-meaning mappings in discrete chunks. (We can refer to
lexical entries by the shorthand term 'words', although of course not all lexical entries necessarily correspond to what would be
written as a single separate word. Some subword forms such as affixes or stem morphemes may well have lexical representation,
as may particles which are conjoined with other words in writing; likewise, multi-word idiomatic expressions and frequently
occurring phrases may be represented by a single entry.) Thus using a lexicon requires the separation of utterances into the
lexically relevant chunks of which they are made up: producing speech requires the language user to string together lexical
entries to make a whole utterance, and recognising speech requires division of an utterance into units which can be looked up in
the lexicon. Likewise, acquiring a lexicon eventually involves acquiring the ability to use it in these ways.
The present contribution focusses on the very start of lexicon-building: how the infant might find out what words in the input
language are like, and might assemble an initial stock of known words. The initial task is perceptual. What exactly does it
involve? For instance, does it involve (as mature use of a lexicon in speech recognition involves) division of multi-word
utterances into lexically relevant chunks? And if so, how difficult is this task? To answer these questions we need to consider
the nature of the speech input with which the infant is most likely to be confronted. Comparative studies of various types of
speech are considered in the next section.
as rehearsed speech (heard in the theatre or on radio or television, for example), read speech (in news broadcasts, or, too often,
in lectures) or computer-synthesised speech.
Whatever the style of speech, words in isolation occur only rarely; nearly all utterances are multi-word. A lot is known about the
phonetics of multiword utterances, and a fair summary of our knowledge is that words are strongly affected by the contexts in
which they occur; moreover, these contextual assimilation processes operate to obscure word boundaries, with the result that
there are few reliable cues in a continuous speech signal to where one word ends and the next begins. Klatt (1989) provides a
telling overview of the problems which this causes for the lexical access process so essential for speech recognition.
Nevertheless, the majority of such phonetic studies have been conducted on speech produced in laboratory situations, which is
normally read speech. Is this a fair representation of the speech which most listeners usually hear? Motivated by this question,
speech scientists have undertaken a number of studies aimed at describing spontaneous speech, and because of the underlying
motivation, most of the studies have been comparative: spontaneous speech has been contrasted with read speech. These studies
have revealed systematic differences between the two types of speech. Some of these differences might render the listener's
problems even worse in spontaneous than in read speech. For example, casual spontaneous speech is particularly prone to
phonological elisions and assimilations (G. Brown 1977, Labov 1972, Milroy 1980) and to syntactic simplifications and,
occasionally, incompleteness (Cheshire 1982, Labov 1972). Other differences, however, might make life easier for listeners to
spontaneous speech. These are principally differences in the prosodic domain. Thus spontaneous speech tends to be produced at
a slower rate than read speech (Barik 1977, Johns-Lewis 1986, Levin et al. 1982), and to have longer and more frequent pauses
and hesitations (Barik 1977, Crystal and Davy 1969, Kowal et al. 1975, Levin et al. 1982) and shorter prosodic units (Crystal
and Davy 1969).
Listeners can distinguish spontaneous utterances either from read speech (Levin et al. 1982, Remez et al. 1985, Blaauw 1991) or
from rehearsed speech (Johns-Lewis 1987); their judgements are most likely based on prosodic aspects of the speech, because
accuracy is still high when the speech extracts have been low-pass filtered (Levin et al. 1982), while the distinction cannot be
as accurately made on written versions of the text (Johns-Lewis 1987). Fluent spontaneous speech can be identified as
accurately as disfluent (Blaauw 1991).
The prosodic differences between spontaneous and read speech have consequences for the way speech in each mode is
processed by listeners. McAllister (1991) examined word recognition in spontaneous and read speech using the gating task, in
which listeners hear successively larger fragments of a word. She found that word identification (in context) occurred earlier for
a word stressed on the first rather than on the second syllable in spontaneous speech, but not in read speech. In a word-by-word
gating study of spontaneous speech Bard et al. (1988) and Shillcock et al. (1988) similarly found that words containing strong
syllables were easier to identify than words which were realised as weak syllables.
Mehta and Cutler (1988) investigated phoneme detection reaction time in spontaneous and read speech, and compared in
particular the relative strength in the two speech modes of a number of previously established effects. They found no overall
difference in response time between the two speech modes, and also no difference between the two modes on the one semantic
variable in the study, the effects of the transitional probability of the target-bearing word. However, four other effects differed
across modes. In read speech but not in spontaneous speech, late targets were detected more rapidly than early targets, and
targets preceded by long words were detected more rapidly than targets preceded by short words. In contrast, in spontaneous
speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in
weak syllables.
Mehta and Cutler explained these differences in terms of prosodic differences between the two speech modes. The greater
frequency of hesitations in spontaneous speech, for example, results in shorter prosodic units, which in turn reduces the average
span over which rhythmic predictability will hold. So because prosodic units are long (generally clause-length) in read speech, but
usually short in spontaneous speech, the opportunity for rhythmic prediction in the latter case is much smaller. Mehta and Cutler
thus argued that position in the sentence is not, strictly speaking, what affects target detection time; rather, the effective variable
is position in the prosodic unit. Similarly, because hesitations tend to be more frequent and longer in spontaneous speech, it is
much more likely that a particular target-bearing word will be preceded by a hesitation in the spontaneous than in the read
mode. Where a target is immediately preceded by a hesitation, any effects of incomplete processing of the previous word will be
nullified by the extra processing time provided by the hesitation, so that effects of preceding word length, which are held to
reflect just such processing hangovers from the preceding word, will be less likely. Finally, because accent patterns in
spontaneous utterances were
more varied and less likely to express default accenting than those in read utterances, and the acoustic differences between
strong and weak syllables were greater in spontaneous than in read speech, there was greater opportunity for processing effects
of both sentence accent and syllable stress to appear in the spontaneous than in the read speech; this would account for the
finding of significant facilitation due to sentence accent and syllable stress in the former but not in the latter. The results of the
gating studies described above provide similar evidence of the perceptual importance of syllable stress in spontaneous speech.
These findings speak to the majority case for speech processing. Most speech that adult listeners hear is spontaneously
produced. Such speech is characterised by a fairly slow overall rate of speech, short prosodic units, frequent pauses and, in
English, a clear opposition between strong and weak syllables. These factors affect the way the speech is processed.
The one exception is that infant-directed speech is reported to have higher pitch and a wider fundamental frequency range
(Fernald and Simon 1984, Garnica 1977). In contrast, the fundamental frequency range of spontaneous speech has been reported
in at least some studies to be relatively narrow, at least in intimate conversation (Johns-Lewis 1986, Blaauw 1991). Pitch is a
particularly important dimension of infant-directed speech, since the fact that infants prefer to listen to this style of speech
(Fernald 1985) has been found to be principally due to its pitch characteristics (Fernald and Kuhl 1987, Sullivan and Horowitz
1983).
Ohala (1983, 1984) has argued that raised pitch is an ethologically universal signal of smallness, ingratiation and non-
threatening attitude. From such a perspective it would be possible to argue that raised pitch might not be a phonologically
relevant manipulation in speech to infants, but might simply arise from universal expression of affection or nurturance on the
part of an adult to an infant. Against this conclusion, on the other hand, might be cited the more recent findings that the pitch
manipulations found in infant-directed speech in American English and related languages are apparently not universal. Although
rising contours predominate in infant-directed speech in the stress languages English (Sullivan and Horowitz 1983) and German
(Fernald and Simon 1984), falling contours are more prevalent in the tone languages Mandarin (Grieser and Kuhl 1988) and
Thai (Tuaycharoen 1978). In a comprehensive review of the literature on pitch in infant-directed speech, Shute (1987)
concluded that pitch modifications are not only clearly not universal across languages, but may also differ within one language
as a function of sex of the speaker, age of the child addressee, frequency of the speaker's interaction with children and other
factors.
In fact a recognisable style of infant-directed speech is itself not universal, contrary to the confident expectations of researchers
in the 70s that it would prove not only to be universal (Ferguson 1977) but absolutely necessary for successful acquisition (R.
Brown 1977). It is now clear that there are cultures where infants are exposed to much normal adult speech but no speech in any
special infant-directed mode (Heath 1983, Schieffelin 1985, Schieffelin and Ochs 1983). Even where infant-directed speech
appears to conform to the pattern observed in English and like languages, this may not constitute a specialised mode; thus
infant-directed speech in Quiche Mayan has relatively high pitch, but so, in this language, does adult-directed speech from the
same informants (Bernstein-Ratner and Pye 1984).
Thus it is reasonable to conclude that infants in the earliest stages of language acquisition receive at least some of their input (and, for some infants, perhaps all of their input) in a form that at least closely resembles normal spontaneous speech between adults. One of the
characteristics of spontaneous speech, it will be recalled, is the high frequency of phonological elisions and assimilations (G.
Brown 1977, Labov 1972, Milroy 1980). Some studies have reported that child-directed speech, too, is replete with such
distorting processes (Bard and Anderson 1983, 1991; Shockey and Bond 1980), which is consistent with the view that this style
of speech lies on a general continuum with adult-directed casual speech. Other studies, however, have reported lower frequency
of distorting phonological transformations in speech to infants than in speech to adults (Bernstein-Ratner 1984a,b). In an attempt
to resolve this apparent contradiction, Stoel-Gammon (1984) transcribed five hours of speech to one-year-olds; her results
strongly support the view of a continuum, since she effectively discovered a continuum within her own data, from very clear
articulation (e.g. release of word-final stop bursts, clear articulation of unstressed vowels) to very casual forms (frequent vowel
reduction, omissions of whole syllables such as [sko] as a pronunciation of let's go). Stoel-Gammon concluded that the
phonological characteristics of speech to children depend on such factors as contextual redundancy, the function of the
individual utterance, and the situational context: the same factors that determine the phonological forms of adult spontaneous
speech (Lieberman 1963, Cheshire 1982).
There is to my knowledge no evidence, from any culture, of a greater incidence of isolated words in speech to children than in
other forms of speech. Even though phrases may be short, they are still phrases. Thus a speech segmentation problem, as
described in the introductory section to this paper, seems to exist for the infant as for the adult language user. The speech that
the infant hears is continuous; much of the speech of the infant's environment will be speech among mature language users; in
perhaps a majority of cultures speech addressed specifically to the infant would form only a small proportion of the input; even
then, such speech may not necessarily be clearly articulated. The problem is compounded for the infant by the necessity of
compiling a lexicon, and this added difficulty does not trade off against reduced segmentation difficulty in the input. In fact, the
scale of the segmentation problem in the structure of the input is remarkably similar for the infant and for the adult.
words cannot be recognised till after their ends, then segmentation by default would lose its very basis.
Secondly, models such as segmentation by default are far from robust; they assume that prelexical processing of the speech
signal will be accurate. But in practice speech signals are not always fully clear. Background noise, distance between speaker
and listener, distortion of the speaker's vocal tract, foreign accents, slips of the tongue: all these, and similar factors, conspire to
make the listener's phonetic interpretation task harder. A much more robust model is needed to account for what is obviously
true, namely that human speech recognition is extremely successful even under noisy conditions or with previously unfamiliar
voices or accents.
than two-thirds of all weak syllables were the sole or initial syllables of grammatical words.
This means that a listener encountering a strong syllable in spontaneous English conversation would seem to have about a three
to one chance of finding that strong syllable to be the onset of a new lexical word. A weak syllable, on the other hand, would be
most likely to be a grammatical word. English speech should therefore lend itself to a segmentation procedure whereby strong
syllables are assumed to be the onsets of lexical words. Cutler and Norris interpreted results of an experiment they ran as
evidence for such a procedure. They used a task which they called word-spotting, in which listeners were asked to detect real
words embedded in nonsense bisyllables; detection times for the embedded word (here, mint) were slower in, say, mintayf (in which the second vowel is strong) than in mintef (in which the second vowel is schwa). Cutler and Norris interpreted this as evidence that
listeners were segmenting mintayf prior to the second syllable, so that detection of mint therefore required combining speech
material from parts of the signal which had been segmented from one another. No such difficulty would arise for the detection
of mint in mintef, since the weak second syllable would not be divided from the preceding material.
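The boundary-placement logic behind this interpretation can be sketched in a few lines (a toy illustration only: syllables arrive pre-tagged as strong or weak, whereas real input would require detecting vowel quality first):

```python
# Toy sketch of the stress-based segmentation idea described above:
# hypothesize a word onset at every strong (full-vowel) syllable.
# Input is a list of (syllable, is_strong) pairs -- an assumption made
# purely for illustration; nothing here analyses the acoustic signal.

def candidate_onsets(syllables):
    """Indices of syllables hypothesized to begin a new lexical word."""
    return [i for i, (_, strong) in enumerate(syllables) if strong]

# 'mintayf': both syllables strong, so a boundary is postulated before
# 'tayf', splitting the embedded word 'mint' across two units.
print(candidate_onsets([("min", True), ("tayf", True)]))   # [0, 1]
# 'mintef': weak second syllable, so no word-internal boundary.
print(candidate_onsets([("min", True), ("tef", False)]))   # [0]
```

The slower detection of mint in mintayf then falls out directly: the hypothesized boundary divides the speech material that must be recombined to recognise the embedded word.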
Further evidence for such a procedure was produced by Cutler and Butterfield (1992), who investigated the way in which word
boundaries tend to be misperceived. In both spontaneous and experimentally elicited misperceptions they found that erroneous
insertions of a word boundary before a strong syllable (e.g. achieve being heard as a cheap) and deletions of a word boundary
before a weak syllable (e.g. bird in being heard as burgling) were far more common than erroneous insertions of a boundary
before a weak syllable (e.g. effective being heard as effect of) or deletions of a boundary before a strong syllable (e.g. were
waiting being heard as awaken). This is exactly what would be expected if listeners deal with the segmentation problem by
assuming that strong syllables are likely to be word-initial, but weak syllables are not.
As Cutler and Norris point out, the strong syllable is defined by the quality of its vowel (full, in comparison to the reduced vowels of weak syllables); thus spotting strong syllables cannot provide a complete solution to the segmentation problem, since word boundaries actually occur at syllable onsets, which may precede the vowel by one or more consonants. A strong syllable spotter must be supplemented by some means of
estimating actual syllable onset; Cutler and Norris suggest that more than one alternative realisation of such a device would be
feasible. Assuming that a rhythmically based segmentation procedure is indeed practical,
its advantages are considerable. For instance, such a procedure is obviously not going to be affected by the frequency of words
embedded within other words in speech, or by the relative frequency of monosyllables versus polysyllables. Only where
polysyllabic words contain strong syllables in non-initial position will the procedure produce a non-optimal result (i.e. it will
signal a word boundary but this will be a false alarm). However, polysyllabic words with non-initial strong syllables occur
relatively rarely (Cutler and Carter 1987), and in only a small minority of them will a false alarm actually produce a real word
unrelated to the embedding word (e.g. late in collate; Cutler and McQueen, in press). Thus rhythmic segmentation is a relatively
efficient procedure for English.
It is also quite robust; in fact, it is precisely with uncertain input that rhythmic segmentation proves particularly useful.
Researchers in automatic speech recognition (e.g. Shipman and Zue 1982) have developed systematic representations of
phonetic uncertainty, namely transcriptions in which only general classes of phoneme are provided (e.g. glide, nasal, stop
consonant, etc.). Two studies using uncertain input of this kind have produced further evidence in favour of rhythmic
segmentation. In the first study, Briscoe (1989) implemented four segmentation algorithms and tested their performance on a
(phonetically transcribed) continuous input, using a 33,000-word lexicon. The algorithms postulated potential lexical
boundaries: (a) at the end of each successfully identified word ('segmentation by default'); (b) at each phoneme boundary; (c) at
each syllable onset; and (d) at each strong syllable onset (the rhythmic segmentation proposal). The measure of performance was
the number of potential lexical hypotheses generated (the fewer the better). With completely specified phonetic input all
algorithms naturally performed quite well. However, significant differences between the algorithms emerged when some or all
of the input was phonetically uncertain; most affected were 'segmentation by default' and the phonemic algorithm, both of which
generated huge numbers of potential parses of incomplete input. Far better results were produced by the algorithms which
constrained possible word onset positions in some way, and the more specific the constraints, the better the performance: the
rhythmic segmentation algorithm performed best of all with the uncertain input. In the second study, Harrington et al. (1989)
compared the rhythmic segmentation algorithm with a segmentation algorithm based on permissible phoneme sequences (Lamel
and Zue 1984, Harrington et al. 1987), using as a metric the proportion of word boundaries correctly identified in a 145-
utterance corpus. With phonetically uncertain input, sequence constraints proved virtually useless, but the rhythmic segmentation
algorithm still performed effectively (in fact it correctly detected more word boundaries in uncertain input than the phoneme
sequence constraints had detected in completely specified input).
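A back-of-the-envelope calculation shows why constraining candidate boundary positions pays off so dramatically with uncertain input (a toy illustration, not Briscoe's actual implementation; the utterance sizes below are invented):

```python
# With phonetically uncertain input, each permitted internal boundary
# position may or may not host a word boundary, so an algorithm allowing
# boundaries at n positions must in the worst case entertain 2**n parses.

def max_segmentations(n_internal_positions):
    """Worst-case number of segmentations of an uncertain utterance."""
    return 2 ** n_internal_positions

phonemes = 12   # hypothetical utterance: 12 phonemes ...
syllables = 4   # ... grouped into 4 syllables ...
strong = 2      # ... of which 2 are strong

print(max_segmentations(phonemes - 1))   # boundary at any phoneme boundary: 2048
print(max_segmentations(syllables - 1))  # only at syllable onsets: 8
print(max_segmentations(strong - 1))     # only at strong syllable onsets: 2
```

The more specific the constraint on possible word onsets, the smaller the hypothesis space left once segmental information fails to narrow it down, which is just the pattern of results Briscoe observed.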
The efficiency and robustness of rhythmic segmentation therefore suggest that listeners profit from employing an explicit
segmentation procedure of this kind. A striking fact about this procedure, however, is its language-specificity: as described for
English, the procedure is based on stress rhythm, i.e. the opposition of strong and weak syllables. Clearly, it therefore cannot be
a universal strategy, because many (indeed most) languages of the world do not have stress rhythm. However, all languages have
rhythm; speech rhythm need not be stress-based. In the next section, alternative forms of rhythmic segmentation are described,
supported by the results from experiments in languages which do not have stress rhythm.
If speech segmentation in French proceeds syllable by syllable, there is an interesting parallel to the results from English
reported in the previous section. Just as use of the opposition between strong and weak syllables in segmenting English exploits
the English language's characteristic stress-based rhythmic pattern, so does use of the syllable in segmenting French exploit
rhythmic patterns, since the characteristic rhythm of French is syllable-based. Recent results from studies of speech
segmentation in a third language, Japanese, confirm the connection between segmentation and speech rhythm. In Japanese,
speech rhythm is based on a subsyllabic unit called the mora (which can be a vowel, an onset-vowel sequence, or a syllable coda). Otake et al. (1993) conducted an experiment in Japanese which was directly analogous to the French experiment by
Mehler et al. (1981); they compared detection of CV (e.g. ta-) and CVC (e.g. tan-) targets in Japanese words beginning with
open (tanishi) versus closed (tanshi) syllables. In both words the first mora is the initial CV sequence ta; and detection of CV
targets was equally fast in both words (had the Japanese subjects been using a syllabic segmentation procedure, the CV targets
should have been harder to detect in closed than in open syllables). CVC targets constitute two morae, and correspond to the
first two morae of the words with initial closed syllables; however, they do not correspond properly to a mora-based
segmentation of words like tanishi (CV-CV-CV). Indeed, the Japanese listeners responded to the CVC targets in words like
tanshi, but usually failed to respond in words like tanishi.
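The moraic parse that drives this contrast can be sketched with a rough heuristic for romanized Japanese (an illustrative approximation only, not a full phonological analysis; long vowels and geminates are ignored):

```python
import re

# Rough moraic parser for romanized Japanese: a mora is either a moraic
# nasal 'n' (not followed by a vowel or glide) or a consonant-cluster-plus-
# vowel sequence. This is a simplification for illustration.
MORA = re.compile(r"n(?![aeiouy])|[^aeiou]*[aeiou]")

def morae(word):
    return MORA.findall(word)

def aligns(target, word):
    """True iff target matches a prefix of word ending on a mora boundary."""
    prefix = ""
    for m in morae(word):
        prefix += m
        if prefix == target:
            return True
        if len(prefix) > len(target):
            return False
    return False

print(morae("tanshi"))            # ['ta', 'n', 'shi']
print(morae("tanishi"))           # ['ta', 'ni', 'shi']
print(aligns("tan", "tanshi"))    # True: 'tan' spans the first two morae
print(aligns("tan", "tanishi"))   # False: 'tan' cuts into the mora 'ni'
```

Under such a parse the CVC target tan lines up with a mora boundary in tanshi but not in tanishi, matching the listeners' pattern of responses.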
Thus rhythmic segmentation seems to be quite a widespread phenomenon across languages, with the nature of the rhythmic
processing being determined by the nature of each language's characteristic rhythmic structure: stress-based, syllabic, or moraic
rhythm can all be used in speech segmentation by adult listeners. However, there turn out to be strict limitations on the way any
listener can exploit speech rhythm in segmentation; and these limitations may illuminate the questions with which we started
this chapter, namely those pertaining to how the prelinguistic infant first solves the segmentation problem.
al. 1986); neither English nor French listeners use moraic segmentation when performing the same task with Japanese (Otake et
al. 1993). In other words, syllabic segmentation seems to be specific to French listeners, moraic segmentation to Japanese
listeners. (In fact the French listeners segmented both English and Japanese speech by syllables, just as they segment French!)
Moreover, under appropriate conditions listeners can be seen to abandon the rhythmic segmentation procedures characteristic of
their language community. When responding very fast, French listeners can base their responses on subsyllabic units (Dupoux
1993). CVV sequences are apparently less conducive to application of moraic segmentation by Japanese listeners than the (more
common) CVCV and CVN sequences (Otake 1992). The failure to find processing disadvantages for English words beginning
with weak syllables when the words are carefully read, reported in the second section of this paper, may reflect a similar case: if
the input is very clear, stress-based segmentation may not need to be called into play. Thus it is quite clear that none of the
rhythmic segmentation procedures constitutes an absolutely necessary component of adult listeners' speech processing.
The strongest evidence that this is so comes, however, from studies of bilingual processing. Cutler et al. (1992) tested French-
English bilinguals with the techniques which had demonstrated syllabic responding in French listeners (Mehler et al. 1981) and
stress-based responding in English listeners (Cutler and Norris 1988). Their subjects were as bilingual as they could find: each
had learned both languages from the earliest stages of acquisition, spoke both languages daily, and was accepted as a native
speaker by monolinguals in each language.
Yet these bilinguals did not necessarily produce the pattern of results which monolinguals had shown on each previous
experiment. Instead, their response patterns could be predicted from a measure of what Cutler et al. called language 'dominance',
which amounted in essence to a decision as to which of their two languages the bilinguals would be most sorry to lose. On
Mehler et al.'s target detection task with French materials, only those bilinguals who chose French as their 'dominant' language
showed a syllabic pattern of responding; the English-dominant bilinguals showed no trace of syllabic effects. On Cutler and
Norris' word-spotting task, in contrast, a stress-based response pattern appeared only with those bilinguals who chose English;
the responses of the French-dominant bilinguals were unaffected by the rhythmic pattern of the embedding nonsense word.
Apparently, these maximally competent bilinguals had available to them in these tasks only one rhythmic segmentation procedure: either that which was characteristic of one of their native languages, or that which was characteristic of the other, but not both.
Of course, it should be remembered that this conclusion is based only on the results of laboratory experiments, and may not
reflect the full extent of the resources which bilinguals can apply to the processing of, for example, spontaneous speech; as
earlier sections of this chapter described, different speech styles may call differentially upon a listener's processing repertoire.
However, the experiments undeniably show that in the laboratory some bilinguals can exploit a given rhythmic segmentation
procedure, and do exploit it, while others certainly do not exploit the same procedure, and possibly cannot do so. A claim that,
for example, French-dominant French-English bilinguals are capable of stress-based segmentation, but abandon it when
processing laboratory speech, ought therefore to be accompanied by an account of why English-dominant bilinguals, and
monolingual English speakers, do not abandon this procedure in the laboratory. On the basis of the laboratory results alone, it
would surely appear that bilinguals simply do not have available to them the segmentation procedure characteristic of their non-
dominant language.
This is a remarkable finding in the light of the undoubted competence of these bilingual speakers in both their languages. The
English-dominant bilinguals spoke and understood French just as well as the French-dominant bilinguals did, and the latter
group spoke and understood English just as well as the former. For those bilinguals who used stress-based segmentation with
English, the apparent unavailability of syllabic segmentation for use with French seemed to have no adverse effect on their
linguistic competence; likewise, for those bilinguals who used syllabic segmentation with French, the unavailability of stress-
based segmentation seemed not to reduce in any way their demonstrated competence in English. These results may therefore
indicate that the rhythmic segmentation procedures are not a necessary component of a language user's processing mechanism;
one can demonstrate native competence without them.
This in turn would imply that rhythmic segmentation procedures are not simply developed in response to experience with the
statistical properties of the native language, as the arguments made by, for instance, Cutler and Carter (1987) with respect to
stress-based segmentation in English contended. There is no doubt that stress-based segmentation does work efficiently with
English; but despite having been exposed to English since their earliest years, and despite using English with native competence
all their lives, the French-dominant bilinguals nevertheless do not, in the word-spotting experiment, show evidence of
segmenting by stress. The question must be posed, therefore, of how the rhythmic segmentation procedures could arise, if it is
arguably the case that they may not result automatically from experience with the statistical properties of the native language. A
possible answer to this question, proposed by Cutler et al. (1992) and by Cutler and Mehler (1993), is discussed in the next
section.
segmentation dividing the continuous speech into lexical units. Nor need the rhythmic structure be exclusively expressed in the
auditory domain; as Petitto and Marentette (1991) demonstrate, gestural language acquisition by congenitally deaf infants
follows a developmental path with noticeable similarities to spoken language acquisition.
Because, as we have described, rhythmic structure differs even across spoken languages, the infant exposed to stress rhythm will
focus upon a different regularity than the infant exposed to, say, syllabic or moraic rhythm. As Cutler et al. (1992) argue, this
can be conceived of as the infant attending to the smallest level of regularity occurring in the spoken input. What is remarkable
about this process is that it seems to happen only once, if the evidence from the bilingual studies is reliable. That is, exposure to
two differing rhythmic regularities (syllabic and stress rhythm, for instance) does not result in the ability to use both types of
rhythm in speech segmentation; a language user appears to be able to command only one rhythmic segmentation procedure. This
type of all-or-none instantiation of a language processing procedure is distinctly reminiscent of the notion of parameter-setting in
syntactic processing (e.g. Wexler and Manzini 1987).
evidence of infants' exploitation of utterance prosody to structure speech into interpretable units.
Indirect evidence for the present proposal can be found in both perceptual and production evidence from prelinguistic infants.
For instance, it has been shown that the characteristic rhythmic pattern of speech is salient to the newborn child. Condon and
Sander (1974) found that neonates are able to synchronise their movements with speech structure, whether the speech is spoken
directly to the child or played from a tape recorder, and whether it is in the parental language or a foreign language. (Tapping
sounds, on the other hand, did not evoke synchrony in the infant's movement.) The ability to discriminate the contrasts involved
in rhythmic patterning appears early; thus two-month-olds can discriminate rhythmic groupings of tones (Demany et al. 1977).
These early discriminatory abilities also apply to the particular contrasts involved in speech rhythm: very young infants can
discriminate stress contrasts (Spring and Dale 1977, Jusczyk and Thompson 1978, Karzon 1985), and neonates can make
discriminations based on number of syllables (Bijeljac-Babic et al. 1993). Speech to infants tends to have more regular rhythm
than speech to adults, as evidenced in English by more frequent occurrence of stresses (Garnica 1977) and more regular
alternation of vocalisation and pause (Stern et al. 1983); however, the relevance of this is unclear given that durational features
of infant-directed speech do not appear to be involved in infant preferences for this speech style (Fernald and Kuhl 1987).
More important would seem to be recent evidence of rhythmic patterning in the speech production of prelinguistic infants.
Cross-linguistic studies of babbling have pointed to increasing language-specificity in babbling during the second half of the
first year of life (e.g. de Boysson-Bardies et al. 1984, de Boysson-Bardies and Vihman 1991, Hallé et al. 1991, Blake and de
Boysson-Bardies 1992), including language-specificity in prosodic structure (Whalen et al. 1991). Rhythmic structure is one of
the language-specific patterns which appear in speech at this age. Levitt and Wang (1991) and Levitt and Utman (1992) found
that reduplicative babbling of infants from French-speaking homes showed a gradually increasing regularity of timing of non-
final syllables across the first year of life, while the speech of infants of the same age from English-speaking homes showed a
gradually increasing variability of syllable structure and timing. This suggests that the characteristic rhythm of speech is
incorporated into infants' linguistic competence before they acquire their first words.
It appears that infants also become aware of the characteristic word prosody of their language before acquiring their first words.
Jusczyk et al.
(1993) found that nine-month-old infants acquiring English preferred to listen to lists of bisyllabic words with initial stress (crossing, former, cable) rather than to bisyllables with final stress (across, before, decay), although no such preferences appeared with
six-month-olds. Even when the lists were low-pass filtered to remove most of the segmental information, nine-month-olds still
preferred the initial-stress lists, suggesting that their preferences were based on prosodic structure. Jusczyk et al. argued that
during the second half of their first year, infants exercise their ability to segment speech with the result that they acquire
knowledge of the typical prosodic structure of words in the input language.
At later ages, language-specific exploitation of rhythmic structure by children is established: children learning English use stress
rhythm in segmentation (Gerken 1991, Gerken et al. 1990, Peters 1985); children learning French and other languages with
syllable rhythm use syllables (Alegria et al. 1982, Content et al. 1986); children learning Japanese use morae (Mann 1986). The
hypothesis proposed here is that language rhythm is also what allows infants to accomplish their very first segmentation of
speech. An ability to process rhythm is inborn. By using this ability, infants are enabled to overcome the segmentation problem
and hence take their first step towards compilation of their very own lexicon.
References
Alegria, J., E. Pignot and J. Morais, 1982. Phonetic analysis of speech and memory codes in beginning readers. Memory and Cognition 10, 451–456.
Bard, E.G. and A. Anderson, 1983. The unintelligibility of speech to children. Journal of Child Language 10, 265–292.
Bard, E.G. and A. Anderson, 1991. The unintelligibility of speech to children: Effects of referent availability. Proceedings of the Twelfth International Congress of Phonetic Sciences, Aix-en-Provence, Vol. 4, 458–461.
Bard, E.G., R.C. Shillcock and G. Altmann, 1988. The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. Perception and Psychophysics 44, 395–408.
Barik, H.C., 1977. Cross-linguistic study of temporal characteristics of different types of speech materials. Language and Speech 20, 116–126.
Bernstein-Ratner, N., 1984a. Patterns of vowel modification in mother-child speech. Journal of Child Language 11, 557–578.
Bernstein-Ratner, N., 1984b. Phonological rule usage in mother-child speech. Journal of Phonetics 12, 245–254.
Bernstein-Ratner, N. and C. Pye, 1984. Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan. Journal of Child Language 11, 515–522.
Bijeljac-Babic, R., J. Bertoncini and J. Mehler, 1993. How do 4-day-old infants categorize multisyllabic utterances? Developmental Psychology 29, 711–721.
Blaauw, E., 1991. Phonetic characteristics of spontaneous and read-aloud speech. Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles, Barcelona, 12.1–12.5.
Blake, J. and B. de Boysson-Bardies, 1992. Patterns in babbling: A cross-linguistic study. Journal of Child Language 19, 51–74.
de Boysson-Bardies, B. and M.M. Vihman, 1991. Adaptation to language: Evidence from babbling and first words in four languages. Language 67, 297–319.
de Boysson-Bardies, B., L. Sagart and C. Durand, 1984. Discernible differences in the babbling of infants according to target language. Journal of Child Language 11, 1–15.
Briscoe, E.J., 1989. Lexical access in connected speech recognition. Proceedings of the 27th Congress, Association for Computational Linguistics, Vancouver, 84–90.
Brown, G., 1977. Listening to spoken English. London: Longman.
Brown, G.D.A., 1984. A frequency count of 190,000 words in the London-Lund corpus of English conversation. Behavior Research Methods, Instrumentation and Computers 16, 502–532.
Brown, R., 1977. Introduction. In: C.E. Snow, C.A. Ferguson (eds.), Talking to children: Language input and acquisition, 1–27. Cambridge: Cambridge University Press.
Cheshire, J., 1982. Variation in an English dialect. Cambridge: Cambridge University Press.
Cole, R.A. and J. Jakimik, 1978. Understanding speech: How words are heard. In: G. Underwood (ed.), Strategies of information processing, 67–116. London: Academic Press.
Colombo, J. and R. Bundy, 1981. A method for the measurement of infant auditory selectivity. Infant Behavior and Development 4, 219–233.
Coltheart, M., 1981. The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology 33A, 497–505.
Condon, W.S. and L.W. Sander, 1974. Synchrony demonstrated between movement of the neonate and adult speech. Child Development 45, 456–462.
Content, A., R. Kolinsky, J. Morais and P. Bertelson, 1986. Phonetic segmentation in prereaders: Effect of corrective information. Journal of Experimental Child Psychology 42, 49–72.
Crystal, D. and D. Davy, 1969. Investigating English style. London: Longman.
Cutler, A. and S. Butterfield, 1992. Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language 31, 218–236.
Cutler, A. and D.M. Carter, 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133–142.
Cutler, A. and J.M. McQueen, in press. The recognition of lexical units in speech. In: B. de Gelder, J. Morais (eds.), From spoken to written language. Cambridge, MA: MIT Press.
Cutler, A. and J. Mehler, 1993. The periodicity bias. Journal of Phonetics 21, 103–108.
Cutler, A. and D.G. Norris, 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14, 113–121.
Cutler, A., J. Mehler, D.G. Norris and J. Segui, 1986. The syllable's differing role in the segmentation of French and English. Journal of Memory and Language 25, 385–400.
Cutler, A., J. Mehler, D. Norris and J. Segui, 1992. The monolingual nature of speech segmentation by bilinguals. Cognitive Psychology 24, 381–410.
Demany, L., B. McKenzie and E. Vurpillot, 1977. Rhythm perception in early infancy. Nature 266, 718–719.
Dupoux, E., 1993. The time course of lexical processing: The syllabic hypothesis revisited. In: G.T.M. Altmann, R.C. Shillcock (eds.), Cognitive models of speech processing: The Sperlonga Meeting II, 81–114. Cambridge, MA: MIT Press.
Dupoux, E. and J. Mehler, 1990. Monitoring the lexicon with normal and compressed speech: Frequency effects and the prelexical code. Journal of Memory and Language 29, 316–335.
Ferguson, C., 1977. Baby talk as a simplified register. In: C.E. Snow, C.A. Ferguson (eds.), Talking to children: Language input and acquisition, 209–235. Cambridge: Cambridge University Press.
Fernald, A., 1985. Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8, 181–195.
Fernald, A. and P. Kuhl, 1987. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development 10, 279–293.
Fernald, A. and T. Simon, 1984. Expanded intonation contours in mothers' speech to newborns. Developmental Psychology 20, 104–113.
Fernald, A., T. Taeschner, J. Dunn, M. Papousek, B. de Boysson-Bardies and I. Fukui, 1989. A cross-language study of prosodic modifications in mothers' and fathers' speech to preverbal infants. Journal of Child Language 16, 477–501.
Garnica, O., 1977. Some prosodic and paralinguistic features of speech to young children. In: C.E. Snow, C.A. Ferguson (eds.), Talking to children: Language input and acquisition, 63–88. Cambridge: Cambridge University Press.
Gerken, L., 1991. The metrical basis for children's subjectless sentences. Journal of Memory and Language 30, 431–451.
Gerken, L., B. Landau and R.E. Remez, 1990. Function morphemes in young children's speech perception and production. Developmental Psychology 26, 204–216.
Gleitman, L.R. and E. Wanner, 1982. Language acquisition: The state of the state of the art. In: E. Wanner, L.R. Gleitman (eds.), Language acquisition: The state of the art, 3–48. Cambridge: Cambridge University Press.
Gleitman, L.R., H. Gleitman, B. Landau and E. Wanner, 1987. Where learning begins: Initial representations for language learning. In: F. Newmeyer (ed.), The Cambridge linguistic survey, 150–193. Cambridge: Cambridge University Press.
Glenn, S.M., C.C. Cunningham and P.F. Joyce, 1981. A study of auditory preferences in nonhandicapped infants and infants with Down's Syndrome. Child Development 52, 1303–1307.
Grieser, D.L. and P.K. Kuhl, 1988. Maternal speech to infants in a tonal language: Support for universal prosodic features in motherese. Developmental Psychology 24, 14–20.
Grosjean, F., 1985. The recognition of words after their acoustic offset: Evidence and implications. Perception and Psychophysics 38, 299–310.
Hallé, P., B. de Boysson-Bardies and M.M. Vihman, 1991. Beginnings of prosodic organisation: Intonation and duration patterns of disyllables produced by Japanese and French infants. Language and Speech 34, 299–318.
Harrington, J.M., I. Johnson and M. Cooper, 1987. The application of phoneme sequence constraints to word boundary identification in automatic, continuous speech recognition. Proceedings of the First European Conference on Speech Technology, Edinburgh, Vol. 1, 163–167.
Harrington, J.M., G. Watson and M. Cooper, 1989. Word boundary detection in broad class and phoneme strings. Computer Speech and Language 3, 367–382.
Hayes, J.R. and H.H. Clark, 1970. Experiments on the segmentation of an artificial speech analogue. In: J.R. Hayes (ed.), Cognition and the development of language, 221–234. New York: Wiley.
Heath, S.B., 1983. Ways with words: Language, life and work in communities and classrooms. Cambridge: Cambridge University Press.
Hirsh-Pasek, K., D.G. Kemler Nelson, P.W. Jusczyk, K.W. Cassidy, B. Druss and L. Kennedy, 1987. Clauses are perceptual units for young infants. Cognition 26, 269–286.
Johns-Lewis, C., 1986. Prosodic differentiation of discourse modes. In: C. Johns-Lewis (ed.), Intonation and discourse, 199–219. London: Croom Helm.
Johns-Lewis, C., 1987. The perception of discourse modes. In: M. Coulthard (ed.), Discussing discourse. University of Birmingham: Discourse Analysis Research Monograph No. 14, 249–271.
Jusczyk, P.W., 1993. How word recognition may evolve from infant speech recognition capacities. In: G.T.M. Altmann, R.C. Shillcock (eds.), Cognitive models of speech processing: The Sperlonga Meeting II, 27–55. Cambridge, MA: MIT Press.
Jusczyk, P.W. and E. Thompson, 1978. Perception of a phonetic contrast in multi-syllabic utterances by 2-month-old infants. Perception and Psychophysics 23, 105–109.
Jusczyk, P.W., A. Cutler and N.J. Redanz, 1993. Infants' preference for the predominant stress patterns of English words. Child Development 64, 675–687.
Jusczyk, P.W., D.G. Kemler Nelson, K. Hirsh-Pasek, L. Kennedy, A. Woodward and J. Piwoz, 1992. Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology 24, 252–293.
Karzon, R.G., 1985. Discrimination of polysyllabic sequences by one- to four-month-old infants. Journal of Experimental Child Psychology 39, 326–342.
Klatt, D.H., 1989. Review of selected models of speech perception. In: W.D. Marslen-Wilson (ed.), Lexical representation and process, 169–226. Cambridge, MA: MIT Press.
Kowal, S., D. O'Connell, E.A. O'Brien and E.T. Bryant, 1975. Temporal aspects of reading aloud and speaking: Three experiments. American Journal of Psychology 88, 549–569.
Kuhl, P.K., K.A. Williams, F. Lacerda, K.N. Stevens and B. Lindblom, 1992. Linguistic experience alters phonetic perception in
infants by six months of age. Science 255, 606608.
Labov, W., 1972. Sociolinguistic patterns. Philadelphia, PA: University of Pennsylvania Press.
Lamel, L. and V.W. Zue, 1984. Properties of consonant sequences within words and across word boundaries. Proceedings of the
1984 International Conference on Acoustics, Speech and Signal Processing 42.3.142.3.4.
Levin, H., Schaffer, C.A. and C. Snow, 1982. The prosodic and paralinguistic features of reading and telling stories. Language
and Speech 25, 4354.
Levitt, A.G. and J.G.A. Utman, 1992. From babbling towards the sound systems of English and French: A longitudinal two-
case study. Journal of Child Language 19, 1949.
Levitt, A.G. and Q. Wang, 1991. Evidence for language-specific rhythmic influences in the reduplicative babbling of French-
and English-learning infants. Language and Speech 34, 235249.
Lieberman, P., 1963. Some effects of semantic and grammatical context on the production and perception of speech. Language
and Speech 6, 172187.
Luce, P.A., 1986. A computational analysis of uniqueness points in auditory word recognition. Perception and Psychophysics
39, 155158.
Mann, V.A., 1986. Phonological awareness: The role of reading experience. Cognition 24, 6592.
McAllister, J., 1991. The processing of lexically stressed syllables in read and spontaneous speech. Language and Speech 34,
126.
McQueen, J.M. and A. Cutler, 1992. Words within words: Lexical statistics and lexical access. Proceedings of the Second
International Conference on Spoken Language Processing, Banff, Canada, Vol. 1, 221224.
Mehler, J., 1981. The role of syllables in speech processing. Philosophical Transactions of the Royal Society, B295, 333352.
Mehler, J., E. Dupoux and J. Segui, 1990. Constraining models of lexical access: The onset of word recognition. In: G.T.M.
Altmann (ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives, 236262. Cambridge,
MA: MIT Press.
Mehler, J., J.-Y. Dommergues, U. Frauenfelder and J. Segui, 1981. The syllable's role in speech segmentation. Journal of Verbal
Learning and Verbal Behavior 20, 298305.
Mehta, G. and A. Cutler, 1988. Detection of target phonemes in spontaneous and read speech. Language and Speech 31,
135156.
Miller, G.A., 1967. The psychology of communication. London: Penguin.
Milroy, L., 1980. Language and social networks. Oxford: Blackwell.
Ohala, J.J., 1983. Cross-language use of pitch: An ethological view. Phonetica 40, 118.
Ohala, J.J., 1984. An ethological perspective on common cross-language utilization of F0 of voice. Phonetica 41, 116.
Otake, T., 1992. Morae and syllables in the segmentation of Japanese. Paper presented to the XXV International Congress of
Psychology, Brussels, July.
Otake, T., G. Hatano, A. Cutler and J. Mehler, 1993. Mora or syllable? Speech segmentation in Japanese. Journal of Memory
and Language 32, 358378.
Peters, A.M., 1985. Language segmentation: Operating principles for the perception and analysis of language. In: D.I Slobin
(ed.), The crosslinguistic study of language acquisition, Vol. 2: Theoretical issues, 10291067. Hillsdale, NJ: Erlbaum.
Pettito, L.A. and P.F. Marentette, 1991. Babbling in the manual mode: Evidence for the ontogeny of language. Science 251,
14931496.
Pilon, R., 1981. Segmentation of speech in a foreign language. Journal of Psycholinguistic Research 10, 113122.
Remez, R.E., P.E. Rubin and S. Ball, 1985. Sentence intonation in spontaneous utterances and fluently spoken text. Paper
presented to the Acoustical Society of America, 109th Meeting, Austin, Texas, April.
Schieffelin, B.B., 1985. The acquisition of Kaluli. In: D.I. Slobin (ed.), The crosslinguistic study of language acquisition, Vol. 1:
The data, 525593. Hillsdale, NJ: Erlbaum.
Schieffelin, B.B. and E. Ochs, 1983. A cultural perspective on the transition from prelinguistic to linguistic communication. In:
R.M. Golinkoff (ed.), The transition from prelinguistic to linguistic communication, 115131. London: Erlbaum.
Segui, J., 1984. The syllable: A basic perceptual unit in speech processing. In: H. Bouma, D.G. Bouwhuis (eds.), Attention and
performance, Vol. 10, 165181. Hillsdale, NJ: Erlbaum.
Segui, J., U. Frauenfelder and J. Mehler, 1981. Phoneme monitoring, syllable monitoring and lexical access. British Journal of
Psychology 72, 471477.
Shillcock, R.C., E.G. Bard and F. Spensley, 1988. Some prosodic effects on human word recognition in continuous speech.
Proceedings of SPEECH '88 (Seventh Symposium of the Federation of Acoustic Societies of Europe), Edinburgh, 819826.
Shipman, D.W. and V.W. Zue, 1982. Properties of large lexicons: Implications for advanced isolated word recognition systems.
Proceedings of the 1982 International Conference on Acoustics, Speech and Signal Processing, Paris, 546549.
Shockey, L. and Z.S. Bond, 1980. Phonological processes in speech addressed to children. Phonetica 37, 267274.
Shute, H.B., 1987. Vocal pitch in motherese. Educational Psychology 7, 187205.
Spring, D.R. and P.S. Dale, 1977. Discrimination of linguistic stress in early infancy. Journal of Speech and Hearing Research
20, 224232.
Stern, D.N. S. Spieker, R.K. Barnett and K. MacKain, 1983. The prosody of maternal speech: Infant age and context-related
changes. Journal of Child Language 10, 115.
Stoel-Gammon, C., 1984. Phonological variability in mother-child speech. Phonetica 41, 208214.
Sullivan, J.W. and F.D. Horowitz, 1983. The effects of intonation on infant attention: The role of the rising intonation contour.
Journal of Child Language 10, 521534.
Svartvik, J. and R. Quirk, 1980. A corpus of English conversation. Lund: Gleerup.
Tuaycharoen, P., 1978. The babbling of a Thai baby: Echoes and responses to the sounds made by adults. In: N. Waterson, C.
Snow (eds.), The development of communication, 111125. Chichester: Wiley.
Wakefield, J.A., E.B. Doughtie and B.-H.L. Yom, 1974. The identification of structural components in an unknown language.
Journal of Psycholinguistic Research 3, 261269.
Werker, J.F. and L. Polka, 1993. Developmental changes in speech perception: New challenges and new directions. Journal of
Phonetics 21, 83101.
Wexler, K. and R. Manzini, 1987. Parameters and learnability in binding theory. In: T. Roeper, E. Williams (eds.), Parameter
setting, 4176. Dordrecht: Reidel.
Whalen, D.H., A.G. Levitt and Q. Wang, 1991. Intonational differences between the reduplicative babbling of French- and
English-learning infants. Journal of Child Language 18, 501516.
Wilson, M.D., 1988. MRC psycholinguistic database: Machine-usable dictionary, Version 2.0. Behavior Research Methods,
Instrumentation and Computers 20, 610.
1. Introduction
A common theme in evolutionary biology concerns the distinction between inherited and specific adaptations. Both structures
may aid an organism, but the former derive from the organism's ancestry whereas the latter may have evolved to suit its specific
needs. Consider the duck-billed platypus, for example. This strangest of all mammals possesses numerous anatomical structures that assist its aquatic lifestyle.*
* Preparation of this paper was supported by National Institutes of Health Grant 1 R29 HD23385 to M.H. Kelly. Thanks to Anne Cutler, Jacques Mehler, and Bob Rescorla for helpful comments on earlier versions of this paper.
Thus, waterproof fur helps to insulate the platypus from the cold waters in which it
normally swims and its famous bill is actually a highly-developed sense organ that is used in foraging. However, the history of
these two adaptations is quite different. Whereas the bill seems to be a specific adaptation of the platypus and its monotreme relatives, the fur, of course, is a general mammalian trait that the platypus inherits (see Gould, 1991, for further discussion). This
trait is general in two ways. First, many species share it and, second, it is relatively general purpose. Thus hair is not only
involved in insulation, but also in sensations of touch and the transduction of sound in the inner ear.
As Darwin emphasized in many of his works, the same evolutionary concepts that apply to anatomical structures can also
extend to thought and behavior. Thus, some of our own cognitive and perceptual abilities are inherited from our ancestors (and
hence it is no coincidence that we can learn something about, for instance, our visual system by studying the visual systems of
other animals: they are branches from the same tree). Still other abilities might be unique to the evolutionary history of human
beings. Language is, of course, the most obvious human duck-bill, and much ink has been spilled over the years in arguments
over whether this obvious intuition should really be believed. That controversy will be avoided here. Instead, we will focus on
the following general theme: Whatever the evolutionary history of the language faculty, language processing must involve
cognitive and perceptual abilities that are domain-general in nature. For an example, consider the parsing literature. Both
linguists (e.g., Frazier and Fodor 1978) and psychologists (e.g., Just and Carpenter 1992) have devoted considerable attention to
how listeners identify sentence constituents and their grammatical relations. Radically different viewpoints are often proposed
about how these tasks are accomplished. However, throughout all the controversy, one unifying point stands clear: The parser is
fundamentally a slave to a limited capacity short-term (or working) memory system. Thus, parsing preferences from Kimball's
(1973) 'fast phrase closure' to Frazier and Fodor's (1978) 'minimal attachment' are all justified by reference to constraints on
short-term memory capacity. Indeed, potential individual differences in parsing biases have recently been related to individual
differences in memory capacity (Just and Carpenter 1992).
In sum, then, a system that is shared by other animals, namely a limited capacity memory, is critically involved in a system
probably unique to human beings, namely language. Furthermore, an understanding of this ancestral memory system is crucial to
a full understanding of how linguistic competence
gets put to use in performance. Finally, since this ancestral system is shared with our non-human relatives, we could gain
insights into the human system through memory experiments with animals as well as experiments with human beings. We will
extend these claims here to another domain potentially shared with non-human animals. This domain is reflected in the old
adage that death and taxes are the only certainties in life. In less depressing terms, much of the information that animals learn
during their lives, and which must be used to guide behavior, is probabilistic in nature. Should the human being bet on the
Flyers to win the NHL championship now that they have signed the Michael Jordan of hockey in Eric Lindros? Is the breeding
season promising enough for a bird to raise two clutches of young rather than just one? Will the squirrel find more nuts foraging
by this tree or that one? Should a listener who hears 'The fruit flies ...' assume that 'fruit' is a noun and 'flies' is a verb?
Though these questions vary in their subject matter and importance, they all nonetheless contain some level of uncertainty, and
the types of information that could be used to decide a course of action are probabilistic in nature. Historically, probabilities and their use have either been ignored or been looked upon with aversion bordering on plain disgust. Thus, mathematical
treatments of probability are relatively recent developments, and were formulated in the dubious area of gambling. Even
physics, the paragon of scientific certainty, has not escaped from the clutches of probability, and the probabilistic component
inherent to quantum mechanics drove Einstein to exclaim that God could simply NOT be a closet gambler.
One possible reason for negative views of probability could be an implication that the world is full of mass confusion that
cannot lead to anything beyond ignorance. However, an animal's environment is not a homogeneous soup, although it rarely
provides 'sure things'. The basic fact of the matter is that the world is awash with stuff best described as 'tendencies', 'maybes',
'estimates', and 'generally-speakings'. Furthermore, these terms characterize a wide array of domains, from foraging to parsing,
and must be confronted by creatures from aardvarks to zebus. Finally, animals must often make rapid decisions while
minimizing the likelihood and severity of errors. These considerations lead to the following conclusion: Animals that have the
capacity to detect probabilistic patterns and exploit them will have an advantage over those that do not. Furthermore, confidence
in the solution to a problem should increase if multiple cues converge on that solution. Hence, one might expect to see
widespread sensitivity to multiple probabilistic information sources in the animal kingdom. Since many aspects of the
environment are characterized by probabilistic relations among variables, one would also expect this sensitivity to be relatively
domain-independent. Indeed, such sensitivity might permeate the activities of specialized abilities, including language. In the
remainder of this paper, we will attempt to justify these claims. We will begin by highlighting some phenomena in the area of
animal learning and cognition that support the view, perhaps surprising, that rather precise sensitivity to probabilistic
information is prevalent in many species. In fact, it seems to be a rather everyday, ho-hum ability. We will then turn to human
perception and cognition, and review some evidence in nonlinguistic domains suggesting that, like rats, human beings exploit
probabilistic information in a variety of tasks. Furthermore, even when sufficient information exists to apparently deduce the
solution to a problem, human beings still use additional, probabilistic information that might also be present. We will discuss
some of the possible advantages to this strategy. Finally, we will summarize evidence from the domain of language processing
that also supports widespread sensitivity to probabilistic relations among linguistic variables, as well as the types of tasks that
could be assisted by exploiting this information. The research here will suggest that, just as human memory is critically involved
in areas ranging from visual pattern recognition to parsing, so the ability to learn and exploit probabilistic information is
widespread in perception and cognition, including traits that might be highly specialized and species-specific.
Our purpose here is not, however, to instigate a general rapprochement between divorced parties. Rather, we will attempt to direct
attention to certain fundamental abilities that animals apparently must possess in order to behave the way they do in learning
experiments. These abilities can be characterized as a keen sensitivity to the rate at which certain probabilistic events occur in
the animal's environment.
2.1. Behavioral decisions based on detection of rate information
Consider the following situation: On each of a series of trials, a rat is placed in the long axis of a T-shaped maze. If the rat
traverses down the correct arm of the T, it will receive a morsel of food as a reward. The correct arm varies from trial to trial,
but the right arm is biased to be correct 70% of the time, with the left arm correct 30% of the time. Across a wide range of
studies, the results of this situation are clear: the rat will distribute its choices in accord with the particular probabilistic bias
established by the experimenter. In this case, the rat will travel down the right arm 70% of the time and the left arm 30% of the
time. If the probabilities change to, say, 85-15 or 60-40, then the rat's choices will adjust accordingly. The rat's behavior reveals
precise sensitivity to the rate of reward in the two arms (see Gallistel, 1990, for more extensive discussion).
Despite this sensitivity to rate information, the rat's behavior in this paradigm seems paradoxical. After all, in order to optimize
its food intake, the rat should always choose the side with the higher rate of return.1 Although this conclusion is true in the
laboratory, where the animal faces no competition for the food resource, it is not true in the wild. Suppose that two food patches
exist in a local habitat, with one of them being twice as rich as the other. If all of the animals went to the richer patch, then
selection pressures would favor animals who would exploit the food-poor, but competitorless patch. Evolutionary biologists
have argued that, in the long run, the most stable evolutionary strategy would involve dividing one's time between the two food
patches in accord with their rates of return. Investigations of the spatial distributions of animals while foraging indicate that this
strategy is indeed used in the natural environment (see Gallistel, 1990, for summary). Of course, as Gallistel emphasizes, this
strategy presumes the ability to detect the rate of food return in different parts of the environment. Both laboratory studies and natural observations indicate that a range of species, such as rats, pigeons, ducks, and fish, possess this ability.
1 For instance, suppose that the probability of reward on the right arm is 75%. In 100 trials, the rat on average would receive a reward on 0.75(100)+0.25(0), or 75, of the trials if it always chose the right arm. If it chooses the arms in accord with their reward probabilities, then the rat can expect reward on 0.75(75)+0.25(25), or only 62.5 of the trials.
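The arithmetic in footnote 1 generalizes to any reward bias. As a minimal illustrative sketch (ours, not the authors'), the expected payoffs of always maximizing versus probability matching can be computed directly:

```python
def expected_rewards(p_right, n_trials=100):
    """Expected number of rewarded trials when the right arm of the
    T-maze is correct with probability p_right (left: 1 - p_right)."""
    p_left = 1.0 - p_right
    # Maximizing: always choose the more frequently rewarded arm.
    maximizing = n_trials * max(p_right, p_left)
    # Probability matching: choose each arm at its own reward rate.
    matching = n_trials * (p_right * p_right + p_left * p_left)
    return maximizing, matching

# With a 75/25 bias, maximizing yields 75 rewards on average,
# matching only 0.75(75) + 0.25(25) = 62.5, as in footnote 1.
print(expected_rewards(0.75))  # (75.0, 62.5)
```

The gap between the two strategies is exactly what makes matching look paradoxical in the laboratory, and what competition in the wild is invoked to explain.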
This sensitivity to probabilistic patterns reveals itself in a variety of behaviors besides the choice of where a food resource might
be located. Furthermore, the pervasiveness of this sensitivity across species is shown by the fact that different species produce
analogous results in the same task. Consider, for example, the speed with which an action is performed after a stimulus is
perceived. Gradual increases in the likelihood of a certain event lead to corresponding decreases in the time needed to react to
that event. A notable example in the field of language research is the inverse relationship between word frequency and the
reaction time needed to make judgments about a word. However, numerous events vary in their frequency of occurrence, and
animals generally become faster at reacting to more frequent events. Indeed, procedures have been developed to test human
beings and other animals in very similar circumstances, and the relationship between frequency of an event and speed of
response is virtually identical across the species groups (Pang et al. 1992). Such results clearly imply that (a) a learning
mechanism exists that permits adaptive adjustments to the likelihoods of different events, (b) the mechanism is domain-general
in that it appears for numerous events and behaviors, and (c) the mechanism is species-general, and so represents a basic aspect
of animal learning, including learning by human beings.
2.2. Catching the contingencies
In the famous learning experiments conducted by Ivan Pavlov, a dog was presented with a neutral stimulus, such as a tone,
paired with food. As a result of these pairings, the dog began to salivate when the tone was presented alone. In the technical
terms of the field, an unconditioned stimulus (US), the food, normally elicits an innate unconditioned response (UR), salivation.
By pairing the tone with the food, the former becomes a conditioned stimulus (CS) that produces a conditioned response (CR),
in this case salivation. Thousands of subsequent experiments, involving a huge assortment of stimuli, responses, and species
have verified that this type of learning is quite basic throughout the animal kingdom.
Extensive research has tried to understand the mechanisms that are responsible for learning in the Pavlovian paradigm. We will
focus here on one theme that has emerged from this research. This theme is, in many ways, reflected in
the description of Pavlovian conditioning with which we opened this section. This description, which is commonly used in
introductory psychology texts, assumes that learning depends on explicit pairings between an unconditioned stimulus and a
conditioned stimulus. Various predictions follow from this assumption. Thus, learning should increase with the number of CS-
US pairings, and learning should not occur in the absence of such pairings. However, as Rescorla (1988) has emphasized, this
assumption and its empirical consequences are untrue. Under certain conditions, even multiple pairings of a CS and US will not
cause the animal to assume that the CS predicts the US. On the other hand, learning can be achieved without a single CS-US
pairing. Thus, CS-US pairings are neither sufficient nor necessary to produce learning. These facts, and the conditions that
create them, have led to major revisions in learning theory over the past two decades, and indicate that animals have far more
sophisticated learning abilities than were previously imagined.
In a series of classic experiments, Rescorla (1968) demonstrated that animals are sensitive not to the raw frequency of CS-US
pairings, but rather the general contingency between the variables. In his studies, rats were first trained to press a bar for food.
In the next phase, the animals received a number of pairings between a tone and a mild electric shock. To measure the extent of
learning, the rats were then permitted to press the food bar once again. During this period, the tone was occasionally presented,
and the animal's behavior monitored. If the animals learned that the tone signals shock, they would decrease their levels of bar
pressing and hold a motionless posture. Slight levels of learning would produce slight decreases in the rates of bar pressing
whereas higher levels of learning would produce correspondingly greater suppression of the bar presses. The critical
manipulation in the experiments concerned the likelihood that the shocks in the second phase occurred in the absence of the
tone. That is, in addition to the explicit pairings between the tone and shock, a number of other shocks could be administered
without a corresponding tone. Traditional learning theory would predict that these additional unpaired shocks would be
irrelevant to the learning process since the number of CS-US pairs would be unaltered. However, instead of responding just to
explicit pairings of the CS and US, suppose that the rat is sensitive to the appearance of the US in the absence of the CS. If the
animal can detect these general contingencies between the shock and tone, then the signaling power of the tone should decrease
as higher rates of independent shocks are encountered. Indeed, a certain point should be reached at which the shock and the tone
are statistically independent.
That is, given the overall base rate of shocks, one would expect a certain number of tone-shock pairs JUST BY CHANCE.
Given that such encounters are mere coincidences, one should make no conclusion about the relevance of the tone to the shock,
and hence the tone should be ignored in the final test phase. Rescorla's results strongly supported the contingency interpretation
of learning. As the predictive power of the CS dropped, so did the effects of the tone on the rats' bar pressing. When the tone
and the shock reached statistical independence, the presence of the tone had no effect on behavior.
In sum, Rescorla's experiments refute the traditional belief, and one still generally held outside of the animal learning field, that
the number of CS-US pairings is the crucial variable in learning. Rather, animals are sensitive to the statistical relationship
between the variables. If no statistical relationship exists, then even the presence of numerous CS-US pairings will not cause the
animal to assume that the CS predicts the US. Furthermore, statistical relationships can exist and be learned even in the absence
of any CS-US pairings. For instance, suppose Rescorla's experiment is repeated, but with all of the scheduled tone-shock pairs
removed. Thus, the rat would receive a number of shocks, but none accompanied by the tone. Since no CS-US pairs are
presented, the stereotyped view of Pavlovian conditioning would predict that the animal would learn no relation between these
variables. However, the rats do in fact learn something useful in this situation, namely that the presence of the tone predicts the
absence of the shock. Hence, the rats treat the tone as a safety signal and actually show weaker fear reactions when the tone is
present than when it is absent (Rescorla 1969).
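Rescorla's manipulation is often summarized by the contingency ΔP = P(US | CS) − P(US | no CS); the notation and numbers below are our illustration, not the paper's:

```python
def delta_p(p_shock_given_tone, p_shock_given_no_tone):
    """Contingency between a CS (tone) and a US (shock).
    Positive: the tone predicts shock; zero: statistical
    independence; negative: the tone predicts shock's absence."""
    return p_shock_given_tone - p_shock_given_no_tone

# Explicit pairings only: the tone is a strong predictor of shock.
print(delta_p(0.4, 0.0))   # 0.4
# Same pairings plus equally frequent unpaired shocks: tone and
# shock are statistically independent, and conditioning fails.
print(delta_p(0.4, 0.4))   # 0.0
# No pairings, and shocks never follow the tone: the tone becomes
# a safety signal, as in Rescorla (1969).
print(delta_p(0.0, 0.4))   # -0.4
```

The three cases correspond to excitatory conditioning, the failure of conditioning despite many pairings, and learned safety despite zero pairings.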
So far, we have discussed some evidence that animals are sensitive to the probabilistic relationship between a single CS and a
US. However, numerous experiments demonstrate that animals can attend to multiple predictors of an event. For example,
suppose that an animal has learned that a tone and a light both signal shock. If the stimuli are now presented together, then, up
to a certain point, the animal will exhibit a stronger fear reaction to the joint than to the separate presentation of the stimuli.2
2 The 'up to a certain point' phrase turns out to be crucial. If two stimuli are strongly predictive of another, then repeated joint presentations, even if coupled with US presentation, can actually reduce the reaction to individual presentations of the CSs. As Lieberman (1990) points out, this apparently counterintuitive phenomenon makes sense for the following reason. One might expect that joint presentation of the CSs would signal an increased probability of the US or strength of the US. If these expectations are not confirmed, then they must be adjusted downward accordingly. These downward adjustments can then be observed in reactions to the separate presentations of the CSs.
Thus, an animal's behavior
in a particular situation is best predicted by what it has learned about the set of informative cues present in its environment
rather than any single cue examined in isolation. Indeed, what an animal learns about a novel stimulus will depend on what it
has learned about other stimuli present at the same time. Consider, for example, the following situation. An animal learns that a
tone is associated with a mild shock. After this relationship is established, the animal is now presented with two stimuli: the original tone and a novel light. No shocks are experienced during this phase of compound stimulus presentation. Later, the light
can be presented alone, and the animal's behavior monitored to determine what, if anything, has been learned about the light. A
number of possibilities are conceivable. Since the light was never experienced with the shock, one might expect that it would later be perceived as a neutral stimulus. Alternatively, some of the fear induced by the tone might spread to the light.
However, the animal actually treats the light as a safety signal, and the strength of the safety reaction varies with the strength of
the fear associated with the tone (see Rescorla and Wagner, 1972, for summary). In learning about the impact of the light, the
animal apparently takes into consideration its knowledge about the tone. Given that the tone normally signals shock, the absence of shock becomes associated with the novel light. As Rescorla and Wagner (1972: 73) state, 'The effect of a
reinforcement or nonreinforcement in changing the associative strength of a stimulus depends upon the existing associative
strength, not only of that stimulus, but also of other stimuli concurrently present'.
In sum, then, numerous experiments in animal learning indicate (a) widespread sensitivity to probabilistic relations between
events and (b) the use of multiple cues in identifying the probable course of imminent events, and the types of behaviors needed
to deal with them. Nonetheless, we must emphasize that even though animals behave in accordance with statistical patterns in
their environment, this behavior does not lead to strong conclusions about the underlying mechanisms that support it. Animals
might reveal sensitivity to statistical patterns without actually representing or manipulating probabilistic data. For example, each
learning trial could lead to the adjustment of the association strength between a CS and a US. This association strength would
be increased when the US appears with the CS and decreased when the US appears without the CS. Over time, the probability
of a CS-US pairing would become more and more correlated with the association strength, though the probability itself would
not be represented by the animal. Thus variations in the probabilities of different events could produce behavior that is attuned to
those probabilities without entailing that these values are
explicitly represented or that the animal even remembers each individual learning trial. In fact, the most influential model of
Pavlovian conditioning over the past two decades does reproduce apparent sensitivity to statistical patterns by using trial by trial
updates of association strengths between a US and potential CSs (Rescorla and Wagner 1972). But regardless of the exact
nature of the mechanisms that underlie it, animals clearly have evolved some form of sensitivity to probabilistic patterns in their
environment, and this sensitivity is finely tuned to the actual probabilities of various events.
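The trial-by-trial mechanism alluded to above can be sketched in a simplified single-cue form of the Rescorla-Wagner update (context cues are omitted and the model's separate learning-rate parameters are collapsed into one; all values here are arbitrary illustrations):

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Trial-by-trial update of CS-US association strength V.
    Each trial is (cs_present, us_present). On CS trials, V moves
    a fraction alpha of the way toward lam if the US occurs, and
    toward 0 if it does not: a prediction-error update."""
    v = 0.0
    for cs, us in trials:
        if cs:
            target = lam if us else 0.0
            v += alpha * (target - v)
    return v

# Consistent tone-shock pairings drive V close to lam ...
paired = [(True, True)] * 20
# ... while interleaving tone-alone trials keeps V low, mimicking
# sensitivity to contingency without any probability being
# explicitly represented or any trial being remembered.
mixed = [(True, True), (True, False)] * 10
print(rescorla_wagner(paired), rescorla_wagner(mixed))
```

The point of the sketch is the one made in the text: a simple incremental association mechanism can track the statistical relation between CS and US without storing or manipulating probabilities.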
2.3. Some implications for language learning
Let us translate some of these animal learning studies into a more familiar setting, though with controversial implications.
During language acquisition, children must learn the subcategorization privileges of the verbs in their language. Thus, English
children must learn that 'give' can appear in both prepositional (e.g., 'John gave the money to the charity') and double object
dative structures ('John gave the charity the money'), but that 'donate' can only appear in the prepositional form. How do
children learn these relationships? The 'give' case seems straightforward in that the child will observe the verb in both structures
and conclude that 'give' is legal in those frames. Thus, the child draws conclusions based on positive evidence. However, cases
like 'donate' seem to create more problems for language acquisition. One possibility is that general principles of learning could
apply to the linguistic domain. Thus, if children get explicit reinforcement that 'donate' is ungrammatical in the double object
form, then such negative evidence could help them to learn syntactic restrictions on 'donate'. However, numerous studies have
documented that children do not get such explicit negative evidence (see Pinker, 1989, for review). Given the lack of
reinforcement coupled with successful learning, the dominant conclusion has been that general principles of learning will not
apply to language acquisition.
This argument rests on incorrect assumptions about current theoretical views in animal learning and about the experiments on which
those views are based. As the preceding discussion has emphasized, animals can readily learn not only that tones predict the
presence of shock, but also that tones can predict the absence of shock. They need no additional feedback for justification or
refutation of their conclusions. If learning is considered, at least in part, as the detection of statistical regularities in the
environment, then children might learn that 'donate' systematically predicts the absence of double-object structures in the same
manner as rats can learn that tones systematically predict
the absence of shocks. Just as a species-general memory system can be critically involved in language parsing, so species-
general learning abilities, in this case the ability to detect the statistical texture of the environment, could be involved in
language learning.
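The proposal can be made concrete as a contingency tally directly analogous to the CS-US case. The verbs, counts, and evidence threshold below are hypothetical illustrations, not a model of any actual corpus or of children's actual criteria.

```python
# Hypothetical sketch: a learner tallies, for each verb, how often it
# occurs at all versus how often it occurs in the double-object frame.
# Systematic absence (many occurrences, zero double-object uses) becomes
# statistically meaningful - just as a tone predicting the absence of
# shock does - with no explicit negative feedback required.

from collections import Counter

occurrences = Counter()       # verb -> total occurrences heard
double_object = Counter()     # verb -> occurrences in the double-object frame

def observe(verb, frame):
    occurrences[verb] += 1
    if frame == "double-object":
        double_object[verb] += 1

def likely_barred(verb, min_evidence=20):
    """Crude criterion: ample exposure, yet zero double-object uses."""
    return occurrences[verb] >= min_evidence and double_object[verb] == 0

# Illustrative input: 'give' appears in both frames; 'donate' never
# appears in the double-object frame despite frequent exposure.
for _ in range(30):
    observe("give", "prepositional")
    observe("give", "double-object")
    observe("donate", "prepositional")
```

On this tally, `likely_barred("donate")` holds while `likely_barred("give")` does not, purely on the strength of positive evidence and systematic absence.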
Appeals to statistical sensitivity have recently become more common in discussions of the negative evidence problem (e.g., Liberman 1991).
However, we believe that prima facie incredulity about the potential of this approach, or even about the very existence of such statistical
sensitivity, would decline if animal learning studies such as those reviewed above were more seriously considered in discussions
of language learning. The evidence from the animal literature is now quite conclusive: species after species has been
found to be sensitive not only to probabilistic information, but even to its precise quantitative nature (bearing in
mind once again that the exact mechanisms responsible for this sensitivity remain to be identified). Furthermore, animals show such
sensitivity in a variety of learning situations that use quite arbitrary stimuli, suggesting that the ability is domain-general, and
not tied to information that the animal is biologically prepared to learn.
2.4. Objections
Numerous objections could, of course, be raised against the conjectures we have just made. We do not pretend to have definitive
answers to these objections, but neither have the objections themselves been worked out in sufficient detail to determine their scope or
their resistance to repair.
2.4.1. How to define what's missing?
For a language learner to determine that a verb has not appeared in a structure, and that this absence is statistically meaningful,
the nature of the expected structure must be defined. As Pinker (1989) has pointed out, the structures cannot be individual
sentence tokens, because there are an infinite number of such structures, and the odds that any particular one would be encountered in
the corpus of speech heard by the learner are vanishingly small. Thus, the items that enter into the learner's statistical tabulations
must be more abstract than individual sentences. Hence reference to the statistical structure of the environment will not be
sufficient to characterize learning since the nature of the environment is partly defined by how the learner categorizes items
within it. Furthermore, these categories may in fact be intrinsically linguistic in nature. Hence we are back to a basic problem in
language learning: What notions about the nature of language do children
bring to the task at hand? Reference to statistical regularities has not eliminated this problem at all.
We believe this characterization of the learning task is correct, but note that it is not specific to language learning. The use of
statistical regularities by animals also depends on abstracting away from individual tokens. Furthermore, the categories involved
in this abstraction process may depend on the learning domain in question. For example, suppose a rat learns that a tone signals
the absence of shock. Each instance of a tone is a unique event, and yet the animal must abstract away from this uniqueness to
similarities across the various tokens. What counts as similar may depend on the nature of the auditory system just as what
counts as similar in language learning may depend on the nature of a species-specific linguistic system. As we emphasized at the
beginning of this paper, the use of some general purpose abilities (e.g., memory) in a particular task (e.g., parsing) does not
preclude the important role of domain-specific factors. A complete account of the task will require a specification of the general
and the specific, along with how they interact.
2.4.2. Outcome similarity despite input variability
Given that language learning will depend in part on the statistical structure of the input, why is the outcome of the learning so
robust in the sense that all normal children attain the same basic level of competence despite (probably) wide variations in what
they have heard? One answer could focus on the domain-specific language faculty, which makes certain assumptions about the
nature of the language (in statistical terms, the population) from which the sample was drawn (e.g., all of the sentences in the
language are either right-branching or left-branching). We would also suggest a statistical answer as an accompaniment. In
particular, the learning is robust because of the robust nature of statistical sampling. One of the fundamental theorems of
statistics is that as the size of a sample increases, it reflects the actual nature of the population from which it was drawn with
greater accuracy. Hence, one can draw reliable conclusions about the population (which is too large to be examined in its
entirety) from a relatively small sample. Although no two samples from such a population will be alike, they demonstrably will
converge on the same conclusions given a sufficiently large sample size.3 Note
3 Here is an easy way to demonstrate this statistical principle: Instruct your computer to generate samples of random
numbers between 0 and 100. Start off with two samples, each containing 5 cases. The means of these samples will
probably differ greatly. Now continue generating sample pairs, but increase the number of cases in each pair. As the
number of cases gets larger, the means of the samples will move closer together, converging on a value of 50. Yet the
samples will obviously differ in their exact sequence of numbers.
that such an account would predict that the greatest variability in linguistic judgments should occur for relatively rare
constructions, where the idiosyncrasies of the sample will have a much greater impact on learning.
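The demonstration described in footnote 3 can be run directly. The sample sizes below are arbitrary choices made for illustration.

```python
# The convergence described in footnote 3: as sample size grows, the
# means of independent samples of uniform random numbers in [0, 100]
# converge on the population mean of 50, even though the samples
# themselves never coincide in their exact sequences.

import random
random.seed(1)

def sample_mean(n):
    return sum(random.uniform(0, 100) for _ in range(n)) / n

small_gap = abs(sample_mean(5) - sample_mean(5))          # typically large
large_gap = abs(sample_mean(50_000) - sample_mean(50_000))  # typically tiny
```

With 50,000 cases per sample, the standard error of each mean is well under a fifth of a unit, so the two large-sample means land within a fraction of a point of 50 and of each other, while five-case means can easily differ by tens of points.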
2.4.3. Going beyond idiosyncrasies
Instead of relying on some general learning mechanism, such as one that involves feedback regarding ungrammatical sentences,
children might be able to exploit regularities about the sentence frames that verbs could take (Pinker 1989). For example, the
absence of 'donate' from double-object datives is not an isolated fact. Rather, polysyllabic English verbs and verbs with certain
stress patterns are, in general, barred from double-object dative structures. If children could learn such regularities, they could
then generalize them to new instances. In fact, Gropen et al. (1989) have found that children learning English were reluctant to
generalize polysyllabic nonsense verbs from a prepositional dative to a double-object dative, though they were more willing to
do so for monosyllabic nonsense verbs. This effect of syllable number did not extend to other syntactic frames, suggesting that
the children had learned about the biases concerning the English dative.
Although Pinker (1989) casts the general learning and regularities approaches as conflicting alternatives, we see them as quite
similar. English-speaking children may in fact detect regularities in the structure of English by being sensitive to the systematic
absence of certain verbs from double-object datives. They would, of course, have to recognize dative structures despite
variations in their lexical content, and they would have to recognize that certain phonological properties of verbs are correlated
with the dative structures. As noted above, some of these classification abilities may derive from characteristics of the child's
native linguistic capacity. However, granted that these classification abilities exist, the results found by Gropen et al. (1989) are
quite consistent with the claim that children are sensitive to statistical patterns in their linguistic environment. The children in
the Gropen et al. experiments were also probabilistic in their responses, rather than discrete. That is, although they were biased
against generalizing polysyllabic words to double-object datives, they nonetheless did so. This pattern follows from the fact that
the English restriction is also probabilistic in nature (e.g., 'offer' can appear in the double-object dative). Given a probabilistic
input, a probabilistic output is expected. This leads to the next point.
that we and other animals have is probably obtained through numerous encounters with individual objects or events. Hence, we
cannot use experiments in decision making as a strong basis for conclusions about human sensitivity to statistical patterns,
especially since studies have found differences in the way people treat the same data when presented in summary format or as a
series of events (e.g., Wasserman and Shaklee 1984). Rather, we must examine direct investigations of human sensitivity to
frequency information, and the potential roles of such knowledge in solving problems in language and other domains. We turn to
these issues in the following sections.
lethality of different events (Lichtenstein et al. 1978), and even fast-food restaurant chains (Shedler et al. 1985). This frequency
sensitivity appears not only across domains, but also across tasks. Whereas some studies explicitly ask subjects to rate items in
terms of frequency, others use more indirect probes. For instance, in numerous experiments, Zajonc (1968) has found that the
more frequently something is experienced, the more it is liked (even though the subjects have no control over these
experiences). Hence frequency sensitivity can be revealed by subjective evaluations.
As with fast-food restaurants, numerous experiments have shown that human beings are sensitive to the relative frequencies of
many linguistic events, such as words (Shapiro 1969), syllables (Rubin 1974), and letters (Attneave 1953).4 Indeed, this
sensitivity to frequency is quite strong. For example, Attneave asked subjects to estimate the frequency of English letters per
thousand tokens. He found that fully three-quarters of the variance in the subjects' responses could be accounted for by the
actual frequencies of the letters in English text. Using different techniques and word frequency as the object of study, Shapiro
(1969) found that over 85% of the variance in judgments of word frequency could be accounted for by measures of actual
frequency.5
In fact, this frequency sensitivity might be even stronger than these experiments indicate, given that our measures of actual
frequency are subject to typical errors of sampling. These errors will be particularly apparent for low frequency items. For
example, Gernsbacher (1984) points out that the words 'boxer', 'icing', and 'joker' have the same frequencies (namely, one per
million) as 'Loire', 'gnome', and 'assay' in the two most widely used frequency norms for English. However, there is clearly no
doubt that the former words are 'really' more commonly encountered, and that a sufficiently large sample of English would
confirm these intuitions. In fact, the validity of the
4 These variables are, of course, correlated in that relatively common syllables tend to contain relatively frequent letters
and phonemes. However, the variables can be statistically separated. Thus, Rubin (1974) found that English speakers are
sensitive to syllable frequency independently of phoneme frequency.
5 The strength of these correlations can, of course, be influenced by the range of materials used. If only the words 'the',
'house', and 'zinc' were judged, the correlation between rated and actual frequency would obviously (and trivially) be
perfect. Since Attneave used all the letters of the English alphabet, the results from his experiment on letter frequency
cannot be criticized as artifacts arising from the types of letters selected. Shapiro (1969) used 60 words for his adult
subjects, and these ranged from very high through intermediate to low frequencies. Using 455 words in a lexical decision
task, Gernsbacher (1984) found that rated familiarity accounted for more than 71% of the variance in reaction time.
intuitions in this case suggests that human ratings of frequency might in fact be more useful in experiments than listings in word
frequency norms, and Gernsbacher (1984) found that subjects agreed quite well on their ratings of word familiarity, and that
these ratings were better predictors of reaction times in a lexical decision task than supposed measures of 'actual' frequency.
Given that human beings are sensitive to frequency, sampling considerations might lead one to expect that their frequency
judgments would be more accurate than current frequency norms. After all, the Francis and Kucera frequency norms are based
on an analysis of one million word tokens, which is a relatively small sample. Given conservative assumptions (speech/reading
rates of 150 words per minute along with an eight-hour sample per day), a typical person will be exposed to a million-word
sample in about two weeks.
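The back-of-the-envelope calculation behind the two-week figure is simply:

```python
# Exposure estimate spelled out: at 150 words per minute for eight
# hours a day, a million-word sample accumulates in about two weeks.

words_per_day = 150 * 60 * 8            # 72,000 words per day
days_to_million = 1_000_000 / words_per_day   # about 13.9 days
```

So a lifetime of linguistic exposure dwarfs the one-million-token sample underlying the Francis and Kucera norms by several orders of magnitude.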
In sum, human beings are sensitive to frequency information in a variety of domains. This domain-independence suggests that
this sensitivity is a general ability and hence might be exploited in a variety of domain-specific tasks. Nonetheless, we must
emphasize that human beings are far from infallible in their frequency judgments. For example, they tend to overestimate the
frequency of rare events and underestimate the frequency of common events (see Baron, 1988, for a summary). In addition,
frequency estimates are based at least in part on factors other than frequency. For instance, one category is often considered to
be larger than another if instances of the former class are more easily retrieved from memory. Now, these retrieval effects might
be correlated with actual frequency, and so be reasonably accurate. However, they also might reflect aspects of memory
organization itself. Thus, people generally and falsely believe that more English words begin with 'r' than have 'r' as the third
letter. These intuitions probably reflect the fact that the mental lexicon is organized by first rather than third letter and/or
phoneme, and hence exemplars of the former class are more easily retrieved from memory. However, despite these
inconsistencies, the evidence as a whole supports the view that human beings, like other animals, can learn probabilistic patterns
in their environment. We will now turn to a sample of some areas where multiple, probabilistic information sources are available
and might be used to solve particular problems in perception and cognition.
3.2. Depth perception
One of the fundamental questions in the history of visual perception concerns how we perceive a three-dimensional world based
on a two-dimensional image. Centuries of research on this question have focused on two
issues, one optical and the other psychological. First, what types of optical patterns exist in the two-dimensional image that are
correlated with the three-dimensional world that reflected the image to the eye? Second, which of these information sources are
actually used in visual perception, and how are they weighed? The second question becomes particularly important given that
numerous potential cues to depth exist, such as those listed in table 1.
Table 1
Some variables that are correlated with distance

Variable                  Description
Binocular disparity       Differences in the image of an object projected to the two eyes
Motion parallax           The relative velocity of images across the retina
Accommodation             Changes in the shape of the lens
Relative size             The size of an image on the retina
Height in picture plane   The location of an image along the vertical axis
Texture gradients         The size, shape, and density of texture elements like pebbles, tiles on a floor
Occlusion                 An object blocking the view of a more distant object
Of course, these cues vary in their range and predictive power. Thus accommodation is not effective beyond 10 feet, and height
in the projection plane can easily be violated (e.g., a person's head will be higher in the projection plane than the feet, yet is not
further away from an observer). Nonetheless, many cues are simultaneously available, and hence an important issue concerns
how the various cues are weighed and whether an available cue might even be used. The situation could be complicated further
because the weights given to a set of cues might vary with context. Consider, for example, the case of textural cues to depth.
Many surfaces consist of textured elements, such as pebbles along a path or tiles in a room, that have roughly the same size and
shape and are distributed randomly. However, because the elements can systematically vary in their distance from an observer,
their projected images will vary also. As distance from the observer increases, the images of the texture elements will decrease
in size, compress in shape, and become more densely packed. Investigations of these variables have found that human observers
(a) are sensitive to all three dimensions, but (b) respond to some dimensions more strongly than others, and (c) change their
weightings with context (Cutting and Millard 1984). In particular, for flat horizontal surfaces, the size gradient had the greatest
effect in creating impressions of depth, followed by the density and shape gradients. However, for curved surfaces, the shape
gradient dominated over the other variables.
Given that cues vary in their effectiveness, one might consider the extreme case, and inquire whether some cues, though present,
might receive a weight of zero, and hence be totally ignored. Such total neglect might appear most clearly when another more
powerful cue is present. In such a situation, visual processing might engage in what Bruno and Cutting (1988: 162) call cue
'selection', in which 'observers use the single most effective available [information] source and disregard the others'. For
instance, a static, pictorial cue to depth, like height in the image plane, might become ignored when motion parallax is present,
given that the former is less consistently predictive of distance. However, using a variety of experimental methods, Bruno and
Cutting examined the impact of motion parallax, height in the image plane, occlusion, and relative size on distance judgments,
and found that all of these variables contributed significantly to those judgments. Such results indicate that human beings use a
variety of cues in three-dimensional perception, even cues that are not completely reliable.
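The additive picture suggested by Bruno and Cutting's results can be caricatured as a weighted combination of cues. The cue weights and input values below are invented for illustration and are not Bruno and Cutting's measured contributions.

```python
# Illustrative sketch (weights invented, not measured): perceived depth
# as a weighted sum of several probabilistic cues, each normalized to
# [0, 1]. Every cue with a nonzero weight contributes even when a more
# reliable cue is present - the opposite of the 'selection' strategy,
# under which all but one weight would be zero.

CUE_WEIGHTS = {                 # hypothetical values
    "motion_parallax": 0.40,
    "occlusion":       0.25,
    "height_in_plane": 0.20,
    "relative_size":   0.15,
}

def depth_estimate(cues):
    """cues: dict mapping cue name to a normalized depth signal in [0, 1]."""
    return sum(CUE_WEIGHTS[name] * value for name, value in cues.items())

full = depth_estimate({"motion_parallax": 0.9, "occlusion": 0.7,
                       "height_in_plane": 0.5, "relative_size": 0.6})
```

In this additive caricature, deleting any single cue changes the estimate, which is one way of cashing out the finding that all of the variables "contributed significantly" to distance judgments.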
3.3. Categorization
When subjects are asked to judge the truthfulness of category membership statements like 'An x is a Y', where Y is a category
term and x is a possible member, their time to respond consistently varies depending on x (cf. Smith and Medin, 1981, for
summary). For example, subjects take longer to respond 'true' to 'An ostrich is a bird' than to 'A robin is a bird'. These reaction
time patterns can be predicted by examining the extent to which an instance possesses features typically associated with the
category in question (e.g., Rosch and Mervis 1975). In the case of birds, these might be flight capacity, relatively small size,
song, etc. This interpretation of the reaction time phenomena indicates that numerous information sources are used in the process
of categorization. Indeed, some have claimed that the reaction time (and other) data indicate that membership in categories is
not discrete, but rather continuous, or that human beings believe that category membership is usually fuzzy. However,
subsequent research has cast serious doubt on these conclusions. For example, the status of an integer as even or odd is clearly
discrete. A simple, straightforward definition suffices to unambiguously classify any integer as even or odd. If the classification
decision data reflect the fact that human beings typically lack knowledge of category definitions and/or believe that category
membership is generally fuzzy, then the decision data should look rather different if subjects have to classify integers as even or
odd. If subjects know the mathematical definition and explicitly admit that
category membership is completely and unambiguously determined by that definition, then all integers should be classified as
even or odd with equal speed, so long as confounding variables like word frequency are controlled. However, Armstrong et al.
(1983) found that some numbers are in fact classified as even or odd faster than other numbers. Indeed, for a range of categories
that had clear, agreed-upon criteria for membership, Armstrong et al. found that some members were consistently classified
faster than others. Given these results, Armstrong et al. argue that, whatever the reaction time data indicate, they do not imply
that items are members of categories to varying degrees, or that people believe in graded category membership.
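The feature-overlap idea behind these reaction-time predictions can be sketched as a simple feature count in the spirit of Rosch and Mervis (1975). The feature sets below are invented toy examples, not empirically elicited feature norms.

```python
# Hypothetical sketch of the feature-overlap account: an instance's
# typicality score is the number of category-associated features it
# possesses. Higher overlap predicts faster 'true' responses to
# 'An x is a Y' - without implying that membership itself is graded.

BIRD_FEATURES = {"flies", "small", "sings"}   # invented feature set

def typicality(instance_features):
    return len(BIRD_FEATURES & instance_features)

robin   = {"flies", "small", "sings"}
ostrich = {"runs", "large"}
```

Here the robin's score exceeds the ostrich's, matching the faster verification times for 'A robin is a bird', while both remain birds outright: the score orders response times, not membership.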
We agree completely with the rationale for the Armstrong et al. experiments and their conclusions. However, we nonetheless
believe that the reaction time data reflect certain aspects of the process of categorization. In particular, even when a criterion
exists that is necessary and sufficient for category membership, people cannot help but weigh other factors that are correlated
with membership, though in the final analysis such features are irrelevant. For instance, prime numbers are almost always odd (2
being the only counterexample). This feature is, however, neither necessary nor sufficient for being classified as odd. Still, the
feature might be weighed in classification, with prime numbers being classified as odd faster than non-prime. Consideration of
such other factors might be beneficial by speeding up decision times while keeping error rates low. This advantage might also be
illustrated in the following phenomenon.
3.4. The word superiority effect
Suppose subjects view a clearly printed letter 'k' either by itself or in a word, and they must identify whether a 'k' or some other
letter was presented. Numerous experiments using variations on this task have repeatedly found that subjects identify the target
letter more accurately if it appears in a word than if it appears by itself (e.g., Reicher 1969, Johnston 1978). Thus, 'k' would be
identified more accurately in the context of 'work' than alone (or in a string of letters that do not form a word). Why is the word
context advantageous even though the letter in isolation is clearly printed? One possibility concerns the high speeds with which
skilled readers process orthographic material. Given that the context in which a letter appears is partly predictive of its identity,
observers could bypass complete perceptual encoding of a letter, and use partial information about a letter's appearance plus
partial information about the identity of surrounding letters to speed
overall processing while minimizing error rates. Thus, observers could circumvent the usual tradeoff between speed and
accuracy by using multiple sources of information about letter identity. Of course, this additional information is probabilistic in
nature, given that words other than 'work' begin with 'wor,' such as 'word' and 'worm'. Nonetheless, human beings are apparently
sensitive to this probabilistic information, and use it to increase processing speed.
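The combination of partial perceptual evidence with contextual predictability can be caricatured in simple probabilistic terms. The candidate letters and all probability values below are invented for illustration, not estimates from English text or from the cited experiments.

```python
# Invented illustration of combining two probabilistic sources for the
# final letter of 'wor_': degraded perceptual evidence about the letter,
# plus the contextual likelihood of each completion. Neither source
# alone need be decisive; their product can be.

CONTEXT = {"k": 0.3, "d": 0.5, "m": 0.2}   # P(letter | 'wor'), invented
PERCEPT = {"k": 0.5, "d": 0.2, "m": 0.3}   # P(evidence | letter), invented

def combined(letter):
    """Unnormalized posterior evidence for each candidate letter."""
    return CONTEXT[letter] * PERCEPT[letter]

best = max(CONTEXT, key=combined)
```

In this toy case the percept favors 'k' only weakly and the context favors 'd', yet the product singles out 'k'; with no context at all, the observer would be left with the noisy percept alone, which is one way of rationalizing the word-context advantage.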
4. Probabilistic information and language processing
In this section, we will review some evidence that multiple, probabilistic information sources are used to solve various tasks in
language processing. We will focus on phoneme perception, word boundary identification, and the assignment of words to
grammatical categories. Other potential examples also exist, such as word identification in sentence contexts and assignments of
words to agent and patient roles. See Massaro (1991) for discussion of these and other cases, along with mathematical models of
some of these phenomena.
4.1. Speech perception
As in the case of visual depth perception, numerous acoustic cues for phoneme identification exist. Indeed, Lisker (1978) lists
fully sixteen variables that could be used by listeners to determine whether a /b/ or /p/ is present in 'ra__id'. Furthermore,
numerous experiments have shown that listeners weigh multiple variables in speech perception. Many of these experiments
exploit the possibility of trading relationships between different acoustic cues for phoneme categories. In particular, suppose that
the values along two acoustic dimensions could be used to distinguish between two phonemes A and B. If these two dimensions
are in fact weighed by the listener, then shifts in the value of one dimension toward the B category should be offset by
compensatory changes toward the A category in the value of the other dimension. This paradigm has been used to document the
existence of a variety of trading relationships. For example, both voice onset time (VOT) and relative aspiration are potential
cues for distinguishing voiced from voiceless stop consonants. With amplitude of aspiration held constant, a particular VOT
value can be identified as the approximate boundary between a voiceless and voiced stop consonant. However, if the aspiration
amplitude is then raised, thus
increasing the evidence for an unvoiced consonant, the VOT boundary between the two phoneme classes will shift toward
shorter VOTs. Hence, an aspiration value that indicates 'unvoiced' can be offset by a suitable VOT indicating 'voiced'. Such
results indicate that both VOT and aspiration are weighed in distinguishing voiced from voiceless consonants (Repp 1979).
Similar trading relations have been found for formant transition duration and the duration of the following vowel, which are
used to distinguish /b/ from /w/ (Miller and Liberman 1979), silence duration and F1 onset frequency, which are involved in the
distinction between 'say' and 'stay' (Best et al. 1981), and many other contrasts (see Repp, 1982, for summary). Furthermore
experiments indicate that infants as well as adults weigh multiple variables in speech categorization (Miller and Eimas 1983).
The huge mass of this speech perception evidence leads to the conclusion that 'listeners will make use of any cue for a given
phonetic distinction' (Repp and Liberman 1987: 98). Even further, listeners will combine multiple cues to identify phonetic
segments.
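A trading relation of the VOT/aspiration kind falls out of any model in which both cues feed a single weighted decision variable. The weights, criterion, and cue values below are illustrative normalized quantities, not measured phonetic values.

```python
# Illustrative sketch of a cue trading relation: voicing judgments
# depend on a weighted sum of VOT and aspiration amplitude (both in
# arbitrary normalized units, not real phonetic measurements). Raising
# aspiration - evidence for 'voiceless' - shifts the VOT value needed
# to reach the decision criterion toward shorter VOTs.

W_VOT, W_ASP = 1.0, 0.5     # hypothetical cue weights

def voiceless_evidence(vot, aspiration):
    return W_VOT * vot + W_ASP * aspiration

def vot_boundary(aspiration, criterion=1.0):
    """VOT at which the summed evidence reaches the decision criterion."""
    return (criterion - W_ASP * aspiration) / W_VOT

low_asp_boundary = vot_boundary(aspiration=0.2)    # longer VOT needed
high_asp_boundary = vot_boundary(aspiration=0.8)   # shorter VOT suffices
```

The boundary shift is a direct consequence of the weighted sum: any evidence contributed by one cue can be offset, unit for weighted unit, by the other.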
The search for multiple information sources in speech perception has been driven in large part by the failure to find invariant
relationships between acoustic structure and phoneme categorization. If such invariants did exist, then they would presumably be
sufficient in themselves to distinguish between phonemes. Other, probabilistic cues, though present, might then be considered
superfluous. One possible invariant that has been identified and explored over the past decade involves the spectral shape of a
stop consonant release burst. Blumstein and Stevens (1979) have argued that place of articulation distinctions among stop
consonants are invariantly signaled by this variable. For example, alveolar stops exhibit a gradual rise in amplitude as frequency
increases, whereas labial stops show a falling or flat amplitude pattern. Experiments have, in fact, shown that listeners are
sensitive to this information (Blumstein and Stevens 1980). However, even if this information is invariant, its presence
apparently does not eliminate the effects of other, probabilistic cues for stop consonant identity. Indeed, when the invariant and
probabilistic information conflict, the latter dominates (Walley and Carrell 1983). These results are analogous to findings in the
visual depth perception literature, in which probabilistic cues to depth are exploited even if more powerful cues are also present. They
also emphasize the danger of assuming that the measured strength of a cue will map directly onto cue weights in perception.
4.2. Identification of word boundaries
Identifying word boundaries in continuous speech is one of the major
problems in speech perception, language acquisition, and the development of speech recognition devices. In contrast to our
intuitions regarding our native language, invariant cues to word segmentation do not appear to exist. Hence, our impressions of
an unfamiliar language might better capture the true state of affairs in the sound stream: Words seem to be tightly knit together,
with no obvious seams that can be used to tease them apart. In fact, errors in word segmentation occur in learning and listening
to one's native language, testifying to the difficulties involved in this task. Segmentation problems are well-attested in the
acquisition literature (e.g., Gleitman et al. 1988), and uncertainties over whether 'a napron' or 'an apron' was spoken may have
caused the former to lose its initial /n/ over the course of English history.
Despite its clear difficulties, listeners nonetheless become quite skilled at word segmentation, and numerous researchers have
tried to identify the types of cues listeners can and do use to solve this problem. These investigations have uncovered cues that
exist at various levels of language structure, from constraints on phoneme sequences to patterns in the prosody. None of these
cues can guarantee success in word segmentation, either alone or in concert, but their joint effects may conspire to make word
segmentation by and large successful.
4.2.1. Phoneme sequences
Languages often have restrictions on phonemes and/or phoneme sequences at syllable, morpheme, and word boundaries. Some
of these restrictions are strong constraints, such as the impossibility of obstruent + nasal sequences within English syllables.
However, large-scale corpus analyses might also reveal probabilistic relationships between the distributions of phonemes and
various boundary types. Speakers do appear to have some knowledge of such probabilistic relations. Thus, Cutler et al. (1987)
have shown that English speakers have learned that English consonant-vowel sequences are more likely to follow a CVCV than
a CVCC pattern. Experiments on human sensitivity to frequency information have found that people are not only sensitive to the
frequencies of individual letters (Attneave 1953) and syllables (Rubin 1974), but also to letter combinations (Underwood 1983).
Given that letter combinations will be correlated with phoneme combinations, such data indicate that speakers also have
knowledge of the relative frequencies of phoneme sequences. However, experiments need to be performed to see if speakers are
sensitive not just to the overall frequency of phoneme sequences, but also to significant interactions between those frequencies
and syllable, morpheme, and word boundaries.
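Such a corpus analysis is easy to sketch. The fragment below is a toy illustration only (the corpus, transcriptions, and function names are invented, and this is not a procedure from any study cited here): it tabulates how often each phoneme bigram spans a word boundary versus occurs word-internally in a boundary-annotated corpus, yielding the kind of probabilistic boundary cue at issue.

```python
from collections import Counter

# Toy corpus: utterances as lists of words, each word a list of phonemes.
# (Illustrative ASCII transcriptions only.)
corpus = [
    [["dh", "ax"], ["d", "ao", "g"], ["r", "ae", "n"]],  # "the dog ran"
    [["ah"], ["b", "ih", "g"], ["d", "ao", "g"]],        # "a big dog"
]

internal = Counter()   # bigram occurs inside a word
straddles = Counter()  # bigram spans a word boundary

for utterance in corpus:
    phones = []
    boundary_after = set()  # indices i such that a word ends at phones[i]
    for word in utterance:
        phones.extend(word)
        boundary_after.add(len(phones) - 1)
    for i in range(len(phones) - 1):
        bigram = (phones[i], phones[i + 1])
        if i in boundary_after:
            straddles[bigram] += 1
        else:
            internal[bigram] += 1

def p_boundary(bigram):
    """Estimated probability that a word boundary falls inside this bigram."""
    s, n = straddles[bigram], internal[bigram]
    total = s + n
    return s / total if total else None

# ("g", "r") occurs only across the boundary in "dog ran":
print(p_boundary(("g", "r")))   # 1.0
print(p_boundary(("d", "ao")))  # 0.0
```

A learner sensitive to such statistics could treat high-`p_boundary` bigrams as probabilistic evidence for a word boundary, exactly the sort of cue that is unreliable alone but useful in concert with others.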
In addition, /ə/ was significantly easier than /ɪn/ on both the speed and accuracy measures. The /ə/ and /tə/ conditions were not significantly different. We are currently planning a second experiment involving /tə/, /ə/, and /ɪn/. The same one-word/two-word task will be used, but in this case, the target syllables will actually be part of a word, as in 'tomorrow', 'abandon', and 'infection'. The correct answer here is 'one word', but the predictions are exactly the opposite of those made for the same syllables in the 'two word' context. Now, listeners should be fastest and make the fewest errors with /ɪn/ items and be slowest and make the most errors with the /tə/ items.
4.2.3. Prosodic structure
Cutler and her colleagues have recently argued that the distinction between strong and weak syllables, which is a major
characteristic of English prosody, can provide powerful cues to word segmentation. Whereas strong syllables have a full vowel,
weak syllables have a reduced vowel, which is typically though not exclusively realized as a schwa. In extensive analyses of
English speech corpora, Cutler and Carter (1987) found that over 70% of strong syllables coincided with a word boundary. Our
own unpublished analysis of parental speech to children between 12 and 25 months of age revealed an even stronger pattern. Across the 14 mothers in our sample, strong syllables corresponded with word boundaries 95% of the time, with 93% being the lowest value in the corpus.6 Hence, a productive word segmentation strategy for English would divide an utterance at the
beginning of each strong syllable, and submit the resulting units to a lexical search. In general, these units will match words in
the lexicon, and the failures can be subjected to some type of re-analysis.
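The strategy just described can be rendered as a small sketch. The code below is a toy illustration under simplifying assumptions (pre-syllabified input with strong/weak marks, an invented mini-lexicon), not an implementation from the segmentation literature:

```python
LEXICON = {"rabbit", "a", "ran"}  # hypothetical lexicon entries

def mss_segment(syllables):
    """Divide at the onset of each strong syllable and try lexical lookup.

    syllables: list of (syllable_text, is_strong) pairs.
    Returns (candidate_word, found_in_lexicon) pairs; lookup failures
    would be handed to some re-analysis process.
    """
    units, current = [], []
    for text, strong in syllables:
        if strong and current:   # a strong syllable opens a new unit
            units.append(current)
            current = []
        current.append(text)
    if current:
        units.append(current)
    return [("".join(unit), "".join(unit) in LEXICON) for unit in units]

# "a rabbit ran": weak "a", strong "rab" + weak "bit", strong "ran"
print(mss_segment([("a", False), ("rab", True), ("bit", False), ("ran", True)]))
# [('a', True), ('rabbit', True), ('ran', True)]
```

A real model would need the re-analysis step mentioned above for lexical-lookup failures, for instance weak-initial words like 'about', whose first syllable would be wrongly attached to the preceding unit.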
Current evidence indicates that listeners do, in fact, use some version of this strategy. First, Cutler and Butterfield (1992)
predicted that when listeners make word segmentation errors, they should remove weak syllables from the beginnings of words
and attach them to the ends of words. Analyses of
6 Of course, the mothers are not deliberately making their speech more predictive of word boundaries so as to assist their
children in acquiring English. Rather, parental speech contains a strikingly high percentage of words that refer to concrete
objects and easily perceivable events, for the good reason that children are more interested in these entities than in
more abstract themes like religion or politics. Given that the vocabulary of everyday objects contains a high proportion of
monosyllabic words and is drawn from a core Germanic lexicon that is characterized by polysyllabic words beginning
with strong syllables, a greater correspondence between strong syllables and word boundaries will appear in parental
speech than in speech between adults.
naturally-occurring and experimentally-induced segmentation errors supported these hypotheses. For example, a listener
misheard 'bought a Mercedes' as 'Mortimer Sadies', which involved detaching the weak initial syllable of 'Mercedes' and
appending it along with the weak 'a' to the strong first syllable. (Of course, some phoneme misperceptions occurred as well.)
Second, Cutler and Norris (1988) found that listeners took longer to identify the real word 'mint' in 'mintayf' than in 'mintef'.
Cutler and Norris argued that since 'mintayf' contains two strong syllables, listeners would initially segment the syllables into
separate word-units after the /n/. As a result, the listeners would have to overrule the initial word boundary and reunite the /t/
with the preceding syllable. In 'mintef', on the other hand, no word boundary would be inserted between the /n/ and /t/ since the
final syllable is weak. Thus, the subjects would not have to rectify incorrect word boundaries in this case, and their reaction time
would be faster. Finally, although Cutler and Carter's corpus analysis indicated that a strong syllable generally marked a word
boundary, numerous segmentation errors would still be made with a simple heuristic that always and only placed word
boundaries before strong syllables. However, Cutler and Carter point out that most of the errors in this case would involve
closed-class items as these are the principal words that violate the strong syllable heuristic. Indeed, they systematically violate
the heuristic because word boundaries generally appear before weak syllables of closed class words but strong syllables of open
class words. If English speakers have learned this pattern, and supplemented their word segmentation strategies with it, they
could further reduce the chances of mis-segmentation. In fact, segmentation errors themselves provide some evidence that
English speakers have learned these prosodic patterns. Although Cutler and Butterfield (1992) found that segmentation errors
tended to place word boundaries before strong syllables, some did in fact occur before weak syllables. However, most of these
cases posited a closed-class word immediately after the boundary.
4.3. Parsing
In determining the grammatical structure of a sentence, listeners and readers are often faced with local ambiguities. For
example, in a sentence beginning with 'The judge knew the law ...', the final noun phrase could be the direct object of 'knew' or
the subject of a complement clause. In attempting to understand how such ambiguities are dealt with, two major questions have
been intensely investigated: What information is brought to
bear on the ambiguity, and when is that information available? Over the past two decades, the dominant answers have claimed
that only a restricted range of information contributes to initial parsing decisions, with other classes of information only coming
into play later to rectify errors in the first parse (e.g., Frazier and Fodor 1978, Frazier and Rayner 1982, Ferreira and Clifton
1986, Ferreira and Henderson 1990). In particular, preferences for certain phrase structure geometries are used to guide initial
parsing, such as a preference to create the simplest phrase structure tree consistent with the preceding input. These preferences
are reputedly blind to other types of information, such as the identity of the verb in the sentence.
The impenetrability of certain information classes to parsing mechanisms creates very interesting predictions. Consider, for
example, the sentences 'The student forgot the solution was in the back of the book' and 'The student hoped the solution was in
the back of the book'. In both sentences, 'the solution' is the subject of a complement clause. However, a complement clause
interpretation would create a more complex phrase structure tree than an interpretation that initially categorized 'the solution' as
the direct object of the respective verbs 'forgot' and 'hoped'. Hence, upon reaching 'the solution', a reader might opt initially for
the simpler direct object parse, which would have to be rejected when later disconfirming information is encountered. Still, one
might entertain the possibility that the identity of the verbs themselves could affect these initial parsing biases. Although 'forgot'
can be followed by an NP object or an object clause, 'hoped' cannot appear with the former structure. If such facts about verb
argument types can influence the earliest stages of parsing, then one would predict that readers would not posit a direct object
interpretation for 'hoped', and so would not need to reject such a hypothesis when subsequent portions of the sentence are read.
One would therefore predict that reading times for the disambiguating areas of the sentence with 'hoped' would be faster than
those for the same areas of the sentence with 'forgot'. This prediction and analogous ones using other dependent measures and
experimental procedures were supported in a set of studies by Trueswell et al. (in press b). The results could not be attributed to
generally faster processing of 'forgot' than 'hoped' because the reading time differences between the verbs were significantly
reduced in control sentences containing the complementizer 'that', which made the sentences unambiguous. Although some
studies have been reported that did not find effects of verb subcategorization preferences on parsing (e.g., Ferreira and
Henderson 1990), Trueswell et al. argue that these studies have a number of methodological flaws. For instance, the verbs used
by Ferreira and Henderson (1990)
did not differ very much in their preference for noun phrase versus sentential objects. Consequently, their methods were not
sensitive enough to detect differences.
In sum, then, the experiments conducted by Trueswell et al. (in press b) indicate that subcategorization information about a verb
has immediate effects on parsing (see also Shapiro et al. 1993), contrary to models that severely restrict the class of information
available to parsing mechanisms. Indeed, Trueswell et al. argue that a wide range of information sources contribute to parsing
decisions. As one final illustration, Trueswell et al. (in press a) have recently provided evidence that semantic properties of the
sentence subject can influence parsing. They contrasted sentences like 'The defendant examined by the lawyer turned out to be
unreliable' with 'The evidence examined by the lawyer turned out to be unreliable'. Parsing models that rely solely on syntactic
preferences would claim that readers would have difficulty with these sentences because they would initially treat
'defendant/evidence examined' as the subject and verb of the main clause. Although such syntactic preferences might exist,
Trueswell et al. explore the possibility that they could be modified by the relative prototypicality of the preceding nouns as the
subject for the verb. Although 'defendant' is a reasonable subject of the verb 'examined', 'evidence' is not, and Trueswell et al.
provide evidence that readers weigh such information in their initial parsing processes. Thus, like visual depth perception,
parsing appears to proceed by considering multiple sources of information, which may range from general syntactic preferences
to properties of the specific lexical items composing a sentence.
4.4. Grammatical category assignments
One problem that listeners must solve during language comprehension is the assignment of words to the correct grammatical
classes, such as noun and verb. These assignments must be made rapidly given that conversational speech proceeds at about 150
words per minute (Maclay and Osgood 1959). What types of information might listeners exploit in order to make these
categorizations? Both semantic and syntactic factors are certainly available. Semantically, nouns tend to denote concrete objects
whereas verbs tend to denote actions. These patterns are not invariant, but children do show sensitivity to them in experiments
(e.g., Brown 1957). Syntactically, words from different grammatical classes vary in their distributional requirements in
sentences. Thus, English nouns, but not verbs, can appear in the sentence
'The ____ stole the base' whereas verbs, but not nouns, can appear in the sentence 'The runner ____ the base'.
While not disputing the important role of semantic and syntactic information for grammatical class, we would like to explore a
relatively neglected domain of information. In particular, a large number of phonological cues to grammatical class exist, and
experiments have repeatedly shown that speakers are sensitive to them. For example, nouns and verbs in English differ in stress
pattern, syllable number, duration, vowel characteristics, and other phonological dimensions (see Kelly, 1992, for a review).7
English speakers have revealed their sensitivity to these and other correlations in a wide range of tasks. For instance, if listeners
hear disyllabic pseudowords that differ in stress and are asked to use each of these words in a sentence, they will use words with
first syllable stress more often as nouns and words with second syllable stress more often as verbs (Kelly 1988b). Another set of
experiments took advantage of the fact that English words often develop uses in other grammatical categories. Thus, 'police'
originated in English as a noun, but subsequently developed a verb use, whereas 'fumble' originated as a verb, but later
developed a noun use. Despite the frequency with which these lexical extensions occur, biases against certain types of
extensions exist (Clark and Clark 1979). Although the cited biases tend to be based on semantic and pragmatic factors, Kelly
(1988b) speculated that the phonological characteristics of nouns and verbs could influence the ease with which they develop
uses in the other category. In particular, if English speakers have learned the correlation between stress and grammatical class in
English, then they might use this knowledge as a measure of 'fit' between a current noun and possible verb use and vice versa.
Thus, they might consider nouns to be better verb candidates to the extent that they have the prototypical verb stress pattern.
This possibility was tested by presenting one group of subjects with pairs of disyllabic nouns that differed in stress but were
controlled for other factors,
7 Although these correlations are well-documented, their origins may seem mysterious. However, explanations for some
of these patterns have been proposed and experimentally evaluated. For example, Kelly (1992) has argued that three
factors might be involved in the evolution of phonological predictors of grammatical class: (1) Words from different
grammatical classes differ in their distributions in sentences. (2) These distributional differences may have phonological
reflexes associated with them, which create contextual differences in the way words from various classes are pronounced
and/or perceived. (3) Over time, listeners view these contextual effects as permissible context-free pronunciations. This
account has been applied to the English noun-verb stress difference (Kelly 1988a, 1989) and duration difference (Davis et
al. 1992), and might be used as a heuristic to predict the existence of other, currently unknown phonological correlates
with grammatical class.
such as word frequency (e.g., llama and gazelle). Another group of subjects were presented with pairs of disyllabic verbs that
differed in stress (e.g., grovel and beseech). None of the words had uses in the other category as determined by the most
recently available Webster's Collegiate Dictionary. Subjects in the two groups were asked to select one noun (verb) from each
pair and use it as a verb (noun) in a sentence. The choices were significantly affected by the stress patterns of the words, as
nouns with second syllable stress were used as verbs more often than nouns with first syllable stress whereas verbs with first
syllable stress were used as nouns more often than verbs with second syllable stress. These patterns are not simply a laboratory
artifact. Analyses of the history of English also found the same patterns in actual grammatical category extensions down through
the centuries (see Kelly, 1988b, for details).
Although such experiments demonstrate that English speakers possess implicit knowledge of the noun-verb stress difference, and
that such knowledge may even have affected the history of English, they do not entail that the initial identification of a word as
a noun or verb is at all affected by stress or other phonological variables. That is, the subjects might have recognized that 'gazelle' is a noun just as easily as they recognized 'llama'. Stress then came into play only later as a metalinguistic variable, but did not
affect initial categorization of the words. In order to determine whether stress actually affects grammatical category assignments
early in speech processing, Kelly and Martin (1993) recently conducted a noun-verb categorization task that we will summarize
here.
The task was straightforward: On each trial, the subjects heard a word that they had to classify as a noun or a verb as quickly as
possible. The words were disyllabic and differed in stress, with half of the nouns and half of the verbs having first syllable stress
and the other half having second syllable stress. In addition to varying a phonological correlate to grammatical class, we also
manipulated a semantic cue. Given that nouns generally denote concrete objects and verbs denote readily perceived actions, we selected
our words such that half of the nouns and half the verbs had the prototypical meanings of their class, whereas the other half
denoted abstractions. It is conceivable that the phonological cue to grammatical class would only influence grammatical category
assignments when the semantic cue is absent. Given that the semantic cue is universal whereas the phonological cues are
language specific, one might expect the former to be more basic and perhaps completely eliminate effects of the phonological
cue. The results indicated that both semantic and phonological factors affected the subjects' judgments. Nouns
and verbs were classified faster if they had meanings typical of their classes. In addition, the phonological variable of stress
interacted with grammatical class such that nouns were classified faster if they had first syllable stress but verbs were classified
faster if they had second syllable stress. Most importantly, the phonological effects were not eliminated for words that had
concrete meanings. Indeed, these effects were significantly magnified by the presence of semantic features that converged on the
same categorization. The effect of concreteness was likewise magnified by the presence of appropriate phonological features.
For example, concrete nouns with first or second syllable stress were classified faster than abstract nouns with either stress
pattern. However, the difference between the concrete and abstract nouns was larger for items with first syllable stress. For
verbs, on the other hand, the concrete/abstract difference was larger for items with second syllable stress. Thus, listeners appear
to use a conspiracy of cues to identify the grammatical category of a word. These cues can either be language-universal, like the
semantic cue to grammatical class, or language-specific, like stress in English. Finally, the language-specific cues are not
overwhelmed or even damped by the language-universal factors. Instead, they mutually reinforce one another.
5. Conclusions
In this paper, we have submitted arguments and evidence for the following claims:
(1) Multiple sources of information are available to solve problems in perception and
cognition.
(2) These information sources, though plentiful in number, are often probabilistic in
nature. Nonetheless, if multiple probabilistic sources converge on the solution to a
problem, that solution is likely to be correct. Reliance on multiple cues should
therefore produce greater success than reliance on an individual cue.
(3) Given (1) and (2), one would expect to see widespread sensitivity to probabilistic
information throughout the animal kingdom. Furthermore, since many problem
domains are characterized by probabilistic solutions, sensitivity to variables such as
frequency and rate of return should be a domain-general ability. This general ability
could, like memory, be involved in species-specific problems that have domain-
dependent characteristics. The current evidence from the human and non-human
animal literature strongly supports the hypothesis concerning sensitivity to
probabilistic information.
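Claim (2) can be given a simple numerical illustration. Assuming, purely for the sake of the example, a set of conditionally independent cues that are each correct 70% of the time, a majority vote over three cues is right more often than any single cue:

```python
from itertools import product

def majority_accuracy(p, n):
    """Probability that a majority of n independent cues, each correct
    with probability p, points to the right answer."""
    total = 0.0
    for outcome in product([True, False], repeat=n):
        if sum(outcome) > n / 2:  # a majority of the cues are correct
            prob = 1.0
            for correct in outcome:
                prob *= p if correct else (1 - p)
            total += prob
    return total

# One 70%-reliable cue vs. a majority vote over three of them:
print(round(majority_accuracy(0.7, 1), 3))  # 0.7
print(round(majority_accuracy(0.7, 3), 3))  # 0.784
```

With three such cues the majority is correct about 78% of the time versus 70% for one cue, and the advantage grows with the number of converging cues, provided they are not perfectly correlated.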
In conclusion, over the past few decades, cognitive scientists have moved away from the position that general principles operate
across cognitive and perceptual domains. Instead, research and theory ranging from cognitive development (e.g., Chi 1978, Keil
1989) to animal learning (e.g., Seligman 1970) have emphasized that domain-dependent principles might constrain the ways in
which human beings gather, organize, and draw inferences from different kinds of information. This approach recognizes that
organisms confront many problems whose solutions require specific kinds of information that must be manipulated in highly
constrained ways. As a result, the manner in which, say, migrating birds solve their navigational problems will probably not
provide us much guidance in determining how children acquire the meanings of words in their language. However, the value of
the domain-specific approach should not cause us to neglect the fact that similar types of problems are in fact encountered
across domains, and hence domain-general solutions might have evolved to deal with them. The inherent probabilistic nature of
an animal's environment is one such invariant, and sensitivity to probabilistic patterns appears to permeate perception and
cognition, even in areas like language that seem to be highly constrained by domain-dependent principles.
References
Alloy, L.B. and N. Tabachnik, 1984. Assessment of covariation by humans and animals: The joint influence of prior expectations and current situational information. Psychological Review 91, 112-149.
Armstrong, S., L. Gleitman and H. Gleitman, 1983. What some concepts might not be. Cognition 13, 263-308.
Attneave, F., 1953. Psychological probability as a function of experienced frequency. Journal of Experimental Psychology 46, 81-86.
Baron, J., 1988. Thinking and deciding. New York: Cambridge University Press.
Best, C.T., B. Morrongiello and R. Robson, 1981. Perceptual equivalence of acoustic cues in speech and nonspeech perception. Perception and Psychophysics 29, 191-211.
Blumstein, S.E. and K.N. Stevens, 1979. Acoustic invariance for place of articulation in speech production: Evidence from measurements of the spectral characteristics of stop consonants. Journal of the Acoustical Society of America 66, 1001-1017.
Blumstein, S.E. and K.N. Stevens, 1980. Perceptual invariance and onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America 67, 648-662.
Brown, R., 1957. Linguistic determinism and part of speech. Journal of Abnormal and Social Psychology 49, 454-462.
Bruno, N. and J.E. Cutting, 1988. Minimodularity and the perception of layout. Journal of Experimental Psychology: General 117, 161-170.
Chi, M.T.H., 1978. Knowledge structures and memory development. In: R.S. Siegler (ed.), Children's thinking: What develops?, 73-96. Hillsdale, NJ: Erlbaum.
Clark, E.V. and H.H. Clark, 1979. When nouns surface as verbs. Language 55, 767-811.
Coren, S. and C. Porac, 1977. Fifty centuries of right-handedness: The historical record. Science 198, 631-632.
Cutler, A. and S. Butterfield, 1992. Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language 31, 218-236.
Cutler, A. and D.M. Carter, 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133-142.
Cutler, A. and D.G. Norris, 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14, 113-121.
Cutler, A., D. Norris and J.N. Williams, 1987. A note on the role of phonological expectation in speech segmentation. Journal of Memory and Language 26, 480-487.
Cutting, J.E. and R.T. Millard, 1984. Three gradients and the perception of flat and curved surfaces. Journal of Experimental Psychology: General 113, 198-216.
Davis, S., J. Morris and M.H. Kelly, 1992. The causes of duration differences between English nouns and verbs. Unpublished manuscript.
Estes, W.K., 1985. Some common aspects of models for learning and memory in lower animals and man. In: L. Nilsson, T. Archer (eds.), Perspectives on learning and memory, 151-166. Hillsdale, NJ: Erlbaum.
Ferreira, F. and C. Clifton, 1986. The independence of syntactic processing. Journal of Memory and Language 25, 348-368.
Ferreira, F. and J.M. Henderson, 1990. The use of verb information in syntactic parsing: A comparison of evidence from eye movements and word-by-word self-paced reading. Journal of Experimental Psychology: Learning, Memory, and Cognition 16, 555-568.
Fisher, C., H. Gleitman and L.R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology 23, 331-392.
Francis, W.N. and H. Kucera, 1982. Frequency analysis of English usage: Lexicon and grammar. Boston, MA: Houghton-Mifflin.
Frazier, L. and J.D. Fodor, 1978. The sausage machine: A new parsing model. Cognition 6, 291-326.
Frazier, L. and K. Rayner, 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14, 178-210.
Frazier, L. and K. Rayner, 1987. Resolution of syntactic category ambiguities: Eye movements in parsing lexically ambiguous sentences. Journal of Memory and Language 26, 505-526.
Gallistel, C.R., 1990. The organization of learning. Cambridge, MA: MIT Press.
Gernsbacher, M.A., 1984. Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General 113, 256-281.
Gleitman, L.R., H. Gleitman, B. Landau and E. Wanner, 1988. Where learning begins: Initial representations for language learning. In: F.J. Newmeyer (ed.), Linguistics: The Cambridge survey, Vol. 3, 150-193. Cambridge: Cambridge University Press.
Gluck, M.A. and G.H. Bower, 1988. From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General 117, 227-247.
Gould, S.J., 1991. Bully for brontosaurus: Reflections in natural history. New York: Norton.
Gropen, J., S. Pinker, M. Hollander, R. Goldberg and R. Wilson, 1989. The learnability and acquisition of the dative alternation in English. Language 65, 203-257.
Hasher, L. and R.T. Zacks, 1984. Automatic processing of fundamental information: The case of frequency of occurrence. American Psychologist 39, 1372-1388.
Hintzman, D.L., 1988. Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review 95, 528-551.
Johnston, J.C., 1978. A test of the sophisticated guessing theory of word perception. Cognition 10, 123-153.
Jonides, J. and C.M. Jones, 1992. Direct coding of frequency of occurrence. Journal of Experimental Psychology: Learning, Memory, and Cognition 18, 368-378.
Just, M.A. and P.A. Carpenter, 1992. A capacity theory of comprehension: Individual differences in working memory. Psychological Review 99, 122-149.
Keil, F.C., 1989. Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.
Kelly, M.H., 1988a. Rhythmic alternation and lexical stress differences in English. Cognition 30, 107-137.
Kelly, M.H., 1988b. Phonological biases in grammatical category shifts. Journal of Memory and Language 27, 343-358.
Kelly, M.H., 1989. Rhythm and language change in English. Journal of Memory and Language 28, 690-710.
Kelly, M.H., 1992. Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review 99, 349-364.
Kelly, M.H. and S. Martin, 1993. Phonological cues to grammatical class. Unpublished manuscript.
Kimball, J., 1973. Seven principles of surface structure parsing in natural language. Cognition 2, 15-47.
Levy, Y., 1983. It's frogs all the way down. Cognition 15, 75-93.
Liberman, M., 1991. Colloquium presented to the Department of Psychology, University of Pennsylvania.
Lichtenstein, S., P. Slovic, B. Fischhoff, M. Layman and B. Combs, 1978. Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory 4, 551-578.
Lieberman, D.A., 1990. Learning. Belmont, CA: Wadsworth.
Lisker, L., 1978. Rapid vs. rabid: A catalogue of acoustic features that may cue the distinction. Haskins Laboratories Status Report on Speech Research, SR-54, 127-132.
Maclay, H. and C.E. Osgood, 1959. Hesitation phenomena in spontaneous English speech. Word 15, 19-44.
MacWhinney, B., 1978. The acquisition of morphophonology. Monographs of the Society for Research in Child Development 43 (1/2), 174.
Massaro, D.W., 1991. Language processing and information integration. In: N.H. Anderson (ed.), Contributions to information integration theory, Vol. 1: Cognition, 259-292. Hillsdale, NJ: Erlbaum.
McCloskey, M., 1983. Intuitive physics. Scientific American 24, 122-130.
Miller, J.L. and P.D. Eimas, 1983. Studies on the categorization of speech by infants. Cognition 13, 135-165.
Miller, J.L. and A.M. Liberman, 1979. Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception and Psychophysics 25, 457-465.
Nisbett, R.E. and L. Ross, 1980. Human inference: Strategies and shortcomings of human judgment. Englewood Cliffs, NJ: Prentice-Hall.
Pang, K., F. Merkel, H. Egeth and D.S. Olton, 1992. Expectancy and stimulus frequency: A comparative analysis in rats and humans. Perception and Psychophysics 51, 607-615.
Pinker, S., 1989. Learnability and cognition. Cambridge, MA: MIT Press.
Popova, M.I., 1973. Grammatical elements of language in the speech of preschool children. In: C.A. Ferguson, D.I. Slobin (eds.), Studies of child language development, 269-280. New York: Holt, Rinehart, & Winston.
Reicher, G.M., 1969. Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology 81, 275-280.
Repp, B.H., 1979. Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Language and Speech 22, 173-189.
Repp, B.H., 1982. Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psychological Bulletin 92, 81-110.
Repp, B.H. and A.M. Liberman, 1987. Phonetic category boundaries are flexible. In: S. Harnad (ed.), Categorical perception: The groundwork of cognition, 89-112. Cambridge: Cambridge University Press.
Rescorla, R.A., 1968. Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology 66, 1-5.
Rescorla, R.A., 1969. Conditioned inhibition of fear. In: W.K. Honig, N.J. Mackintosh (eds.), Fundamental issues in associative learning. Halifax: Dalhousie University Press.
Rescorla, R.A., 1988. Pavlovian conditioning: It's not what you think it is. American Psychologist 43, 151-160.
Rescorla, R.A. and A.R. Wagner, 1972. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: A.H. Black, W.F. Prokasy (eds.), Classical conditioning, Vol. 2: Current research and theory, 64-99. New York: Appleton-Century-Crofts.
Rosch, E. and C.B. Mervis, 1975. Family resemblances: Studies in the internal structure of categories. Cognitive Psychology 7, 573-605.
Rubin, D.C., 1974. The subjective estimation of syllable frequency. Perception and Psychophysics 16, 193-196.
Ruke-Dravina, V., 1973. On the emergence of inflection in child language: A contribution based on Latvian speech data. In: C.A. Ferguson, D.I. Slobin (eds.), Studies of child language development, 252-267. New York: Holt, Rinehart, & Winston.
Saegert, S., W. Swap and R.B. Zajonc, 1973. Exposure, context, and interpersonal attraction. Journal of Personality and Social Psychology 25, 234-242.
Seligman, M.E.P., 1970. On the generality of the laws of learning. Psychological Review 77, 406-418.
Shannon, B., 1976. Aristotelianism, Newtonianism, and the physics of the layman. Perception 5, 241-243.
Shapiro, B.J., 1969. The subjective estimation of relative word frequency. Journal of Verbal Learning and Verbal Behavior 13, 638-643.
Shapiro, L.P., H.N. Nagel and B.A. Levine, 1993. Preferences for a verb's complements and their use in sentence processing. Journal of Memory and Language 32, 96-114.
Shedler, J.K., J. Jonides and M. Manis, 1985. Availability: Plausible but questionable. Paper presented at the 26th annual meeting of the Psychonomic Society, Boston, MA.
Smith, E.E. and D. Medin, 1981. Categories and concepts. Cambridge, MA: Harvard University Press.
Trueswell, J.C., M.K. Tanenhaus and S.M. Garnsey, in press a. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Trueswell, J.C., M.K. Tanenhaus and C. Kello, in press b. Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory, and Cognition.
Underwood, B.J., 1983. Attributes of memory. Glenview, IL: Scott, Foresman.
Walley, A.C. and T.D. Carrell, 1983. Onset spectra and formant transitions in the adult's and child's perception of place of articulation in stop consonants. Journal of the Acoustical Society of America 73, 1011-1022.
Wasserman, E.A. and H. Shaklee, 1984. Judging response-outcome relations: The role of response-outcome contingency, outcome probability, and method of information presentation. Memory and Cognition 12, 270-286.
Zajonc, R.B., 1968. Attitudinal effects of mere exposure. Journal of Personality and Social Psychology Monograph Supplement 9(2), Part 2, 1-28.
Section 3
Categorizing the world
1. Introduction
Many students of language acquisition and cognitive development argue that the continuity hypothesis should be the default, to
be defeated only in the face of extraordinary evidence (e.g., Pinker 1984, Macnamara 1982). The continuity hypothesis is that
representational format is constant throughout development; that the child has innately the logical and conceptual resources to
represent his or her world as do adults. The continuity hypothesis denies stage changes of the sort envisioned by Piaget, denies
changes in the child's linguistic representations such as the putative 'semantic category/syntactic category' shift posited some
years ago. According to the continuity hypothesis, language learning is a very complex mapping process; the child must learn
which syntactic devices his/her language employs, and which of a universal set of semantic distinctions are expressed in the
syntax of his/her language. What the child need not do, on the continuity hypothesis, is construct genuinely new representational
resources.
Of course, whether the continuity hypothesis is true or not is an empirical question, and to examine it, one must entertain
possibilities as to what types
of discontinuities could possibly obtain in the course of development. If evidence for discontinuities is found, several further
questions are then licensed, including: (1) By what mechanism is the change effected (e.g., maturation, or learning by some process other than the currently understood parameter-setting or hypothesis-testing methods)? (2) What is the relation between the
discontinuity and language learning? Is some change in representational resources required as a prerequisite to some aspect of
language learning? Alternatively, does language learning play a role in causing the change?
Here I examine an important discontinuity proposal of Quine's, versions of which are endorsed by thinkers as diverse as the
British empiricists and Piaget. Quine, Piaget, and others maintain that early representations of the world are formulated over a
perceptual quality space (Quine, the empiricists) or sensori-motor representational system (Piaget). On both Quine's and Piaget's
views, the baby is not capable of formulating any representations with the properties of adult concepts such as object, dog, table.
Quine's proposal is that the ontology that underlies language is a cultural construction. 'Our conceptual firsts are middle-sized, middle-distanced objects, and our introduction to them and to everything comes midway in the cultural evolution of the race'
(Quine 1960: 5). Before the child has mastered this cultural construction, the child's conceptual universe consists of
representations of histories of sporadic encounters, a scattered portion of what goes on. Quine speculates as to the
representations underlying the toddler's uses of the words 'water', 'red', and 'Mama'. 'His first learning of the three words is
uniformly a matter of learning how much of what goes on about him counts as the mother, or as red, or as water. It is not for the
child to say in the first case, 'Hello, Mama again', in the second case 'Hello, another red thing', and in the third case, 'Hello,
more water'. They are all on a par: Hello, more Mama, more red, more water' (Quine 1960: 92). The child masters the notion of
an object, and of particular kinds of objects, in the course of getting the hang of what Quine calls 'divided reference', and this
through the process of mastering quantifiers and words like 'same'. 'The contextual learning of these various particles goes on
simultaneously, we may suppose, so that they are gradually adjusted to one another and a coherent pattern of usage is evolved
matching that of one's elders. This is a major step in acquiring the conceptual scheme that we all know so well. For it is on
achieving this step, and only then, that there can be any general talk of objects as such' (Quine 1969: 9-10). And in another place
he finishes the same idea with a bootstrapping metaphor, underlining the degree of conceptual change he thinks is occurring:
'The child scrambles up an intellectual chimney, supporting himself against each side by pressure against the others' (Quine
1960: 93). Quine also states that once the child has mastered the notion of an object, and got the trick of divided reference, he
goes back and reanalyzes 'Mama', so that it is now the name of a unique enduring person.
Quine's view can be schematized as follows. Imagine a portion of bottle experience that we adults would conceptualize as a
single bottle. Babies respond to bottleness or bottlehood also, and can learn many things about bottlehood; for instance, they can
come to associate bottlehood with milk, or with the word 'bottle'. Now imagine a portion of bottle experience that we would
conceptualize as three bottles. The infant would also expect to obtain milk (indeed, more milk) from this bottleness and could
also refer to it with the word 'bottle'. Note that shape is important to the identification of bottlehood, just as the shape of the
individual grains is important for distinguishing rice from spaghetti from macaroni. Similarly, even if Mama is a scattered portion
of what goes on, shape is important for distinguishing Mama from Rover or from Papa. That shape is important for
distinguishing what scattered portion of experience constitutes bottlehood does not mean that the baby is capable of representing
'a bottle', 'two bottles', or 'the same bottle I had yesterday'. Thus, demonstrations that toddlers are sensitive to shape in inductions
of word meanings when new words are ostensively defined over objects (e.g., Landau, this volume) do not bear on Quine's
proposal.
In this discussion I will not make contact with Quine's radical philosophical views such as the indeterminacy of translation. I
assume that we can characterize the adult's ontological commitments, that these include middle-sized physical objects, and that
words such as 'table', 'dog' and 'person' function as sortals in the adult lexicon, in Wiggins' (1980) sense. Sortals refer to kinds of
individuals (i.e., divide reference), providing conditions for individuation (establishing the boundaries of entities) and for
numerical identity (establishing when an individuated entity is the same one as one experienced at some other time, or in some
counterfactual world). One way of stating Quine's hypothesis, as I construe it, is that babies and toddlers represent no sortal
concepts, no concepts that provide conditions of individuation and numerical identity, no concepts that divide reference.
Two reviewers of this paper raised the objection that representations of shapes presuppose representations of individuals that
have those shapes, claiming therefore that Quine's proposal (at least as construed above) is incoherent. This is not so. Consider the spaghetti and macaroni case. It's
true that if the contrast between the two types of stuff is based on the shape differences of individual pieces, then some
representation of those individual pieces must enter into the representation of shape. But our concepts of spaghetti and macaroni
(and the words spaghetti, macaroni) do not quantify over those individuals. Similarly, we can represent the shape of a scattered
portion of sand, arranged, for example, into an S, and when we refer to it as 'a portion' or 'an S' we are quantifying over that
individual. But when we think of it as sand, we are not. Quine's proposal is that the child's conceptual/linguistic system has only
the capacity to represent the world in terms of concepts like furniture, sand, bottlehood. Of course the child's perceptual system
must pick out individuals in order to represent shape, to determine what to grasp, and so on. This is part of what Quine meant
when he claimed that the child is inherently 'body minded' (Quine 1974).
Piaget, like Quine, believed that the baby must construct the concept of enduring objects, although he differed from Quine as to
the mechanisms he envisioned underlying this construction. Quine saw the child's mastery of the linguistic devices of noun
quantification, the machinery by which natural languages such as English manage divided reference, as the process through
which the child's ontology comes to match his or her elders'. Piaget held that the baby constructs the concept object during the
course of sensori-motor development by the age of 18 months or so, and that this construction is the basis for the child's mastery
of natural language. Since Piaget did not frame his discussion in terms of an analysis of the logic of sortals, it is not clear when
he would attribute full sortals to the child.1
The Quine/Piaget conjecture about the baby's representational resources is a serious empirical claim, and as I will show, it is
difficult to bring data to bear on it. In what follows, I first consider Quine's views, contrasting his hypothesis that children come
to represent sortals only upon learning the linguistic devices of noun quantification with what I will call the 'Sortal First'
hypothesis. The Sortal First hypothesis is that babies represent sortal concepts, that toddler lexicons include words that express
sortals, and that these representations underlie the capacity for learning quantifiers rather than resulting from learning them. I
then turn to early infancy, and explore the contrast between the Quine/Piaget hypothesis and the Sortal First hypothesis as
regards the earliest phases of word learning. A preview of my conclusions: whereas the Sortal First hypothesis is ultimately
favored, evidence
1 For example, Piaget thought that the logical prerequisites for representing the adult concepts all and some are not
acquired until after age 5.
kind, respectively, this would tell against Quine. This is because these are the first relevant quantifiers the child learns. If he or
she interprets them correctly from the beginning, the interpretation could not have been acquired through an adjustment process
involving the entire set of quantificational devices of noun syntax. This last point is important. In the beginnings of language
learning, on Quine's view, children will not interpret those few quantifiers in their lexicons as adults do. The scramble will have
just begun. Data showing that children use 'a' and plurals will not in themselves be relevant to Quine's hypothesis; it must be shown that
such quantificational devices are doing the same work as they do in the adult language.
thus be a salient feature of the blicket, but not of the stad, for non-solid substances do not maintain their shapes when
manipulated. For non-solid substances, properties such as texture and color might be salient, for these stay constant over
experiences with substances. In other words, the two-year-old could be using 'blicket' to refer to blicketness, and recognize
blicketness by shape. The differential patterns of projection do not establish that the toddler is using 'blicket' to refer to any
individual whole object of a certain kind, that the toddler divides the reference of 'blicket'.
One detail of the data from figure 2 favors the Sortal First over the Quinian interpretation, and that is that toddlers performed
more like adults on the object trials than on the substance trials. Quine's interpretation of this would have to be ad hoc, perhaps
that the baby has had more object experience than substance experience. But the Sortal First hypothesis predicts this asymmetry.
To see this, suppose the Sortal First hypothesis is true, and suppose that upon first hearing the word 'blicket' the child assumes
that it refers to each individual object of a certain kind. The choices for testing how the child projects 'blicket' included another
single object, and 3 small objects. Even if the child isn't exactly sure of which features of the blicket establish its kind, the child
can rule out that the 3 small objects are a blicket, for under no interpretation can they be an individual object of the same kind
as the original referent. Children should then be at ceiling on the object trials, which they are. The substance trials are another
story. If upon first hearing 'stad', the child takes it to refer to the kind of substance of the original referent, then scattered
portions have no different status from unitary portions. There is no clue from number of piles which of the choices on the test
trials is the stad. If children are not certain what properties of the original sample of stad determine the kind stad, they might do
worse on the stad trials. And indeed, they do.
The key issue here is the role of number in determining how to project 'blicket'. If the Quinian interpretation of the data is
correct, the baby should project 'blicket' on the basis of shape similarity, no matter whether the choice that does not match in
shape consists of one object or three objects. That is, the baby should succeed on an object trial such as that in figure 3 as well as on an object trial such as that in figure 1. The Sortal First interpretation predicts that performance on the object trials will fall to the level of
performance on the substance trials if the cue from number is removed (figure 3). In an object trial such as that on figure 3,
'blicket' is ostensively defined as before, but the choices for projection are changed: another blicket of a different material (as
before) and another whole object of a different kind made of the same
learning English noun quantifiers. This assumption seems warranted, given that as a whole toddlers at 2;0 do not produce
quantifiers, and given that the pattern of projection was independent of whether the individual subjects produced any noun
quantifiers selective for count nouns. A worry, though, is that babies may have better comprehension than production of the
quantifiers.
We attempted to address that possibility by manipulating the syntactic context in which the word appeared. As mentioned
above, the syntactic environment in which the new word appeared had no effect in Soja et al.'s experiments, even at age 2 ½
when many children did produce quantifiers differentially for what are count and mass nouns in the adult lexicon. The Quinian
interpretation of this fact is that quantifiers like 'a', 'another', 'some NOUN_', 'some more NOUN_' do not yet signal the
distinction between individuated and nonindividuated entities, just as the child is not projecting 'blicket' and 'stad' on the basis of
that distinction. The Sortal First interpretation: objects are naturally construed as individuals of a kind and non-solid substances
are naturally construed as non-individuated entities, even by toddlers, as shown by performance in the neutral syntax condition.
Informative syntax merely reinforces the child's natural construal of the two types of entities.
A study by Soja (1992) decided between these two interpretations, and also established that our production data did not
underestimate toddlers' interpretation of the quantifiers. Soja taught toddlers words for the objects and substances in a new
condition: contrastive syntax. 'Blicket' was introduced in a mass noun context; 'stad' in a count noun context. That is, when shown a novel solid object, the child was told, 'Here's some blicket. Would you like to see some more blicket?' And when shown a non-solid substance fashioned into a distinctive shape, the child was told, 'Here's a stad. Would you like to see another stad?' As can
be seen from figure 4, at both ages 2 and 2 ½, the pattern of projection was markedly different from that seen in the neutral and
informative syntax conditions (figure 2). At both ages, the syntactic context 'some NOUN_', 'some more NOUN_' made children
slightly less likely to construe 'blicket' as referring to an individual whole object of a kind. There was a slight tendency towards
interpreting it to mean something like brass. The syntactic context 'a stad' made children significantly less likely to construe the
non-solid substance as a non-individuated entity. Rather, they interpreted the word as meaning something like s-shaped pile.
Wait, you might say, doesn't this show that children at these ages do know the force of 'a', 'another', and so might have learned
to represent sortal
construes physical objects as individuals in distinct kinds, and naturally construes non-solid substances as kinds of non-individuated entities. These natural construals support adult-like projection of word meaning (figure 2), and support adult-like
interpretation of newly learned quantifiers like 'a', 'some' and plurals.
7. Younger infants
Altogether the data support the Sortal First hypothesis over Quine's conjecture, but they do not establish when the child first
begins to represent sortal concepts. As noted earlier, it is not clear when Piaget would attribute sortal concepts to children, but it
is certain that he would deny them to young infants. The argument I have developed so far does not bear on Piaget's claims
about the representational capacities of infants, as it concerns children age 24 months and older. Of course, a demonstration that
young infants represent sortal concepts would defeat Quine's conjecture as well as Piaget's characterization of the infants'
conceptual resources.
Studies by Cohen and his colleagues (e.g., Cohen and Younger 1983) show that quite young babies will habituate when shown,
for example, a series of distinct stuffed dogs, and that they generalize habituation to a new stuffed dog and will dishabituate
when shown a stuffed elephant. Similarly, when shown a series of distinct stuffed animals, babies of 8 or 9 months habituate,
generalize habituation to a new stuffed animal, but dishabituate to a toy truck. Do these data not show that babies of that age
represent concepts such as 'dog' and 'animal'?
Certainly not. Babies may be sensitive to dog shapes or animal shapes; babies may be habituating to doghood or animalhood. To
credit the baby with sortals such as 'dog', or 'animal', we must show that such concepts provide the baby with criteria for
individuation and identity.
My discussion of this question has two steps. First, I argue that babies represent at least one sortal, object. Second, I present
some recent data from my lab that suggest that as late as 10 months of age, the baby may have no more specific sortal concepts: not cup, bottle, truck, dog, or animal. Thus, a Quinian interpretation of the above habituation data may well be correct.
original looking time. The baby is getting bored. After looking time has decreased to 1/2 its original level, the baby is presented
with an array containing one object, or three objects. In both cases, looking time recovers to its original level. The baby notices
the difference between two objects, on the one hand, and a single object or three objects, on the other. This result, or one very
like it, has been obtained with neonates (Antell and Keating 1983).
In fact, the baby's capacity to detect similarity in number across distinct arrays serves as a methodological wedge into the problem
of how babies individuate objects. The baby can be habituated as described above, to two objects, and then presented with an
array as in figure 5, consisting of two distinct objects sharing a common boundary. Babies dishabituate to this array, showing
that they perceive it as one object, rather than two. These data support the conclusion, derived from other types of data as well,
that babies are not sensitive to shape or texture regularity in individuating objects; they need positive evidence of distinct
boundaries, such as one object moving with respect to the other, or the objects' being separated in space.
unexpected outcome of three objects. Experiments of the same sort demonstrated that babies expected 3 - 1 to be 2, and 2 - 1 to
be 1.
Although these studies were performed to explore the baby's concept of number, they bear on our question as well. Babies, like
anybody, cannot count unless they have criteria that establish individuals to count. Babies clearly have criteria that establish
small physical objects as countable individuals.
such as car, person, table. Such concepts (sortals), typically lexicalized as count nouns in languages that have a count/mass distinction, provide criteria for individuation and identity in addition to the spatiotemporal criteria that apply to bounded physical objects in general, and to the general assumption that an object's properties stay stable over time, or change continuously. When
a person, Joe Shmoe, dies, Joe ceases to exist, even though Joe's body still exists. The sortal person provides the criteria for
identity of the entity referred to by the name 'Joe Shmoe'.
In collaboration with Fei Xu, I have been exploring the question of whether babies represent any sortals more specific than
object, or whether babies can use property/kind information to individuate and trace identity of objects (Xu and Carey 1993).
Consider the events depicted in figure 7. An adult witnessing a truck emerge from behind and then reenter a screen and then
witnessing an elephant emerge from behind and then reenter the screen would infer that there are at least two objects behind the screen: a truck and an elephant. The adult would make this inference in the absence of any spatiotemporal evidence for two distinct
objects, not having seen two at once nor any suggestion of a discontinuous path through space and time. Adults trace identity
relative to sortals such as 'truck' and 'elephant' and know that trucks do not turn into elephants.
Xu and Carey (1993) have carried out four experiments based on this design. Ten-month-old babies were shown screens from
which two objects of different kinds (e.g., a cup and a toy elephant, a ball and a truck) emerged from opposite sides, one at a
time. Each object was shown a total of four times. After this familiarization, the screen was removed, revealing either two
objects (expected outcome) or one object (unexpected outcome). In all four studies, babies looked longer at the expected
outcome. They could not use the difference between a cup and an elephant to infer that there must be two objects behind the
screen.
Another group of 10-month-olds was run in a parallel version of this study based on Spelke's design (figure 6). That is, babies
were shown two identical objects emerging from the two screens a total of four times each, and the timing of the events was the same as in the one-screen/two-kinds studies. Babies succeeded, looking longer at the unexpected outcome of one object.
Apparently, babies can use spatiotemporal information to individuate objects before they can use kind information.
We have ruled out several uninteresting interpretations of the failure in the property/kind conditions of these studies. For
example, it is not that babies cannot discriminate the objects from each other; rather, they simply do not use this information to drive the inference that there must be two numerically distinct objects behind the screen.
It appears, then, that in one sense Quine was right. Very young infants have not yet constructed concepts that serve as adult-like meanings of words like 'bottle', 'ball', and 'dog'. How are the babies representing these events? We can think of two possibilities.
First, the babies may actually establish a representation of a single individual object (OBJECTi) moving back and forth behind
the screen, attributing to this object the properties of being yellow and duck-shaped at some times and white and spherical at
other times. The basis for such a representation could be spatiotemporal: the infants may take the oscillating motion as a single,
continuous path.
A second possibility is that the baby is making no commitment at all concerning whether the objects emerging to the left and
right of the screen are the same or different. That is, the baby is representing the event as OBJECT emerging from the left of the
screen, followed by OBJECT emerging from the right of the screen, and represents these neither as a single object (OBJECTi)
nor as distinct objects (OBJECTi, OBJECTj). Suppose you see a leaf on the sidewalk as you walk to class, and you see a leaf in
roughly the same place on the sidewalk as you return from class. That may be the same leaf or it may not; your conceptual
system is capable of drawing that distinction, but you leave the question open. If the infant is leaving the issue open in this case,
then why does he/she appear surprised when the screens are removed and two objects are revealed? On this hypothesis, the
longer looking time at two objects is a familiarity effect; the infant has been familiarized with instances of single objects, and
thus seeing two objects is different. After all, babies can be habituated to 'oneness' by being shown a series of objects, one at a
time. Even if you were not sure whether that leaf was the same as the one you had seen earlier, if you returned to the classroom
later in the day and encountered two leaves on the sidewalk, you would see this state of affairs as different from ones in which
you encountered cases of single leaves on the sidewalk.
We do not know which possibility is correct. The baby actually may be representing the events as if a duck-shaped object is
turning into a ball-shaped object (possibility one) or simply may be failing to establish representations of two distinct objects
(possibility two). The take-home message is the same whichever possibility is correct; 10-month-old infants do not use the
property/kind differences between a red metal truck and a gray rubber elephant to infer that there must be two numerically
distinct objects involved in the event.
At 11 months, about half of the babies we test succeed at our task. When babies do succeed, are they doing so on the basis of
kind information or property information? That is, are they representing the events as do adults, as involving a duck and a ball,
or are they individuating the objects on the basis of property differences? Further experiments could bear on this question. For
example, habituation studies show babies to be sensitive to color changes and size changes, but color and size are not the types
of properties that signal kind differences, at least in the adult conceptual system. Would babies old enough to succeed at this task
be as likely to infer two objects when shown a blue and red cup, or a big and small cup, emerging from either side of the screen,
as when shown a blue cup and a blue elephant of equal sizes emerging from either side of the screen? A difference in success
rate favoring the latter pair would suggest that babies, just like adults, come to represent kinds of objects, and individuate
objects relative to kinds. These experiments together would provide information about the developmental course of this
representational capacity.
It is significant that babies begin to comprehend and produce object names at about 10 to 12 months of age, the age at which
they begin to use the differences between cups and elephants to individuate objects. Again, this pattern of results is consistent
with the Sortal First hypothesis. That is, babies do not seem to learn words for bottlehood; they begin to learn words such as
'bottle' just when they show evidence for sortal concepts such as bottle which provide conditions for individuation and
numerical identity. Current studies in our lab are exploring the relations between specific words understood and success at
individuation based on the kinds expressed by those words.
It is not surprising that babies use spatiotemporal information before kind information to individuate and trace the identity of
objects. All physical objects trace spatiotemporally continuous paths; no physical object can be in two places at the same time.
However, what property changes are possible in a persisting object depends upon the kind. An apparent change of relative
location of the handle to the body of a ceramic cup signifies a different cup; an apparent change of relative location of a hand to
the body of a person does not signify a different person.
In sum, these data suggest that babies innately have at least one sortal concept: physical object. Their object concept provides
spatiotemporal conditions for individuation and numerical identity. They can use spatiotemporal information to identify
individuals in their environment, and can then learn more specific sortals for kinds of these objects. Exactly how this is
accomplished is the big question, of course. The present data suggest that they spend most of their first year of life on this
accomplishment.
References
Antell, S. and D.P. Keating, 1983. Perception of numerical invariance in neonates. Child Development 54, 695-701.
Baillargeon, R., 1990. Young infants' physical knowledge. Paper presented at the American Psychological Association
Convention, Boston.
Bloom, P., 1990. Syntactic distinctions in child language. Journal of Child Language 17, 343-355.
Bowerman, M., 1978. The acquisition of word meaning: An investigation into some current conflicts. In: N. Waterson, C. Snow
(eds.), Development of communication. New York: Wiley.
Carey, S., 1991. Knowledge acquisition: Enrichment or conceptual change? In: S. Carey, R. Gelman (eds.), The epigenesis of
mind: Essays in biology and cognition. Hillsdale, NJ: Erlbaum.
Carey, S. and E. Spelke, in press. Domain specific knowledge and conceptual change. In: L. Hirschfeld, S. Gelman (eds.),
Cultural knowledge and domain specificity. Cambridge, UK: Cambridge University Press.
Cohen, L.B. and B.A. Younger, 1983. Perceptual categorization in the infant. In: E.K. Scholnick (ed.), New trends in conceptual
representation, 197-200. Hillsdale, NJ: Erlbaum.
Dromi, E., 1987. Early lexical development. London: Cambridge University Press.
Gordon, P., 1982. The acquisition of syntactic categories: The case of the count/mass distinction. Unpublished doctoral
dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Gordon, P., 1985. Evaluating the semantic categories hypothesis: The case of the count/mass distinction. Cognition 20, 209-242.
Hirsch, E., 1982. The concept of identity. New Haven, CT: Yale University Press.
Huttenlocher, J. and P. Smiley, 1987. Early word meanings: The case of object names. Cognitive Psychology 19, 63-89.
Katz, N., E. Baker and J. Macnamara, 1974. What's in a name? A study of how children learn common and proper names. Child
Development 45, 469-473.
Macnamara, J., 1982. Names for things: A study of human learning. Cambridge, MA: MIT Press.
Macnamara, J., 1986. A border dispute. Cambridge, MA: MIT Press.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Quine, W.V.O., 1960. Word and object. Cambridge, MA: MIT Press.
Quine, W.V.O., 1969. Ontological relativity and other essays. New York: Columbia University Press.
Quine, W.V.O., 1974. The roots of reference. New York: Columbia University Press.
Soja, N.N., 1987. Ontological constraints on 2-year-olds' induction of word meanings. Unpublished doctoral dissertation,
Massachusetts Institute of Technology, Cambridge, MA.
Soja, N.N., 1992. Inferences about the meanings of nouns: The relationship between perception and syntax. Cognitive
Development 29-45.
Soja, N.N., S. Carey and E.S. Spelke, 1991. Ontological categories guide young children's inductions of word meaning: Object
terms and substance terms. Cognition 38, 179-211.
Spelke, E.S., 1988. The origins of physical knowledge. In: L. Weiskrantz (ed.), Thought without language, 168-184. Oxford,
UK: Oxford University Press.
Spelke, E.S., 1990. Principles of object perception. Cognitive Science 14, 29-56.
Spelke, E.S., K. Breinlinger, J. Macomber and K. Jacobson, 1992. Origins of knowledge. Psychological Review 99, 605-632.
Vygotsky, L.S., 1962. Thought and language. Cambridge, MA: MIT Press.
Wiggins, D., 1980. Sameness and substance. Cambridge, MA: Harvard University Press.
Wynn, K., 1992. Addition and subtraction by human infants. Nature 358, 749-750.
Xu, F. and S. Carey, 1993. Infant metaphysics: The case of numerical identity. MIT Center for Cognitive Science Occasional
Paper.
Explanation, association,
and the acquisition of word meaning*
Frank C. Keil
Department of Psychology, Cornell University, 223 Uris Hall, Ithaca, NY 14853-7601, USA
A newly emerging view of concept structure, the concepts-in-theories view, suggests that adult concepts are intrinsic mixes of
two different sorts of relations: (a) those involving domain-general tabulations of frequencies and correlations and (b) those
involving domain-specific patterns of explanation. Empirical results from early cognitive development suggest that, by the time
first words are acquired, most concepts have this intrinsic mix even though changes in the nature of the mix can produce marked
developmental changes in apparent concepts, word meanings, and their use.
The concepts-in-theories view suggests that the sorts of constraints needed to model the representation and acquisition of
concepts cannot be based solely on either perceptual or grammatical bases; they must also arise from biases given by specific
patterns of explanation, patterns that may depart from standard notions of intuitive theories. These in turn suggest different
views of possible constraints on the acquisition of word meaning.
1. Introduction
Categorization is one of the most common and salient of human cognitive activities, and many of the categories so formed
appear to be shared among individuals, a pattern that is highlighted by the use of common words to refer to those categories. In
experimental psychology, the distinctive aspects of mental life that enable each categorization are usually thought of as
concepts. Shared mental structures are assumed to be constant across repeated categorizations of the same set of instances and
different for other categorizations. When I think about the category of dogs, a specific mental representation is assumed to be
responsible for that category, and roughly the same representation for a later categorization of dogs by myself or by another. The
phenomenon of categorization, coupled with the notion that repeated categorizations of the same sort are based on the same
mental representation, is the launching point for most psychological investigations of what concepts are.

* Much thanks to two anonymous reviewers and to Dan Simons for extensive comments on earlier drafts of this
manuscript. Much of the research reported on in this paper was supported by NIH grant number R01-HD23922.
Categorization, however, must not be equated with heuristics and other procedures that provide rough and ready identification of
instances above a modest confidence level. Although one might use hair length as a rough means for identifying human sex at a
certain confidence level, a careful and deliberative categorization of humans by sex would make little note of such an attribute.
Careful, considered judgements of membership may emphasize different aspects of mental structures than the fastest and loosest
means of identifying members of categories. In this paper, these different facets of categorization behaviors are considered
together in terms of their implications for models of the acquisition and representation of word meaning.
Psychological research on concepts in the last three decades started largely with the phenomenon of categorization, first with
teaching artificial categories, and later with the study of more naturally acquired ones. But psychological views of concepts have
undergone two dramatic shifts, initially as a consequence of uncovering more details on categorization behavior, but
increasingly also as a consequence of using other behavioral measures, measures that are starting to raise questions about what
concepts are in the first place. My purpose here is to explore the consequences of the most recent shift for word meanings.
I will argue for the following points:
(1) The currently emerging view, the concepts-in-theories view, must ultimately
characterize adult concepts as intrinsic mixes of two different sorts of relations: (a)
those involving domain-general tabulations of frequencies and correlations, such as those
performed by associative models and many connectionist systems, and (b) those
involving domain-specific patterns of explanation, usually of a causal nature.
(2) Empirical results from early cognitive development and first word meanings suggest
that by the time the first words are acquired, most if not all concepts have this
intrinsic mix even though changes in the nature of the mix can produce marked
developmental changes in apparent concepts, word meanings, and their use.
(3) The intrinsic mix and its early appearance suggest a different kind of ambiguity in
word meaning, wherein largely overlapping sets of instances have the same label
applied to them but have different meanings, allow different patterns of induction,
and provide different categorizations of critical test cases. Unlike classical lexical
ambiguities, there is no dramatic change in the class of referents even as there is a
discrete change in meaning.
(4) In consideration of these first three points, the sorts of constraints needed to model
the representation and acquisition of concepts cannot be based solely on either
perceptual or grammatical bases; they must also arise from biases given by specific
patterns of explanation that emphasize different causal roles for the same properties
in different domains. These conceptual constraints in turn suggest different views of
possible constraints on the acquisition of word meaning.
accounts and were united in focusing on such phenomena as instances having different goodness of membership in a category,
they began to diverge on what concepts actually were. Feature-based models tended to be like the earlier necessary and
sufficient feature view except that they added weights to each feature representing its relative importance in determining
membership. Importance was usually derived from how often a feature co-occurred with members of the category in the real
world. Correlations between features were also often included in these representations. One consequence of these weightings
was that a much larger number of features could be included in a representation and thresholds could be set up for excluding
features with sufficiently low weights. These thresholds could also be adjusted as a function of context (as in Lakoff's hedges
'technically speaking' and 'loosely speaking', which were thought to raise and lower the threshold, respectively; Lakoff 1972).
Nonetheless, these views continued to compartmentalize concepts as distinct from the rest of knowledge.
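The weighted-feature model just described can be sketched in a few lines. The features, weights, and threshold values below are invented for illustration, not drawn from any experiment; the sketch only shows how weighted sums yield graded typicality and how hedges move the membership threshold:

```python
# A minimal sketch of a weighted-feature ("probabilistic") concept model.
# Feature weights stand in for how often each feature co-occurs with
# category members; the context-dependent threshold plays the role of
# hedges like "technically speaking" (stricter) and "loosely speaking"
# (looser). All numbers are hypothetical.

BIRD_WEIGHTS = {
    "has_feathers": 0.9,
    "flies": 0.7,
    "sings": 0.4,
    "lives_in_tree": 0.3,
}

THRESHOLDS = {
    "technically_speaking": 0.8,  # hedge raises the threshold
    "default": 0.5,
    "loosely_speaking": 0.3,      # hedge lowers the threshold
}

def typicality(instance_features):
    """Sum of weights for matched features, normalized to [0, 1]."""
    total = sum(BIRD_WEIGHTS.values())
    matched = sum(w for f, w in BIRD_WEIGHTS.items() if f in instance_features)
    return matched / total

def is_member(instance_features, context="default"):
    return typicality(instance_features) >= THRESHOLDS[context]

# An atypical instance: feathered and tree-dwelling, but flightless and songless.
penguin = {"has_feathers", "lives_in_tree"}
print(typicality(penguin))
print(is_member(penguin, "loosely_speaking"))
print(is_member(penguin, "technically_speaking"))
```

On this sketch the same instance is a member "loosely speaking" but not "technically speaking", which is exactly the graded, context-shiftable membership the probabilistic views were built to capture.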
Certain phenomena related to typicality of instances were considered central to understanding concepts and were shown to be
strongly interrelated. These included: intuitions that membership in a category is not all or none but graded such that some
instances are better members than others, intuitions that some entities were at the borderline and their membership in a category
was indeterminate, and reaction times to identify instances (it takes longer to verify membership of an atypical vs. typical
member). All of these responses seemed to converge on mental representations that encoded feature frequencies and
correlations.
This second view required more subtle refutations than the first. Concepts still seemed to have components that were different
from their merely probabilistic parts; and demonstrations appeared showing that people could easily think about typical and
atypical instances for completely well-defined categories, such as odd numbers. Such findings suggest that typicality and well-
definedness are not mutually exclusive for the same concept (Armstrong et al. 1983). Indeed the probabilistic phenomena
associated with concepts were sometimes relegated to the realm of 'identification procedures' for picking out instances of
concepts rather than to the concepts themselves.
Additional concerns came from demonstrations of illusory correlations in the social and clinical psychology literature. That is,
people will see nonexisting feature correlations when they follow from prior beliefs (Chapman and Chapman 1969). Simple
tabulations of feature frequencies and co-occurrences in the world are not enough. In developmental research as well, concepts
emerged in ways not explainable by mere shiftings of either feature
frequencies or correlational weights. Thus, one set of studies shows how developmental shifts in what are regarded as legitimate
members of a kind, such as being a tiger or being an uncle, cannot be plausibly modeled by shifting probabilistic weights on
features or changing global criteria of what frequencies are to count as important (Keil 1989).
Other problems were seen in adult processing studies in which two concepts shared the same highly typical feature but
nonetheless placed greatly different emphasis on it, because the feature played a much more central role in one concept's patterns
of causal explanation. Thus, bananas and boomerangs were judged to be equally typically curved, but straight bananas were judged to be a
much better new member of the class of bananas than straight boomerangs in the class of boomerangs (Medin and Shoben
1988). This particular shape change seems to influence what it means to be a boomerang to a much greater degree than a banana
and does so in a way not predictable from feature frequencies or correlations.
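One way to see why frequency alone cannot capture the banana/boomerang result is a toy model in which the penalty for a missing feature is scaled by an additional causal-centrality term. The scaling rule and all numbers below are hypothetical; the point is only that two categories can share an identical frequency weight for "curved" yet diverge once centrality is factored in:

```python
# Hypothetical sketch: frequency weight alone vs. frequency weight scaled
# by causal centrality (how much of the category's explanatory structure
# depends on the feature). All numbers are invented for illustration.

def acceptability(freq_weight, centrality, feature_present):
    """Goodness of an instance after a feature change, in [0, 1]."""
    if feature_present:
        return 1.0
    # Penalty grows with how causally central the missing feature is.
    return 1.0 - freq_weight * centrality

CURVED_FREQ = 0.9  # "curved" is equally typical of both categories

banana_centrality = 0.2     # curvature explains little about bananas
boomerang_centrality = 0.9  # curvature explains how boomerangs fly

straight_banana = acceptability(CURVED_FREQ, banana_centrality, False)
straight_boomerang = acceptability(CURVED_FREQ, boomerang_centrality, False)

# A pure frequency model predicts equal judgments; centrality breaks the tie.
print(straight_banana > straight_boomerang)
```

A frequency-only model has no term corresponding to `centrality`, so it cannot represent the asymmetry at all; that is the gap the concepts-in-theories view is meant to fill.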
Problems of this sort with probabilistic models have led to what is known as the concepts-in-theories view (Murphy and Medin
1985). A closer look at categorization, and increasingly at other phenomena thought to be related to concepts and their structure,
has put a central emphasis on notions of explanation, mechanism and cause. People do not simply note feature frequencies and
feature correlations; they have strong intuitions about which frequencies and correlations are reasonable ones to link together in
larger structures and which are not. Without these intuitions, people would make no progress in learning and talking about
common categories given the indefinitely large number of possible correlations and frequencies that can be tabulated from any
natural scene. These intuitions seem much like intuitive theories of how things in a domain work and why they have the
structure they do, hence the concepts-in-theories label.
One of the most frustrating drawbacks of the current concepts-in-theories view is the lack of consensus on what theories really
are and how they are mentally represented. Theories might be sets of propositions with causal connectives interspersed among
various logical connectives, all interconnected by deductive chains. Alternatively, theories might be more like mental models,
with image-like notions of mechanism. Or, theories might not need any notions of cause whatsoever, just a more general sense
of explanation; indeed some have decried the appearance of causal relations in any well-specified theory (Hempel 1965, Russell
1924). These controversies, however, should not blur a simple fact: some notion of explanatory coherence does have a powerful
and overarching influence on almost any behavioral measure
of concept structure. Above and beyond the details of representational format lies the common realization that our concepts for
dogs, diners, and dandelions emphasize some features and relations over others because of their fit within broader explanatory
patterns.
3. Hybrid vigor
The increased importance of explanatory systems in concept structure has suggested a complementary need for uninterpreted
probabilistic tabulations as part of concepts, resulting in a hybrid structure. For all but the most contrived cases, concepts may be
intrinsic mixes of both systems of explanation and atheoretic tabulations of properties. Any detailed account of the theories or
explanatory systems that embed concepts will end up confronting the same issue, regardless of their particular approach. Such
systems cannot fully interpret the raw data of experience, yet must rely on it.
The limitations of explanatory systems or theories in organizing properties and concepts are best understood through a concrete
example. Consider someone who has had a limited exposure to birds; let us call him Icarus. Because of his limited experience,
Icarus's explanatory knowledge specific to birds is modest. In addition to knowing many general things about animals that apply
to birds as well, Icarus only knows explanatory reasons for properties corresponding to two sorts of bird category contrasts:
flightless vs. flying birds and predators vs. prey. Thus, Icarus understands why flightless birds tend to have much smaller wings
compared to body size than flying birds and why their legs tend to be bigger and more muscular; and he understands why
predators have many properties that help them catch prey and why prey have many features that help them avoid predators. In
this way explanatory knowledge helps order the conceptual space and helps influence intuitive inter-bird similarities,
inductions about new properties, and categorization. Moreover, this knowledge helps constrain which correlations are first noted
for novel instances.
There is, however, a problem with birds that occupy the same spot in this two-dimensional space but which are easily
distinguishable in terms of surface feature probabilities and correlations and which Icarus can in fact distinguish, such as a robin
vs. a wren. They occupy the same place in Icarus's explanatory matrix and yet Icarus stores additional information about them
so as to be able to distinguish them. This information may well be largely associative and guided by broad hunches laid down by
Icarus's theory of animals and
perhaps some general hunches about birds. Those hunches ensure that even the associatively organized features will not include
whether a bird was first seen on a tree with an odd vs. even number of branches, or whether it was facing left or right, or
whether it was old or young. But, within those general constraints, many other regularities must indeed be stored purely in terms
of correlations and frequencies. In this way, associative structures in concepts can persevere. This perseverance is inevitable,
simply because we can never have explanations for all regularities that we observe.
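The division of labor in the Icarus example can be sketched as follows. The class, the "hunch" filter, and all feature names are hypothetical illustrations: two explanatory dimensions (flight, predation) supply the theory-driven structure, while a hunch-filtered tally of residual features supplies the associative component that distinguishes birds sharing a cell:

```python
# Hypothetical sketch of a hybrid concept: an explanatory cell given by
# theory-relevant dimensions, plus an associative store of residual
# feature frequencies filtered by broad "hunches" about what could
# possibly matter for an animal.

RELEVANT_FOR_ANIMALS = {"breast_color", "song_pitch", "size"}  # hunch filter

class HybridBirdConcept:
    def __init__(self, flight, predation):
        self.explanatory_cell = (flight, predation)  # theory-driven part
        self.feature_counts = {}                     # associative part

    def observe(self, features):
        # Tabulate only regularities the hunches deem potentially relevant;
        # "facing_left" or "seen_on_odd_branches" are never stored.
        for name, value in features.items():
            if name in RELEVANT_FOR_ANIMALS:
                key = (name, value)
                self.feature_counts[key] = self.feature_counts.get(key, 0) + 1

robin = HybridBirdConcept(flight="flying", predation="prey")
wren = HybridBirdConcept(flight="flying", predation="prey")

robin.observe({"breast_color": "red", "facing_left": True})
wren.observe({"breast_color": "brown", "seen_on_odd_branches": True})

# Same explanatory cell, distinguished only by associative residue.
print(robin.explanatory_cell == wren.explanatory_cell)
print(robin.feature_counts != wren.feature_counts)
```

The sketch makes the text's point concrete: robin and wren are identical as far as Icarus's explanatory matrix goes, and only the hunch-constrained associative tallies tell them apart.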
It is tempting to try to exclude frequency and correlation information from the concept proper and say that they are only part of
heuristics and identification procedures having to do with concept use; but such a move fails to explain why our beliefs are often
directed specifically towards explaining those correlations; the correlations form an integral part of the system. Only in special
cases where we have conventionally defined concepts with accidentally correlated features, such as the concept odd number, do
the two aspects become more fully separable. There must be not only explanatory structures that narrow down an indefinitely
large number of features and feature relations to a manageable number but also mechanisms for storing information that is
outside the ken of current explanation. We could never acquire new explanations if there were not some way of storing
information external to current explanations.
These issues have often led to proposals of a core-periphery distinction in concept structure, wherein typicality information is
relegated to the periphery (see discussions in Armstrong et al. 1983 and Rey 1983). Although this distinction may imply that the
periphery is less important, or even optional, the relation between the two facets of concepts may be more symbiotic.
Explanations do not amount to much if they do not have anything to explain, and raw tabulations quickly overwhelm any
information gathering system if it does not partially order that information in terms of explanatory usefulness. A parallel case
can be made for scientific theories. Scientific theories are sometimes presented as a set of tightly connected laws linked through
deductive chains (e.g. Hempel and Oppenheim 1948), but in practice, they start with sets of regularities that they seek to explain,
regularities that are noticed and remembered often before the fledgling theory has any way of incorporating them.
The atheoretical component of concepts should therefore not be relegated to a 'periphery' that is not part of the concept proper.
There is no general way to justify such a move despite those cases of a few formal and conventional terms where a clean
distinction is possible. However, even as the
atheoretical component must be part of the concept itself, it remains distinct. There is not an association-to-theory continuum
consisting of increasingly strong and reliable correlations and frequencies. Such a continuum could not explain cases where high
frequencies and correlations are clearly understood as explanatorily irrelevant to membership in the category, whether it be hair
length and gender or economic value and type of chemical element. Although explanatory systems may always need some
reserves of correlational and frequency based information for interpretation, not all of the information in such reserves is always
found meaningful, even when broad hunches suggest it might be.
This debate has proved difficult to resolve. Recent studies would seem to favor the explanations-from-the-start alternative as
younger and younger children's judgments are shown to apparently be governed by such explanations. Yet, advocates for the
opposite view need merely claim that the acquisition of such explanatory systems occurs very early on in infancy (McClelland,
forthcoming), a reply that becomes increasingly difficult to test as ever younger infants are needed. But the tension between
these views has resulted in an important empirical discovery. There has been a rapid downward march of the ages at which
concepts are endowed with explanation, from five-year-olds postulated as purely associative beasts (Vygotsky 1962,
Werner 1948, Quine 1977), to preschoolers being in such a state, and now to infants before their first words. Indeed, a strong
case can be made that four-month-old infants categorize aspects of their physical and social worlds in ways only understandable
by attributing to them domain-specific systems of causal explanation (e.g. Spelke et al. 1993, Leslie, in press). The debate
continues about even earlier origins (e.g. Slater 1993), but there is now little doubt that the child's first words must be sensitive
to the same sorts of relations that have caused a dramatic shift in how adult concepts are viewed.
In my first attempts to understand how word meanings might change with development, I posited a 'characteristic-to-defining
shift' in the acquisition of word meaning (Keil and Batterman 1984). Following decades of vaguer claims about shifts from such
things as holistic to analytic meanings, accidental to essential features and the like, one more testable possibility seemed that
early word meanings are very much like the probabilistic views of concepts, but then shifted to be more 'defining' such that a
simple principle characterized meaning. The predominant method was to present scenarios in which an instance had either all
the characteristic features associated with a category but lacked critical defining ones or in which instances had all the critical
defining features but had many highly uncharacteristic ones as well. Being an 'uncle' was asked either of friendly, gift giving,
middle-aged men unrelated to one's family or of brothers of one's parents who were still children and distinctly unfriendly.
Being an 'island' was asked of peninsulas with beaches, palm trees and buried treasures or of cold, forbidding beachless places
surrounded by water on all sides. Children's judgments of instances did shift with age and on a domain by domain basis as
shown in figure 1 (Keil 1989); but it became increasingly clear that a true characteristic-to-defining shift was not and could not
be occurring.
Even the youngest children were never simply tabulating all salient feature frequencies and correlations. No child thinks that
uncles must have glasses even if all the uncles they happen to have seen wear them. The features selected by even the earliest
word learners were always constrained by some notions of reasonableness for the kind of thing in question. A second problem
was that true definitions are rare occurrences in word meanings. Only a handful of words even approach clear simple definitions.
Putnam (1975) estimated a few hundred in English, and even these are under contention (Lakoff 1987). Thus, although the
developmental change is robust and easy to produce experimentally, it cannot really indicate a shift from one representational
format to another.

Fig. 1. Illustration of an apparent characteristic-to-defining shift in the acquisition of word meaning. The y-axes represent the
extent to which each described entity was judged to be a legitimate member of the category. The x-axes represent three grade
school class levels and adults. The graphs show that there is a developmental shift in whether the +characteristic/-defining
descriptions or the -characteristic/+defining ones are taken to indicate members of the category. In addition, the shifts occur at
different times on a domain by domain basis. (Adapted from Keil 1989.)

Rather, it appears to be two related phenomena: (1) There are increasing elaborations of explanatory systems with age, such that
explanations play ever more extensive roles in constraining what features and correlations are noticed. Thus, as explanation-based
knowledge becomes more and more elaborated, default tabulations of information in associative terms are less common. (2) There
are shifts in understanding which explanatory system is most relevant to a class of phenomena. A child
might realize that 'uncle' is better understood in terms of a set of biological relations that comprise kinship and not in terms of
social relations that govern
friendship; the social explanation is discovered to generate more serious mistakes and is abandoned (Keil 1989, 1992).
A similar change of perspective has occurred with terms for natural kinds. For example, young children assert that zebras who
are surgically transformed to look and act like horses are no longer zebras but truly horses. Older children declare the animal to
still be a zebra, suggesting a developmental shift from a phenomenal similarity space to a theoretically driven one. After many
follow-up studies conducted both by my research group and other groups, a different story emerges (Keil 1989). Younger
children may never be total phenomenalists helplessly buffeted about by correlations and frequencies. There invariably seems to
be an understanding of deeper relations that allows them to go beyond what would reasonably be considered phenomenal
similarity. Thus, even for three-year-olds, tigers can only be changed into lions if you use a mechanism of change that is
reasonably related to being a member of the two kinds. Younger children's beliefs about such mechanisms may differ, but they
have them nonetheless and will use them to override associative information.
At any age most natural concepts have a mix of associationistic tabulated information and systems for interpreting, explaining
and guiding the pickup of that information. There can be dramatic developmental change, but not from one kind of
representational system to a dramatically different one. Instead, the predominant change is in how extensively the child is able to
interpret and explain the raw data of association. As those explanations become more and more elaborated over time, the child
has to fall back less and less on the associative component to make judgements.
Following the Icarus example offered above, suppose a child's initial understanding of birds involved only notions of what
properties supported flight, and thereby clusters of features corresponding to flightless vs. flying birds, but included no
understanding of how to organize the features that clustered around predator vs. prey. Some more general principles about animals may bias the
child to associatively store frequencies and correlations concerning features such as feet shape, beak shape, eye location in head
and typical diet; but these properties may be stored largely in terms of associative relations until the predator/prey insight comes
to dramatically organize those features and shift similarities accordingly, thereby shifting induction and categorization. Note that
the developmental studies also support the essential hybrid structure of concepts. Even as these studies show that the youngest
children never rely solely on brute force tabulations of feature frequencies and correlations, they also show that something like
an associative component is
also present that shrinks with age in each domain in the face of more elaborated explanation. Although the characteristic-to-
defining change in development is not strictly correct, the changes that do occur support both aspects of the hybrid.
two ways of understanding the world. This possibility results in predictions concerning not just the nature of early concepts but
also how they become linked to the lexicon. If children are presented with partial information about a novel kind, they must
decide whether its known and future properties are to be interpreted in belief/desire terms or in physical/mechanical terms and
ensuing inductions about properties will vary accordingly. For example, it has been argued that biological kinds and their
properties are often interpreted in solely behavioral terms. The behavioral aspects of a property come to be seen as the basis for
it. Young children have been shown to attribute properties such as eating and having babies only to animals that are sufficiently
psychologically similar to humans so as to have the behavioral and belief/desire correlates of eating and having babies (e.g.
feeling hungry and wanting food and being nurturant towards offspring). Worms don't eat because they cannot have feelings of
hunger and desires for food in any way like humans (Carey 1985). These sorts of misattributions were found not only for
properties that have salient behaviors associated with them but also for unfamiliar ones described on the spot for the child (such
as having a spleen). Young children seem to assume that even novel properties are likely to be possessed by other animals to the
extent that they are behaviorally similar to the one on which they are taught. Only later in development would an appreciation of
biological kinds as such emerge and would new properties be based on induction over sets of functional biological relations.
It now seems that young children might not be so narrowly restricted in their earliest kinds of understanding, and that the
explanation part of the hybrid has more diversity than just a mechanics and a psychology. For example, a wide range of studies
now converge to suggest that preschoolers see living kinds as having their own causal patternings distinct from physical
mechanics and social behavior (Keil 1992, Inagaki and Hatano 1993, Hatano et al. 1993, Springer and Keil 1991, Gelman and
Gottfried 1993). They understand that both plants and animals tend to have functional/adaptive explanations associated with
them that are not seen elsewhere in the natural world and which are distinct from the sorts of functional explanations used with
artifacts. For example, preschoolers show a stronger tendency to explain the properties of living things in terms of the purposes
those properties serve for those things than they do for non-living things (Keil 1992). Adaptive, or design, explanations have
become recognized as a distinct form of explanation linked to the biological sciences, and it now appears that the same form of
explanation is available and salient to young children (Woodfield 1976, Wright 1976).
One way to show this distinctive pattern of expectations concerning living things is to ask preschoolers which of two
explanations they prefer as appropriate for explaining the properties of living vs. non-living things. For example, the children
might be told the following: 'Two people are talking about why plants are green. This person says it is because it is better for
the plants to be green and it helps there to be more plants. This person says it is because there are little tiny parts in plants that
when mixed together give them a green color. Which reason is a better one?'
The same question would then be asked about emeralds. Although both the reductionist and functional explanations are
appropriate for the plants, children preferred the functional sorts of explanations for living things while at the same time
preferring the reductionist explanation for non-living things (Keil 1992).
The contrast with artifacts is more subtle, but involves an understanding that most of the properties of non-domesticated living
kinds are ultimately self-serving whereas most of the properties of artifacts are other-serving. This seemingly abstract notion
appears to be well within the grasp of preschoolers when they are asked to choose between explanations for functional
properties that are for the good of the object vs. the good of another entity; and they show similar understandings for plants as
well as animals, ruling out any need to see the living things as capable of mental goals or desires.
The full set of such early modes of construal may be quite small, possibly as few as half a dozen. The earliest modes of
construal seem to originate from notions of broad domains of phenomena, such as physical mechanics, folk psychology, and
functional/teleological explanations and not local areas of expertise such as dinosaurs or chess. Other possible basic modes
include: moral reasoning, notions of ownership and transfer of ownership, and social power relations. The existence of other
domains besides mechanics, psychology, and design/teleology is speculative at this point, but remains plausible given the likely
universal presence of such domains and the possibility that they have distinctive forms of causal explanation. Alternatively, the
basic three might have a special status making them more fundamental than all other schemas for making sense of the world. A
major challenge for future work is to develop criteria that demonstrate how these basic modes of construal are distinct from the
thousands of local explanations and mental models that people come to use in everyday tasks. In addition to different roles in
processing, the basic modes may be earlier emerging and more invariant over the course of development.
The early presence of multiple forms of explanation may lead to a distinct kind of ambiguity for lexical items. The ambiguity
arises from cases where heavily overlapping sets of individuals are meaningfully interpreted by different forms of explanation.
For example, a class of people can often be understood in biological, social, or physical terms, just as properties of computers
can be understood in mechanical, functional, or psychological terms. A computer might be described as failing a task because
two routines are seen as 'colliding' (mechanical), because the programs are not designed for the task (functional), or because the
program cannot understand or remember the correct sequences (psychological). Clearly distinct meanings can be invoked even
though the real world referents may often be heavily overlapping. These examples appear to be different from cases of
vagueness or more standard lexical ambiguities where the referents are usually non-overlapping, as with the two meanings of
such terms as 'bat', 'bank', and 'tank'. With standard lexical ambiguities, one gets the immediate impression of a new set of
referents as one switches meanings, whereas with referentially overlapping ambiguities, the meaning seems to switch equally
strongly but many of the same referents remain.
Referentially overlapping ambiguities suggest an alternative account of many developmental changes in word meanings.
Children might not be reworking the feature sets that organize members of a category and discarding the old (a pattern that has
the counterintuitive consequence of suggesting that adults should have difficulty accessing earlier states of their own
vocabularies) but rather realizing the relevance of a different, but already present, system of explanation for interpreting roughly
the same properties.
In the case of kinship terms, a preschooler might well have notions that there are ways of understanding things in social as well
as in biological/functional ways, but may have assumed early on that instances of kinship terms were to be understood
social/behaviorally. A change in word meaning to more classic bloodline relations might reflect not so much a new learning of
biological relations, but an insight that a biological/functional way of explaining properties and judging similarities is the most
common one in adult usages of those terms. This speculation gains support from recent studies showing that, although younger
children might have different default assumptions about properties for living things, modest contextual cues can result in
biologically based inductions. For example, when children judge that dogs and humans but not birds, snakes or insects eat, it
appears as if they might have encoded 'eats' as referring to social and belief/desire aspects of
eating. Yet the same child will also judge that eating applies to all and only animals if they are primed to think about terms like
eating in biological/functional terms (Vera and Keil 1988). Those studies are relevant because they illustrate how a property can
alternately be embedded in two very different systems of explanation. Comparable switches in embedding for kinship terms
could cause the sorts of referentially overlapping ambiguities proposed here.
Apparent developmental changes in word meaning may therefore reflect an awareness that a different set of explanatory
relations can be superimposed on roughly the same set of instances and properties so as to afford new insights. The different
explanatory system may have been present for some time, but is now deemed relevant to a class of lexical items in a certain
domain.
Shifts between modes of explanation occur in adults as well. For example, most of the time we understand the personality traits
and behaviors of others in terms of belief/desire patterns of causation. John is afraid of snakes because he had some bad
experiences with them that led him to believe that snakes are dangerous to him, even if those beliefs are false about most snakes.
But we can understand the same traits in biological/functional terms as well. One would then explain John's fear of snakes not in
terms of his beliefs and desires, but in terms of the ecological niches occupied by humans and their evolutionary predecessors
and of a need to develop an adaptation that causes a built-in fear of snakes. Although explanations of behaviors in
biological/functional terms as opposed to belief/desire ones may not be the first option taken, once adopted, they lead to different
inferences and understandings.
If shifting modes of explanation over roughly the same referents provides a sense of ambiguity, there should be other signs of
that ambiguity as well. One example might be a tendency not to switch between meanings when a word is shared in a clause. In
the sentence 'The pilot banked at the intersection of Pine and Locust and the lawyer at Oak and Walnut', it is bizarre to have
'bank' being used first as an airplane maneuver and second as fiscal action. Similarly, in the sentence 'The rock hit the house and
John did too', it is awkward to have 'hit' be in the mechanical/billiard-ball sense for the rock and in the intentional sense for
John. One is more inclined to see John hitting the house in an inanimate sense as a hurtling body not intending the impact. As a
second example, consider the sentence 'The daffodil needed water and John did too', where 'need' has the sense of a
physiological need in both cases, not of physiological need in one case and explicit desire or wanting in the other. If
this analysis is correct, a child hearing an unfamiliar term can safely assume that when a word is shared in usage, the mode of
explanation is likely to remain constant.
The awkwardness of mixing different senses seems less marked for other cases that are not true ambiguities. For example, with
metaphor, one can say 'Phil embraced her body, and Bill her ideas', where 'embrace' clearly has different senses, but little
awkwardness results. Similarly, with syncategorematic terms like 'good', different senses can be mixed in the same clause, as in
'The meal was good and the fire was too'. If these sorts of intuitions prove to be more general, they suggest that shifts in
explanatory frameworks are not to be confused with a host of other possible context effects that cause more subtle shifts in
meaning (Barsalou 1987, Lakoff 1987). There are dozens of different nuances of the word 'instrument' depending on context, as
there are of the verb 'cut'; but these do not seem to reflect deep shifts in explanatory frameworks.
aspects of shape in making inferences about function. Moreover, shape itself might be more profitably subdivided into overall
shape vs. local parts and relative spatial arrangements, with one aspect more important for artifacts and the other for living
kinds. There might not be any general shape bias, but rather an understanding of the different degrees of importance of certain
aspects of shape in different explanatory systems. Shape, or more properly its subtypes, might then be seen as one of several
sorts of properties whose centrality can vary as a function of the kind of entities considered (see also Bloom, this volume).
This alternative would gain support if the bias varied in strength as a consequence of the type of explanatory system invoked by
an object or category. Some evidence is already present in cases where adding eyes to inanimate objects changes the strength of
the shape bias in different ways at different ages: for a younger child, eyes make an otherwise uninterpretable object become a
living kind for which shape matters a great deal, whereas for older children more complex interactions with texture and color
are invoked (Landau, this volume).
We have begun exploring this issue with both adults and children. With adults, the experimental paradigm involves describing a
change in types of properties of typical members of familiar categories and asking about the effects of such changes on the
categorical and lexical status of those members. Four changes in property types were used: color, size, shape (as specified by a
tripling of width and a one-third reduction of height), and surface pattern. These changes were described for randomly selected
members of four categories: animals, plants, non-living natural kinds (e.g. rocks), and artifacts. Subjects were given instances of
a member of a category and asked about the effects of changes of different feature types. For example, subjects might be told
about some artifacts that are exactly like ordinary chairs in every way except that all members of this group are normally
uniformly shaded bright pink. The subjects would then be asked if it would still be called a chair. A different item might ask
about rocks that were just like gold except that they are normally bright pink. One group of subjects was asked whether the
object with the altered feature should be assigned the standard name, one group was asked how well the altered object would
function in the original role, and a third group was asked how unusual a member of the original category the altered member
would be. With the exception of functional intuitions about non-living natural kinds, adults have reliable intuitions about such
questions and show clear patterns of responses. Functional questions were asked of things like kinds of rocks to keep the design
consistent across
all categories and to check that in fact they would be less informative for such kinds.
The importance of property types varies strongly as a function of the kind of thing queried. Overall shape and size changes had
a profound impact on functional roles of artifacts as well as on their lexical labels. By contrast changes in color and surface
patternings were irrelevant for comparable ratings of artifacts. For non-living natural kinds, such as minerals, the effects were
roughly reversed, with color and patterning being seen as essential to the judgements and shape and size as largely irrelevant.
For both plants and animals the results were consistently similar, suggesting a set of expectations about the category of living
things in general. Shape and color were generally seen as most relevant across conditions and size as least relevant, a pattern that
was clearly distinct from both artifacts and non-living natural kinds. An example of these different patterns of judgement for
naming is shown in figure 2.
Above and beyond the differences just described for the data set as a whole, judgements also varied as a function of the kind of
question asked. Questions about what the altered entity should be called had different profiles across the three category types
than did questions about its function or unusualness. This finding illustrates that judgements about preservation of function and
typicality are not the same as judgments of what a thing should be called. Such differences might seem obvious, but they are
often conflated in experimental research where results collected with one type of judgement are compared to those collected
with another. Moreover, as seen shortly, the precise nature of their interactions can be informative about the roles of specific
properties in particular categories.
One final measurement taken in these studies provided an important finding. After all initial ratings were completed, subjects
were asked to consider ordinary exemplars of each of the judged categories and rate how much those items normally vary with
respect to overall shape, size, color, or patterning. Thus, a subject might be asked to judge how much ordinary chairs, or lumps
of gold, vary in color. These questions were designed to address an important issue in the literature: how much the natural
variability of features in a category could cause the discounting or emphasis of the importance of features when queried in one
of three ways. A 3-inch disk is more likely to be considered a pizza than a quarter, even though it is more similar to the quarter
in physical size, because the minimal variability of currency disallows even modest size changes whereas the larger variability
of size for pizzas allows for an albeit implausible 3-inch pizza (Rips 1989).
Fig. 2. Results of first study conducted by Gruberth and Keil showing adult intuitions of whether a new name should be given to
a class of entities that differ from normal members of that class by having a property change in either shape, size, pattern or
color. The y-axis represents the mean ratings of the extent to which a kind or property change was judged to cause a need for a
new name (7 = definitely a new name, 0 = not at all).
Perhaps more generally, tabulations of variability are the basis for judgments of the effects of changing properties: the lower the
variability, the more the judged impact of the change.
In these studies, variability alone did not predict the significant effects seen for any of the three judgement types. For example,
if a subject said that color typically varied more for chairs than for whales, those judgements did not predict other judgements
about category membership when the color was changed for novel instances. However, when the data analysis was performed
in a manner that looked for patterns remaining after all contributions by variability were numerically removed, there were
differences across judgement types. For judgements of function, high variability of a feature had little influence in predicting the
effects of changing that feature. Consequently, when variability was factored out, the differences in property effects across
categories were still large and robust. Even when an item's color or shape varies greatly in the real world, variability is largely
ignored in making judgments about functional influences; instead the deviant instance itself is examined for specific functional
consequences. By contrast, judgements about the appropriate name were influenced by variability not by allowing predictions of
effects, but by eliminating differences in property effects across categories. Thus, although neither variability judgements alone,
nor property-type combinations alone, could predict judgement of naming, their joint action produced reliable effects. In a
related pattern, variability mattered more for non-living natural kinds, for which function is irrelevant, than for artifacts. In sum,
in no cases are intuitions about property variability for instances adequate to predict how changes in those properties influence
category-related judgments about those instances; but in the case of naming judgements, property variability is involved in
interaction with other factors.
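The residual analysis described above can be illustrated with a minimal sketch: regress impact ratings on variability ratings, then ask whether category differences survive in the residuals. The numbers below are hypothetical, invented purely to show the computation; they are not the actual ratings from these studies.

```python
import numpy as np

# Hypothetical per-item ratings: judged impact of changing a feature
# (0-7 scale) and judged natural variability of that feature.
# Category codes: 0 = artifact, 1 = non-living natural kind.
variability = np.array([1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5])
impact      = np.array([6.0, 5.5, 5.8, 5.2, 2.0, 2.5, 1.8, 2.2])
category    = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Regress impact on variability (with an intercept) and keep the
# residuals: whatever variability alone cannot explain.
X = np.column_stack([np.ones_like(variability), variability])
coef, *_ = np.linalg.lstsq(X, impact, rcond=None)
residuals = impact - X @ coef

# If category differences survive after variability is factored out,
# mean residuals should still differ across the two categories.
mean_by_cat = [residuals[category == c].mean() for c in (0, 1)]
print(mean_by_cat)
```

In this toy data set, the artifact items keep a large positive residual and the natural-kind items a large negative one, i.e. the category effect remains after variability is removed, which is the pattern reported for the function judgements.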
The assignment of labels may therefore rely more on the heterogeneous aspects of concepts than other category-related
judgements since both variability information for instances and general views about differences in property importance across
broad category types are involved. Variability may be information stored largely through associative means, whereas the
differential effects of property types as a function of category seem linked to the natural explanatory systems invoked by such
kinds as living things, artifacts, and sentient beings. Any number of simple frequency and correlation tabulation systems can
note how often features vary for instances in a category and access this information through a network in which activation
strength is a function of those tabulations, hence the notion that associative means of representation would be adequate. By
contrast, judgements about which properties are most central to a category and cause the most disruption when negated seem to
arise from notions about the causal structures underlying the property configurations of broad categories and how particular
properties are much more closely connected to those causal structures than others. Such judgements cannot be captured by
storing information about feature frequencies across instances as they rely on beliefs about mechanism and explanation. Thus,
arguments made in earlier sections of this paper about the importance of hybrid structure may emerge with a vengeance in the
case of mapping
concepts into the lexicon. (For other arguments about the special influences of lexical items see Markman 1989 and Woodward
1992.)
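The contrast drawn above between associative tabulation and causal/explanatory knowledge can be made concrete with a toy sketch of the former. The category and feature names are invented for illustration; the point is only that a pure frequency tabulator records how often properties recur across instances, while saying nothing about why some properties matter more than others.

```python
from collections import Counter

class FeatureTabulator:
    """Minimal associative tabulator: counts feature occurrences across
    instances of a category; 'activation' of a probe feature is just its
    relative frequency over observed instances."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0

    def observe(self, features):
        # Record one instance's set of features.
        self.counts.update(features)
        self.n += 1

    def activation(self, feature):
        # Frequency-based strength: fraction of instances with the feature.
        return self.counts[feature] / self.n if self.n else 0.0

chairs = FeatureTabulator()
chairs.observe({"has_legs", "brown", "wooden"})
chairs.observe({"has_legs", "pink", "plastic"})
chairs.observe({"has_legs", "brown", "wooden"})

print(chairs.activation("has_legs"))  # invariant feature: maximal activation
print(chairs.activation("pink"))      # variable feature: weak activation
```

Such a system can register that color varies for chairs while legged structure does not, but it has no resources for representing *why* legs are causally central and color merely conventional; that is the part of the hybrid that the explanatory component must carry.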
It is one thing to demonstrate that adults seem to be strongly influenced by abstract explanatory belief systems in how they
interpret the importance of properties for different categories of things. It is quite another to demonstrate that such systems
might be implicated early in life when the lexicon is growing most rapidly. We have begun a series of studies addressing this
issue as well.
These studies use triads of pictures in which an exemplar with a novel name is shown followed by an object that varies in the
shapes of the most salient parts but which preserves color and surface patterning and by another object that preserves all shapes
but varies in colors and surface patterning. In each case, novel items and names are clearly depicted as being from one of four
categories: artifact, plant, animal, or non-living natural kind. The original object is referred to in a manner that is indeterminate
between count and mass (e.g. 'this is an animal that is in the hyrax group' and 'this is a rock that is of the malachite group' or
'this is my hyrax; it is a kind of animal' and 'this is my malachite; it is a kind of rock').
In one preliminary study 16 three- and four-year-olds ignored shape and emphasized color when deciding which of two novel
non-living natural kinds shared the same label as an originally labeled one. For the living kinds and the artifacts, shape
differences as depicted by different parts are vastly more important. Color and texture also play a role for living things but not at
all for artifacts where, with few exceptions, they may apparently be understood as purely conventional. This insight may get
stronger as the children get older and enter elementary school, a pattern that is now under active study. One aspect of the results
is shown in figure 3.
A second study now under way suggests that shape changes not implicating changes in salient parts are seen as less important
for living kinds than artifacts because of their generally smaller roles in changing functional properties for living things.
Different parts compel a sense of category change for living things. We have constructed stimuli sets where a class of novel
things is distorted two ways: (1) a computer graphics transformation displaces the image pixels such that the object becomes
twisted around its central point and thus the overall shape is dramatically changed whereas local parts stay pretty much the same
and in roughly the same relations to each other; or (2) the overall shape is not changed, but small properties are substituted for
other ones, such as one kind of beak for another on a bird, or one kind of a dial for another on a machine. When preschoolers
are shown these images
the likely causal power of different properties in organizing categories at a high level of abstraction (e.g. living thing and
artifact). However, expectations are less influential for individual items, arguing against some simple storage of concrete
exemplars and comparison back to those instances. Perhaps these effects do not operate at finer levels of categorization. Strong
effects of different causal centrality for properties may exist at levels such as living kinds, substances and artifacts, but not at
much finer levels such as birds vs. fish. That is, it may be difficult to find properties applicable to both fish and birds which,
when counterfactualized, distinguish among them by having a strong effect in one case and not in another. Note that the
properties cannot be ones like 'breathes underwater', since that property is already counterfactual for birds. Of course, many properties are
causally central to both, but the kinds of properties that are causally most potent may be pretty much the same across all animals
and reflect general notions about the sorts of causal patterns distinctive to living things.
7. Conclusions
I have argued for the following themes:
Current views of concepts require that they be acquired and represented in terms of a system that has two distinct components:
one that stores frequency and correlation information through domain-general procedures, and one that carries domain-specific
beliefs about causal relations that help explain property relations. This hybrid structure is well supported by a wide range of
experimental studies as well as more principled considerations.
The beliefs about causal patternings and property types are linked to broad domains of explanation, such as sentient beings,
living kinds and artifacts. Those domains seem different from the many smaller domains of expertise that all of us come to
acquire as well. The evidence here is still accumulating but indications come from distinct sources, such as beliefs about how
property changes influence category integrity and how ways of understanding classes of things seem to shift in development and
across different contexts.
This hybrid mixture appears to be present throughout much of cognitive development, at least by the time first words are
acquired and quite possibly much earlier. The evidence here arises from the failure of domain-general tabulation models on their
own to explain young children's performance on a wide variety of tasks involving concepts. There is less evidence for just how
many of these basic domains might exist in the prelinguistic child. I have
argued that there are good reasons to suspect at least three and speculated that as many as half a dozen more seem plausible.
These patterns concerning the acquisition and representation of concepts matter directly for models of lexical representation and
acquisition. Both the strategies used to acquire word meanings and their underlying representations themselves are deeply
affected by such views about concepts. The hybrid structure suggests that word meanings are linked both to frequency-based
information and to causal/explanatory beliefs and that different sorts of lexical tasks may highlight one or the other of these two
facets. Empirical studies of the lexicon must therefore keep in mind these two facets and their interactions with specific tasks.
Because the hybrid structure appears to be already present in the prelinguistic child, both facets influence the child's first
attempts to learn the meanings of words, hence the claim that there are explanation-based constraints on the acquisition of word
meaning. If early representations were only based on tabulations of feature frequencies, then strategies that key on some of these
features, such as shape, might be the sole means of narrowing down the hypothesis space of what novel words refer to. Instead,
even the first guesses about word meanings may be biased not only by bottom-up counts of feature frequencies and correlations,
but also by top-down abstract expectations about what sorts of features will be causally central to a domain.
Apparent developmental shifts in the internal representations of word meanings rarely, if ever, are signs of a change in basic
underlying representational format. Instead they may represent increasing elaboration of explanatory beliefs that are able to
interpret a larger and larger percentage of the tabulated information and shifts in which explanatory system is deemed most relevant
to understanding members of a category. Younger children may not be showing an inability to represent word meanings in
certain formats as much as they are exercising different default options and having less detail in some of their explanatory
systems.
The hybrid structure and its early appearance highlight a different kind of ambiguity in word meaning, wherein largely
overlapping sets of instances may have the same label applied to them but have different meanings, with different inductions,
different categorizations of critical test cases, and a clear sense of ambiguity as opposed to vagueness or synonymy. It appears
that the shifts of meaning here are not caused by realizing that a label points to a wholly different part of one's conceptual
system with different instances and feature clusters, but rather by realizing that the label may point to a class of things that can
be interpreted in terms of more than one explanatory schema.
Seeing concepts as essential hybrids of explanation and association has led to the claim that early word learners have
explanatory biases that guide first guesses about the sorts of properties that are most strongly causally linked to the central
structure of a category. These biases appear to be high level and abstract yet also powerful in how they guide choices of relevant
properties. The acquisition of word meaning is thus suggested not to arise out of higher and higher order tabulations of
regularities among perceptual primitives, but from the start to be strongly influenced by overarching distinct patterns of
explanation that are associated with such domains as biological kinds, artifacts, and sentient beings. In addition, the possibility
of a handful of such patterns early on allows for much of semantic development to involve switching among these patterns and
not acquisition of totally new forms of explanation.
These discussions have carefully skirted questions of what explanations or theories are really like; but one possibility is that they
are not much at all like theories in the classic nomological-deductive sense. Increasingly in the philosophy of science, it is
common to attribute to the working scientist not a set of propositions linked by rules of inference, but something less structured
involving notions of causal powers (Salmon 1989). Not only children, but also adults, often have explanations in the form of
notions about the sorts of properties likely to be causally most central to a kind, with only the vaguest of hunches about how
they are linked together. It may therefore be that the earliest notions of what properties are central to a category are not so much
consequences of a deeper theory, but are much of the theory itself.
Lexical items cannot be simply equated with concepts; but neither can they ignore the details of concept structure. In this paper I
have argued that a current view of concepts and concept acquisition has important consequences for how we might think of
word meanings and how they are acquired. The notion of hybrid structure, its early origins, the presence of multiple early modes
of construal, and subsequent constraints suggest a very different model of how lexical items become mapped onto concepts and
change with development than would be suggested by alternative models of concepts. Even if one is solely interested in the
lexicon, one cannot avoid making some commitments about concepts themselves.
References
Armstrong, S., L. Gleitman and H. Gleitman, 1983. What some concepts might not be. Cognition 13, 263-308.
Baron-Cohen, S., A.M. Leslie and U. Frith, 1985. Does the autistic child have a 'theory of mind'? Cognition 21, 37-46.
Barsalou, L.W., 1987. The instability of graded structure: Implications for the nature of concepts. In: U. Neisser (ed.), Concepts
and conceptual development: Ecological and intellectual factors in categorization, 101-140. Cambridge: Cambridge University
Press.
Bloom, P., 1994. Possible names: The role of syntax-semantics mappings in the acquisition of nominals. Lingua 92, 297-329
(this volume).
Carey, S., 1985. Conceptual change in childhood. Cambridge, MA: MIT Press.
Chapman, L.J. and J.P. Chapman, 1969. Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of
Abnormal Psychology 74, 272-280.
Cheng, P.W. and Y. Lien, in press. The role of coherence in differentiating genuine from spurious causes. In: D. Sperber, A.
Premack (eds.), Causal understandings across culture and development. Cambridge: Cambridge University Press.
Chierchia, G. and S. McConnell-Ginet, 1990. Meaning and grammar: An introduction to semantics. Cambridge, MA: MIT Press.
Gelman, S. and G. Gottfried, 1993. The child's theory of living things. Presentation at the 1993 Meeting of the Society for
Research in Child Development, New Orleans.
Hatano, G., R.S. Siegler, K. Inagaki, R. Stavy and N. Wax, 1993. The development of biological knowledge: A multi-national
study. Cognitive Development 8, 47-62.
Hempel, C.G., 1965. Aspects of scientific explanation and other essays in the philosophy of science. New York: Free Press.
Hempel, C.G. and P. Oppenheim, 1948. Studies in the logic of explanation. Philosophy of Science 15, 135-175.
Inagaki, K. and G. Hatano, 1993. Young children's understanding of the mind-body distinction. Child Development (in press).
Keil, F.C., 1986. The acquisition of natural kind and artifact terms. In: W. Demopoulos, A. Marras (eds.), Language learning
and concept acquisition, 133-153. Norwood, NJ: Ablex.
Keil, F.C., 1989. Concepts, kinds and cognitive development. Cambridge, MA: Bradford Books.
Keil, F.C., 1992. The origins of an autonomous biology. In: M.A. Gunnar, M. Maratsos (eds.), The Minnesota Symposium on
Child Psychology, Vol. 25, 103-137. Hillsdale, NJ: Erlbaum.
Keil, F.C. and N. Batterman, 1984. A characteristic-to-defining shift in the development of word meaning. Journal of Verbal
Learning and Verbal Behavior 23, 221-236.
Lakoff, G., 1972. Hedges: A study in meaning criteria and the logic of fuzzy concepts. Journal of Philosophical Logic 2,
458-508.
Lakoff, G., 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago, IL: University of
Chicago Press.
Landau, B., 1994. Where's what and what's where: The language of objects in space. Lingua 92, 259-296 (this volume).
Leech, G., 1974. Semantics. Harmondsworth, England: Penguin.
Leslie, A., in press. TOBY and TOM. In: D. Sperber, A. Premack (eds.), Causal understandings across culture and development.
Cambridge: Cambridge University Press.
Markman, E., 1989. Categorization and naming in children: Problems of induction. Cambridge, MA: Bradford Books/MIT Press.
McClelland, J., forthcoming. Parallel distributed processing: Implications for cognitive development.
Medin, D.L. and E.J. Shoben, 1988. Context and structure in conceptual combination. Cognitive Psychology 20, 158-190.
Murphy, G.L. and D. Medin, 1985. The role of theories in conceptual coherence. Psychological Review 92, 289-316.
Putnam, H., 1975. The meaning of meaning. In: H. Putnam (ed.), Mind, language and reality, 215-271. London: Cambridge
University Press.
Quine, W.V.O., 1977. Natural kinds. In: S.P. Schwartz (ed.), Naming, necessity, and natural kinds, 155-175. Ithaca, NY: Cornell
University Press.
Rey, G., 1983. Concepts and stereotypes. Cognition 15, 237-262.
Rips, L.J., 1989. Similarity, typicality, and categorization. In: S. Vosniadou, A. Ortony (eds.), Similarity and analogical
reasoning, 21-59. New York: Cambridge University Press.
Rosch, E. and C.B. Mervis, 1975. Family resemblances: Studies in the internal structure of categories. Cognitive Psychology 7,
573-605.
Russell, B., 1924. Logical atomism. In: J.H. Muirhead (ed.), Contemporary British philosophy, 1st series, 357-383. London:
Allen and Unwin.
Salmon, W.C., 1989. Four decades of scientific explanation. Minneapolis, MN: University of Minnesota Press.
Slater, A., 1993. Comments on symposium on infant cognition. Presentation at the 1993 Meeting of the Society for Research in
Child Development, New Orleans.
Smith, E.E. and D.L. Medin, 1981. Categorization and concepts. Cambridge, MA: Harvard University Press.
Spelke, E., A. Woodward and A. Phillips, 1993. Kinds of objects and causal relations in infancy (tentative title). In: D. Sperber,
A. Premack (eds.), Causal understandings across culture and development. Cambridge: Cambridge University Press.
Springer, K. and F.C. Keil, 1991. Early differentiation of causal mechanisms appropriate to biological and nonbiological kinds.
Child Development 62, 767-781.
Vera, A. and F.C. Keil, 1988. The development of inductions about biological kinds. Presentation at the Annual Meeting of the
Psychonomic Society, Chicago, IL.
Vygotsky, L.S., 1962. Thought and language (E. Hanfmann, G. Vakar, Trans.). Cambridge, MA: MIT Press.
Werner, H., 1948. Comparative psychology of mental development (2nd edition). New York: International Universities Press.
Woodfield, A., 1976. Teleology. Cambridge: Cambridge University Press.
Woodward, A., 1992. The role of the whole object assumption in early word learning. Ph.D. dissertation, Stanford University.
Wright, L., 1976. Teleological explanations: An etiological analysis of goals and functions. Berkeley, CA: University of
California Press.
Section 4
Categories, words, and language
Constraints on word meaning in early language acquisition
E.M. Markman
1. Introduction
One impressive accomplishment of young children is the degree to which they acquire the vocabulary of their native language.
Not only is the rate and extent of young children's word learning impressive, it is puzzling as well. Given the limitations on
children's hypothesis testing, reasoning, memory, and other information processing abilities, their facility for building a lexicon
is even more striking. This is especially so in light of the well-known inductive problem that word learning poses (Quine 1960).
Faced with an infinite set of possibilities about what a novel word might mean, the speed with which young children acquire word meanings requires explanation.
* This work was supported in part by NSF Grant #BNS-9109236. I am very grateful to Andrea Backscheider for allowing me to publish the results of our study here. I thank Barbara Landau, Lila Gleitman, and two anonymous reviewers for their helpful comments.
Given the importance of word learning for language acquisition, children might be expected to recruit whatever sources of information they can to narrow down a word's meaning. One powerful source, for children old enough to benefit from it, is grammatical form class. As one example, Fisher et al. (this volume) provide dramatic demonstrations of the way in which syntactic information helps constrain the meaning of a novel verb (see also Fisher et al. 1991, Naigles et al. 1992). Being able to infer aspects of the communicative intent of a speaker should provide another source of information about the referent of a novel term. A conscientious parent or other tutor might further help by arranging the environment in ways that exaggerate the salience of the aspect of the situation being labeled, in the hope that the child will find it salient too. Another potentially
powerful source of information young children can use to figure out the meaning of a new word comes from word-learning
constraints. Constraints on word meaning may be particularly critical for babies who have not yet learned enough syntax to rely
on grammatical form class to limit their hypotheses as to a word's meaning. I will argue that by the time they are ready to
acquire vocabulary, children place constraints on possible word meanings, thereby greatly reducing the hypothesis space that
needs to be considered. Children would not need to formulate a long list of potential meanings and painstakingly assess the
evidence in support of each. Rather, they could quickly zoom in on some hypotheses that they are predisposed to prefer.
In this paper, I will consider evidence in support of three word-learning constraints: the whole-object, taxonomic, and mutual
exclusivity assumptions. I will argue that the most recent evidence supports the claim that children's early word learning is
guided by such constraints. (For some different formulations of possible constraints, see Bloom, this volume; Clark 1991; Golinkoff et al. 1992a.)
To claim that children's early word learning is a constrained form of learning is not to claim that no other source of information
matters. On the contrary, the recent evidence reveals some of the complex and subtle ways that word learning biases interact
with other sources of information. There is evidence, for example, that word-learning constraints can affect each other. When two or more constraints converge on the same hypothesis, learning will be more efficient than in cases where the constraints conflict and one must override another. Similarly, word-learning constraints interact with grammatical form class. In some cases both these sources of information lead
both these sources of information lead
to the same conclusion about a novel word, but in other cases they conflict. Analogous points can be made about
communicative or pragmatic sources of information and the word learning assumptions. Finally, there is emerging evidence
about the ways that the processing demands of a given word learning situation can affect the use of constraints. The evidence I
will review suggests that beginning word-learners rely heavily on word-learning assumptions but that these constraints are
modulated by other constraints, by nonlinguistic context, by children's problem solving and other information processing
abilities, and by the pragmatics and syntax of the language children hear.
early 'words' from genuine words that are referential; and it has been termed 'performative' (Snyder et al. 1981) to suggest that
the early words are responses used to perform some instrumental act rather than functioning as words that make reference.
Another characteristic of this first phase of word learning is that words are added to the productive lexicon very slowly. A slow
accumulation of new words might proceed by some relatively unconstrained associative mechanism. Dozens or even hundreds of
trials may be needed for a baby to learn, for example, to say 'bye-bye' on command. This is in marked contrast to the very fast
acquisition seen around 18 months.
This developmental shift at the time of the naming explosion suggests an alternative formulation of the hypothesis that word-
learning constraints are necessary for word learning. The more precise claim is that some constrained form of learning is
necessary to account for the rapid acquisition of words seen at the time of the naming explosion. Babies could not be acquiring
words at the rate of 30 or so a week if they were open-mindedly considering all possible hypotheses each time they encountered
a novel word. Thus the main hypothesis to be evaluated here is that word-learning constraints are available to babies by the time
they enter the naming explosion at roughly 18 months of age.
2.2. Word-learning constraints as default assumptions
Before summarizing the evidence that the word learning of 1½-year-olds is a constrained form of learning, I would like to
clarify the claim. Some confusion has been generated by the terminology, as 'constraints' is interpreted differently by different disciplines, especially linguistics versus ethology. I and other investigators have borrowed the terminology from the ethology of learning (see Woodward and Markman 1991 for a discussion of this perspective). Within this discipline, 'constraints' on learning are formulated as default assumptions: probabilistic biases that provide good first guesses about a problem an organism must solve (Marler and Terrace 1984, Rozin and Schull 1988, Shettleworth 1984). To take one example from Gould
and Marler (1984), foraging bees must learn the color of the flowers that yield a given nectar. Purple is the default value bees
hold. It is easier for them to learn about purple flowers than about flowers of any other color. Note that it is by no means impossible for them to learn about nectar sources of other colors. Violations of the default assumption are common. Nevertheless,
purple serves as a first guess, and it takes more trials or more evidence to learn values that differ from the default assumption.
Turning to word learning, the hypothesis
is that children are able to make progress in word learning by use of such default assumptions (see Merriman and Bowman 1989
and Woodward and Markman 1991). These constraints on learning provide good first guesses about the likely meanings of
terms. Words that conform to the constraints should therefore be easier to learn than ones that violate them. Some of the
controversy surrounding such claims stems from confusion over whether constraints must be absolute or whether they function
as default assumptions (Gathercole 1989, Nelson 1988, Tomasello 1992).
2.3. Are word-learning constraints domain-specific?
To clarify another potential source of confusion, I'd like to briefly consider the question of whether the word-learning constraints I will consider here (the whole-object, taxonomic, and mutual exclusivity assumptions) are specific to language or available to
other domains. Rather than discuss this here, let me refer you to Markman (1992), where this issue is considered in detail along
with speculations about the origins of the constraints. My conclusion is that there is no reason at all to believe that these three
constraints are limited only to word learning, and good reason to think they are available to some other domains. This is not to say that they are fully domain-general, however, in that important domains are organized by very different principles. Given that analogous
constraints are available to some fundamental domains, I suggested that word-learning constraints have been recruited from
existing abilities that children possess rather than evolving as special purpose mechanisms only for word learning (Markman
1992). Such speculations aside, these assumptions function as word-learning constraints, not in the sense that they are domain-
specific, but in the sense that they help solve the inductive problem that word learning poses.
could refer to objects of the same kind, but it could in principle refer to the object and its spatial location, or the object and its
owner, or the object and a salient part, etc. This is not, moreover, just a theoretical possibility. Many studies of classification
have found that not only do children attend to such 'thematic' relations between objects, they often find them more salient or
interesting than taxonomic relations per se (see Markman 1989). When shown a car, for example, and told to find another one, children might pick a man, because he drives a car, rather than picking a truck, another vehicle. Given such findings from children's
classification, the question arises as to how children avoid such thematic interpretations of a novel word. Hutchinson and I
proposed that children expect novel terms to refer to objects of like kind (Markman and Hutchinson 1984). Hearing a novel
label, then, should cause children to seek out taxonomic relations even in cases where thematic relations would otherwise be
more salient. There is now a substantial body of evidence that children from about 2½ or 3 expect labels to refer to things of like
kind (Baldwin 1989, in press; Hutchinson 1984, Markman and Hutchinson 1984, Waxman and Gelman 1986, Waxman and
Kosowski 1990). I turn now to consider whether the taxonomic assumption is available to babies by the time of the naming
explosion.
3.1. The taxonomic assumption at the time of the naming explosion
Bauer and Mandler (1989) first addressed this issue in their study of the categorization abilities in very young children. As part
of their study, they asked whether labeling would increase 16- to 31-month-old children's tendency to sort taxonomically.
Unexpectedly, however, even the youngest children in their study were sorting taxonomically from the start. That is, even with
no labels children were sorting taxonomically about 75% of the time. Labeling did not increase this already high level of
performance. Bauer and Mandler (1989) have thus convincingly demonstrated that quite young children are capable of sorting
taxonomically. They also argue that there may not be any general thematic preference. Because of the already high rate of
sorting taxonomically, however, they were unable to test whether children of this age adhere to the taxonomic assumption. That
is, it is still important to know whether there are situations in which very young children show thematic preferences and, if so,
whether hearing a label will cause them to shift to taxonomic sorting. Backscheider and I addressed this by changing aspects of
Bauer and Mandler's procedure that resulted in children responding taxonomically from the start.
One reason why Bauer and Mandler (1989) achieved such a high rate of taxonomic responding in their young children is that
they used a reinforcement procedure whereby they briefly pretrained children to select taxonomically and maintained this
selective reinforcement of taxonomic choices throughout the testing procedure. The selective reinforcement clearly mattered
because in a control study Bauer and Mandler achieved an equally high rate of thematic responding by selectively reinforcing
thematic rather than taxonomic responses. Since they demonstrated that selective reinforcement is a powerful way to influence
children's responses, Backscheider and I avoided selective reinforcement (as did all earlier studies with older children). Other
differences between our procedure and Bauer and Mandler's (1989) are that we used pictures instead of objects, we used items
whose thematic relations we thought would be better known to 18-month-olds, and we used an up/down placement of thematic
and taxonomic choices instead of left/right. Finally, Bauer and Mandler counterbalanced the position of thematic and taxonomic
choices by alternating the left/right placement from trial to trial. This alternation is unfortunate, especially when coupled with a reinforcement procedure: children may simply have learned that the correct answer changes sides on every trial. We counterbalanced
side but not by alternating from trial to trial. The major difference between the studies, however, is that we did not differentially
reinforce taxonomic responding.
Thirty-three children participated in our study. They ranged in age from 18 to 25 months with a mean age of 21.5 months. The
experimental materials consisted of ten triads, each containing one target picture, one picture thematically related to the target,
and one picture taxonomically related to it. We selected thematic relations that we thought would be highly
familiar to even very young children. The taxonomic match belonged to the same basic level category as the target but, where
possible, came from distinctive subordinate categories. The ten triads are listed in table 1.
The experimental questions were preceded by a set of four warm-up questions designed to clarify the instructions and procedure
but not to differentially reinforce either taxonomic or thematic responding. The warm-up questions and the experimental
questions were oddity tasks, where children viewed a target picture and then selected one of two choice pictures. To begin,
children were introduced to a frog hand-puppet who manipulated the pictures. They were asked to help the frog find the pictures
it wanted. The children were told that the frog had just gotten a new house, and wanted things to put in its house. In order to
ensure that children gave unambiguous responses, they were shown how to place the pictures they selected in the frog puppet's mouth. There were two experimental conditions, the No Label Condition and the Novel Label Condition.

Table 1
Experimental items from Markman and Backscheider's study

Target              Taxonomic choice   Thematic choice
Sitting baby        Lying baby         Stroller
Bottle              Bottle             Baby
Chair               Chair              Sitter
Rabbit cup          Cup                Pitcher
Foot                Foot               Shoe
Glasses             Glasses            Eyes
Blue scoop shovel   Red shovel         Pail
Spoon               Spoon              Cereal in bowl
Toilet paper        Toilet paper       Toilet
Mitten              Mitten             Hand
3.1.1. Warm-up questions
For the warm-up questions, children were shown a target picture and then had to select one of two pictures, one of which was an
unrelated distractor. The other picture, the correct choice, was thematically related to the target for half of the trials and
taxonomically related for the other half. The items used for the warm-up questions are listed in table 2.
Table 2
Warm-up items from Markman and Backscheider's study

Target          Taxonomic match   Thematic match     Distractor
Hat             Hat               Head               Car interior
Watch           Watch             Wrist              Road
Shopping cart   Shopping cart     Bag of groceries   Pillow
Hair            Hair              Brush              Pot

Note: In a single trial the child saw either the taxonomic match or the thematic match, not both. The distractor was always present.
The experimenter placed the target picture (for example a hat) on the table and asked the child to 'look at this picture' or 'see
this', etc. Then the two
choice pictures were placed on a magnetic board, one above the other. (Pilot work suggested that a vertical alignment of the
pictures produced fewer response biases based on position than did a horizontal alignment.) The top/bottom position of the
random distractor and correct picture was counterbalanced. The target picture was held next to the top picture while the
experimenter asked 'Is this another one?' and then held next to the bottom picture while the experimenter asked 'or is this
another one?' When children selected the wrong picture they were told that that was not the one that the frog wanted and were
encouraged to make another selection. For the warm-up questions, the frog would not accept the wrong picture: it would keep its mouth closed and shake its head, etc. When children selected the correct choice, the frog opened its mouth to take the picture and
enthusiastically thanked them, saying 'Yeah! Yeah! That is just the one I wanted. Thank you', kissed the child, etc. The
experimenter then explained why that was the picture the frog wanted, describing either the taxonomic relation between the
target and the picture or the thematic relation as appropriate. For example for the thematic pair hat/head, the frog said 'That's
great! A hat goes on a head. Do you ever put a hat on your head? A hat goes on a head, they go together'. For the taxonomic
pair hat/hat, the frog said 'That's great! Now I have two hats. Look, these are the same, two hats'. Thus, the warm-up questions
encouraged, explained, and reinforced taxonomic and thematic responses equally.
The procedure for the two experimental conditions was similar to that of the warm-up questions with two exceptions. First, in
the procedure proper there was no unrelated distractor. One of the choice pictures was thematically related to the target and the
other was taxonomically related to it, as shown in table 1. Second, the frog accepted any choice the child made, that is, there
was no selective reinforcement during the experiment proper.
3.1.2. No label condition
To begin each trial, the puppet placed the target picture on the table and said 'Now let's look at this picture' or 'Here's a new
picture'. Then the two choice pictures were placed on the magnetic board, one above the other. Whether the thematic choice was
placed on the top or on the bottom was counterbalanced such that for each child half of the time it appeared on the top and half
the time on the bottom. The order of presentation of the items was randomly determined for each child. Once the child saw the
target picture, the puppet asked the child to 'find another one'. It held the target picture next to each of the choice pictures,
asking 'Is this another one, or is
this another one?' The frog then held the target between the two pictures while the child selected one of them and placed it in
the frog puppet's mouth. The experimenter took the target out of the frog's mouth as the child moved his or her choice over to
the frog. Children were enthusiastically thanked on every trial, regardless of which picture they chose.
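The counterbalancing scheme just described, with the thematic choice on top for exactly half of each child's trials in a randomized rather than alternating order, can be sketched as a short routine. This is an illustrative reconstruction, not code from the study; the function name, trial count, and seed are assumptions:

```python
import random

def counterbalanced_positions(n_trials, seed=None):
    """Place the thematic choice on top for exactly half the trials,
    in a random (non-alternating) order for each child."""
    rng = random.Random(seed)
    positions = ["top"] * (n_trials // 2) + ["bottom"] * (n_trials // 2)
    rng.shuffle(positions)
    return positions

# One child's ten experimental trials (hypothetical seed)
order = counterbalanced_positions(10, seed=1)
print(order.count("top"), order.count("bottom"))  # always 5 5
```

Randomizing rather than alternating avoids the pattern Bauer and Mandler's children could exploit, where the correct answer simply switched sides on every trial.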
3.1.3. Novel label condition
The materials and procedure for the Novel Label Condition were identical to those of the No Label Condition except that
children in this condition were told that the puppet would sometimes speak in puppet talk and the puppet gave each target
picture a novel label. For example, the puppet would show the child the target picture saying 'Now I am going to show you a
sud. Look at this, it is a sud. Can you find another sud?' When children were making their choices the puppet said, e.g., 'Is this
another sud or is this another sud?' Ten nonsense syllables were used as the novel labels and were randomly assigned to the
targets for each child.
The first question this study addressed was whether children as young as 18 to 24 months will show thematic preferences at all.
In marked contrast to Bauer and Mandler's (1989) findings, these young children did show a thematic bias. When children in the
No Label Condition were asked to choose between an object that was from the same category as the target and one that was
thematically related to it, they chose the taxonomic match only 32% of the time, which was significantly less than chance, t(15) = 4.78, p < 0.001. That is, they chose thematically 68% of the time. Thus, when children are not selectively reinforced for
choosing taxonomic items and when the thematic relations are geared towards what would be well-known to even 18-month-
olds, quite young children will reveal the thematic biases often seen in older children. This does not contradict Bauer and
Mandler's claim that these young children are capable of grouping the objects on the basis of common categories, but it does
establish that when the thematic relations are well-known and when children are not reinforced for one type of response or
another they prefer to organize objects thematically.
Given that for these items, these young children do show thematic biases, it then makes sense to ask whether hearing an object
labeled will help them override their thematic preference in favor of taxonomic relations. As predicted, there was a highly
significant effect of condition, with children in the Novel Label condition picking taxonomically fully 77% of the time compared
to 32% of the time for children in the No Label Condition, F(1, 31) = 66.58, p < 0.0001. Children hearing an object labeled
selected a taxonomic match
well over what would be expected by chance, t(16) = 6.75, p < 0.0001. Thus, these young children do adhere to the taxonomic
assumption. They extend object labels to other objects of like kind rather than to objects that are thematically related.
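The chance comparisons reported here are one-sample t-tests of each child's proportion of taxonomic choices against the 0.5 chance level. As a sketch of how such a test is computed, note that the per-child proportions below are hypothetical, not the study's data:

```python
import math
import statistics

def one_sample_t(proportions, chance=0.5):
    """t-statistic and degrees of freedom for a one-sample t-test
    of per-child proportions against a fixed chance level."""
    n = len(proportions)
    mean = statistics.fmean(proportions)
    sd = statistics.stdev(proportions)        # sample SD, n-1 denominator
    t = (mean - chance) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical proportions of taxonomic choices for 16 children
props = [0.2, 0.3, 0.4, 0.3, 0.2, 0.5, 0.3, 0.4,
         0.2, 0.4, 0.3, 0.3, 0.4, 0.3, 0.2, 0.4]
t, df = one_sample_t(props)
print(round(t, 2), df)  # a large negative t: choices reliably below chance
```

A negative t indicates a thematic bias (taxonomic choices below 0.5); a positive t, as in the Novel Label Condition, indicates taxonomic responding above chance.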
Wasow and I have conducted subsequent studies to determine if these results would replicate with babies 16 to 18 months old.
Several changes in procedure were made to accommodate to the younger children. We used objects instead of pictures and
reduced the number of questions that children were asked. The experimental triads that were used are presented in table 3. The
procedure was otherwise quite similar to that of Backscheider and Markman, including use of warm-up questions. Thirty-two
16-month-olds and 32 18-month-olds participated in this study, half of each age group in each condition. As before 18-month-
olds honored the taxonomic assumption. First, in the No Label Condition, 18-month-olds showed a clear thematic bias, selecting the taxonomic choice only 28% of the time, which is below chance, t(15) = 10.97, p < 0.0001. When the target object was given a novel label, the 18-month-olds significantly increased their taxonomic selections, now picking a member of the same category 54% of the time, t(30) = 3.34, p < 0.005.
Table 3
Items from Markman and Wasow's study with familiar objects

Target                              Taxonomic choice               Thematic choice
Felt doll hat                       Plastic doll hat               Doll head
White doll-size sock                Yellow toddler-size sock       Doll-size foot
Baby doll (approx. 5")              Smaller doll (approx. 3")      Doll-size bottle
Red & white adult-size toothbrush   Pink child-size toothbrush     Set of plastic teeth
White wooden dollhouse table        Pink plastic dollhouse table   Pink plastic dollhouse chair
Yellow plastic pail                 Red plastic pail               Yellow & white plastic shovel
We did not find this labeling effect with 16-month-olds, however. The 16-month-olds did find the thematic relations salient,
picking taxonomically only 34% of the time in the No Label condition, which was less than chance, t(15) = 3.76, p < 0.002. But
labeling had no effect on their selections, with babies selecting taxonomically only 39% of the time in the Novel Label
condition.
One possible reason for our failure to replicate the labeling effect with 16-month-olds is that in this procedure we provide a
novel label for a familiar
object and then look to see how the child interpreted the label. As I will argue later, children find it more difficult to learn
second labels for objects because this violates mutual exclusivity, another word-learning constraint. Other studies minimized the conflict between mutual exclusivity and the taxonomic assumption by telling children that the novel word was a word in
puppet language or a foreign language (Markman and Hutchinson 1984, Waxman and Gelman 1986). Although older children
have been shown to treat mutual exclusivity as applying within a single language, not across languages (Au and Glusman 1990),
16-month-olds would not have understood a discussion of puppet language. To test the possibility that the conflict between
mutual exclusivity and the taxonomic assumption was confusing the 16-month-olds, Wasow and I ran another version of the
study, this time using novel objects. Thirty-two 16-month-old babies were introduced to novel triads of toys where two of the
toys were similar in appearance and two were related by some thematic relation we demonstrated. For example, two honey
dippers of different colors formed a novel taxonomic pair, and one honey dipper was scraped against a plastic grid to demonstrate the thematic relation. Babies were then shown a target, e.g., a honey dipper, heard it labeled or not, and were then asked, depending on the condition, to find another one or another X (where X was the novel label). The full set of novel items used is
described in table 4.
Table 4
Items from Markman and Wasow's study with novel objects

Target                      Taxonomic choice             Thematic choice
Small green sponge circle   Large yellow sponge circle   Green sponge rectangle with circle cut out
Small orange block          Large blue block             Plastic 'tub' container with square cut out in lid
Red 'person' peg            Yellow 'person' peg          Wooden car with round hole
Large red propellor         Small yellow propellor       Red helicopter (with post to hold propellor)
Yellow honey dipper         Green honey dipper           Yellow plastic grid
Decorated PVC pipe          Plain PVC pipe               Decorated funnel
Once again we failed to find an effect of labeling with 16-month-olds. Babies selected the taxonomic choice 40% of the time
when the object was unlabeled and 44% of the time when it was labeled. Our avoiding second labels for objects did not improve
babies' performance in the labeled condition.
One possibility, of course, is that 16-month-olds lack the taxonomic assumption. Another is that the oddity task we have been
using is insensitive to the knowledge babies have. To reveal use of the taxonomic assumption in this task, babies must inhibit a
dominant response. We deliberately selected thematic choices that would be preferred by babies this age. Being asked for 'a dax', for example, may not be compelling enough to prevent babies from doing what they most prefer, even if they do interpret 'dax' as
referring to objects of like kind.
Thus, our failure to find the labeling effect with the 16-month-olds might be partially accounted for by the requirements of the
oddity task. The taxonomic assumption may not be strong enough to allow young children to inhibit reaching for a preferred
object.
Support for this interpretation comes from the work of Waxman and her colleagues, who found that labels highlight categorical structure for even 12- and 13-month-olds (Waxman, this volume; Waxman and Heim 1991, Markow and Waxman 1992).
Instead of an oddity task where children would need to inhibit a dominant thematic response to reveal knowledge of the
taxonomic assumption, Waxman and Heim (1991) and Markow and Waxman (1992) used a manual habituation procedure which
assessed whether babies could notice and distinguish two different categories of objects. The hypothesis being tested is that
words should enhance babies' tendency to notice object categories. The prediction was that labeling objects should (1) increase
the rate of habituation to category exemplars and (2) increase the degree of dishabituation to an object from a different category.
In this procedure, babies were first familiarized with four toys from a given category (e.g., four cars or four animals). Babies were presented with each toy, one at a time, for 30 seconds each. On analogy with visual habituation tasks, a decrement in the
time spent exploring the last toy compared with the first was used as an index of habituation. After the familiarization trials,
babies were presented with a new exemplar from the old category and an object from the novel category. Here, again on analogy
with standard habituation tasks, babies who had best formed the category would be expected to show a greater interest in the
object from the novel category than in the object from the original category. For some of the babies the experimenter labeled the
object during the familiarization phase, saying, e.g., 'Look at the car'. For other babies the experimenter drew attention to the
object without labeling it saying, e.g., 'Look at this'. As predicted, labeling the objects improved babies' ability to categorize as
measured both by rate of habituation to within category exemplars and by dishabituation to a novel category (Waxman and
Heim 1991, Markow and Waxman 1992). Moreover, Markow and Waxman discovered that at 12 months labeling objects
improved babies' categorization even when adjectives rather than nouns were used to refer to the objects. This finding is of
interest because it supports the idea that babies first treat words as referring to objects of like kind rather than treating only
nouns that way. One-year-olds, presumably with minimal knowledge of grammatical form class, were better able to categorize objects if they were labeled, regardless of whether a noun or adjective was used as the label.
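The two habituation measures just described, a decrement in exploration time across familiarization trials and renewed interest in a novel-category object at test, can be expressed as simple indices. The exploration times below are hypothetical, purely for illustration:

```python
def habituation_decrement(times):
    """Drop in exploration time from the first to the last
    familiarization trial; larger values mean faster habituation."""
    return times[0] - times[-1]

def novelty_preference(novel_time, familiar_time):
    """Proportion of test-trial exploration devoted to the
    novel-category object; values above 0.5 indicate dishabituation."""
    return novel_time / (novel_time + familiar_time)

# Hypothetical exploration times (seconds) for four same-category toys
familiarization = [22.0, 18.5, 14.0, 9.5]
print(habituation_decrement(familiarization))       # 12.5
print(round(novelty_preference(16.0, 8.0), 2))      # 0.67
```

On the labeling hypothesis, both indices should be larger for babies who heard the objects labeled than for those who merely heard 'Look at this'.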
At this point it is of interest to consider briefly Bloom's (this volume) alternative framework for constraints on word meaning.
Bloom views the postulation of particular constraints such as the taxonomic and whole-object assumptions as unmotivated and
corrects this in his model with an elegant system that generates constraints to conform to universal grammar-cognition
mappings. Bloom's systematic analysis provides a clear statement of what the endpoint of development should be. Where we
disagree is in what to attribute to the very early phases of language learning the time before children have mastered grammatical
form class. On the view I have proposed very young children will break into the system by assuming that a word refers to a
whole object (the whole object assumption) and is extended to things of like kind (the taxonomic assumption). Bloom's account,
which is bolstered by counterexamples to my formulation, is that children map count nouns onto kinds of individuals rather
than kinds of objects. Yet, this may describe a developmental achievement rather than the initial state. Babies may treat terms as
referring to objects but with time note the regularity with which count nouns refer to objects. Once this correlation is established
children may then be able to metaphorically extend object-like or 'individual' status to non-objects that are referred to with count
nouns such as 'a forest' or 'a lecture'. Second, on my account as well as Waxman's, young children rely on word-learning
constraints before they have mastered grammatical form class. Very young children should err by treating adjectives, for
example, as referring to kinds of objects. The work just described (Markow and Waxman 1992) is one source of evidence that
before children have mastered the noun/adjective contrast in English, adjectives as well as nouns enhance one-year-olds'
attention to kinds of objects.
Labeling has been found to enhance even younger babies' attention to objects. Baldwin and I found that babies from 10 to 14 months old attended more to toys that had been labeled compared to ones that had not (Baldwin and Markman 1989). In a
second study, we compared labeling to another powerful means of directing babies' attention, namely pointing. Although
labeling did not increase babies' attention over pointing at the time pointing occurred, during a subsequent play period babies
attended more to toys that had been labeled than to those that had not. Thus, labeling may sustain babies' interest in objects
beyond the time the labeling occurs. By helping babies sustain their attention to relevant objects, this labeling effect may help
babies notice and remember the word-object correspondences. It is not yet known whether this labeling effect is caused by
labels per se or whether other nonlinguistic sounds would have the same effect (see Baldwin and Markman 1989 for other
possible explanations as well). So far, the existing data are inconsistent. Waxman and Balaban (1992) have found that words but
not tones increased attention to object categories in 9-month-olds. On the other hand, Roberts and Jacob (1991) found that
music as well as words facilitated object categorization in infants. Whether the effect is specific to words or not, words do
heighten babies' attention to objects and object categories.
Taken together, then, the experimental findings document that babies from 12 to 18 months (and maybe even younger) use the
taxonomic assumption to help them determine the appropriate referents of a word. This conclusion from the experimental data is
supported by the results of Huttenlocher and Smiley's (1987) study of naturalistic data. They generated a set of criteria to
distinguish thematic (complexive) extensions of words from taxonomic extensions, arguing that not every use of a word by a
young child should be taken as a simple label of an object. They followed several children from the time of their first word
(around 13 months for most of the children) and periodically recorded both the words children produced and details of the
situation and context in which utterances occurred. For example, a child who reaches towards a cookie jar saying 'cookie,
cookie' with an insistent request intonation should not be interpreted as labeling the cookie jar as a cookie. Instead, the child is
most likely requesting a cookie. Using coding categories that capture reasonable interpretations of children's utterances, they
found that from the start children use words to refer to objects of like kind. Thus, evidence from experimental and naturalistic
studies alike indicates that the taxonomic assumption is available to babies who have not yet undergone the naming explosion.
Although the taxonomic assumption goes a long way in reducing the kinds of hypotheses babies need to consider in figuring out
the meaning of a new word, it by no means solves the problem. Babies may be led to treat novel words (or nouns) as referring to
kinds of objects but that still leaves open the question of which kind. This problem has been extensively addressed by Waxman
and her colleagues and is reviewed in Waxman (this volume). To briefly summarize, labels facilitate young children's attention
to objects at some hierarchical levels but might actually interfere with their ability to categorize at other levels. Young children
hearing a noun expect it to refer to categories either at a basic or superordinate level of categorization, but do not expect it to
refer to the distinctions made at subordinate levels of categorization. Rather, they expect this level of detail to be indicated by
adjectives. Waxman's work reveals the interaction between use of word-learning constraints, grammatical form class, and
conceptual structure.
Not only does the taxonomic assumption leave open the question of which kind of object is being referred to, it even leaves
open which object is being referred to. As Baldwin (1991) pointed out, in normal environments that babies find themselves in,
there are often many candidate objects around that could serve as potential referents for a new word. Even if babies are prepared
to treat a novel word as referring to, say, a basic level category, they must figure out which one. One possibility is that babies
may treat a novel word as referring to whatever novel object they are attending to. Whenever babies and adults are focussing on
the same object, this would lead babies to correctly identify the referent of a novel term. On those occasions, however, when the
adult was in fact labeling an object other than the one the baby was focused on, the baby would wrongly interpret the new word
and would make a mapping error. Such mapping errors appear to be very rare, although there is evidence that babies learn words
more readily when parents tend to label what their baby is attending to (Tomasello and Farrar 1986). Baldwin (1991) suggested
that babies could avoid mapping errors if they monitored the speaker's focus of attention. If babies recognize that the object they
are interested in is not the one the speaker is attending to, then they would know not to treat the word as a label for their object.
Baldwin tested whether babies use information about a speaker's focus of attention in inferring the referent of a novel word or
whether they mapped a novel word to whatever novel object they were interested in. To test this, she provided novel labels to
16- to 19-month-old babies in one of two conditions. In the 'discrepant' label condition, babies were given a toy to play with. Once
the experimenter was assured that the baby was examining the toy the experimenter provided a
novel label saying, e.g., 'It's a toma'. Instead of looking at the baby or the toy, however, the experimenter gazed into an opaque
bucket as she said 'It's a toma'. Thus, the baby heard the novel label while looking at a novel toy but while the speaker looked
into a bucket that contained a second toy. In the 'follow-in' labeling condition, the experimenter looked at the visible toy while
she provided the novel label.
In the follow-in labeling condition, babies of both ages treated the novel word as referring to the visible toy. Of more interest is
what happened in the discrepant condition. The results were quite clear: at neither age did babies treat the novel word as
referring to the visible toy even though they were looking at the toy at the time they heard the label.
By monitoring the eye-gaze, posture, direction of voice, or some other cues to the speaker's focus of attention babies avoided
making a mapping error. The 16- to 17-month-olds avoided errors by simply failing to learn the new word. The 18- to 19-month-olds
not only avoided errors, but were able to infer that the new word referred to the object that was hidden in the bucket at the time
of labeling. Baldwin's results reveal how babies' use of word-learning constraints is coordinated with their construal of the
communicative intent of the speaker.
taxonomic category even when the objects in question do not have the same shape. Moreover, Landau (this volume), along with
Becker and Ward (1991), has found that preschoolers treat novel terms as labels for objects rather than for shape, in that they treat a
term for a worm-like animal as referring to another worm twisted into a different configuration. (Landau, however, argues for a
different interpretation of these results. In particular she argues that children are still responding on the basis of shape but shape
now is defined as the possible shape transformations an object of a given shape can undergo.) In addition, there are
developmental differences in the way this whole-object bias interacts with grammatical form class (Landau et al., in press;
Landau, this volume). Older children and adults will treat a novel noun as referring to the kind of object but treat an adjective as
referring to a property such as texture. But the youngest children treated even novel adjectives as referring to kinds of objects.
Here again we see that when knowledge of grammatical form class is weak, word-learning constraints such as the whole-object
assumption dominate the child's interpretation of a novel term.
Taken together, these findings suggest that children attempt to honor the whole object and taxonomic assumptions per se, rather
than treating words as shape terms. Further support for the whole-object assumption comes from studies that have documented
that children interpret a novel term as a label for an object and not its part (Markman and Wachtel 1988, Mervis and Long 1987)
and for an object over its substance (Markman and Wachtel 1988, Soja et al. 1991).
4.1. The whole-object assumption at the time of the naming explosion
Although the whole-object assumption has received experimental support, only Mervis and Long (1987) examined babies
around the age of the naming explosion. Moreover, none of the studies (except perhaps Landau's, this volume) provided a strong
test of the whole-object assumption. Babies might map words onto objects not because of a whole-object assumption per se, but
simply because objects are salient and babies might map novel words to whatever is most salient at the time of labeling. Thus a
stringent test of the whole-object assumption requires examining whether babies treat novel words as referring to objects even
when an object is not the most salient aspect of the environment. Woodward (1992) has provided such a test.
Woodward (1992) had babies view two video monitors. On one screen babies viewed a dynamic substance in motion, such as
flowing lava. On the
other screen the babies viewed a static novel object. When the screens were turned on and babies allowed to watch freely, they
clearly preferred the swirling substances to the static objects. Thus, Woodward created a situation where the object was less
salient than the substance in motion. On some trials babies heard a label that could be interpreted as referring either to an object
or substance. The prediction from the whole-object assumption is that hearing a label should cause babies to shift attention more
to the whole object. This hypothesis was confirmed for 18-month-olds (though less clearly for the 24-month-olds). Around the
time of the naming explosion, then, babies treat novel words as referring to objects per se, rather than to whatever is most
salient.
Testing a somewhat different hypothesis, Echols also has evidence suggesting that well before the time of the naming explosion
babies honor the whole-object assumption, but that younger babies (8- to 10-month-olds) may not. Echols (1990, 1991) asked
whether young babies might map labels to whatever in the environment is consistent rather than to objects per se. She used an
habituation/dishabituation procedure where babies heard labels either in the presence of consistent objects with varying motions
or consistent motions with varying objects. Trends in the pattern of habituation and dishabituation led Echols to speculate that
there is a developmental shift from 8–10 months to 13–15 months in how babies are affected by labeling. The younger babies
appeared to focus on what was consistent when they heard a label while the older ones focused on objects per se. The older
babies were only 13 to 15 months old, however.
Woodward's visual preference study and Echols' habituation studies both suggest that the whole-object assumption is in place by
the time of the naming explosion. By focussing children's attention on objects as the most likely referent of a novel word, the
whole-object assumption greatly reduces the number of hypotheses children need to consider for the meaning of a novel term.
The whole-object assumption thus promotes the rapid learning of object labels. While the whole-object assumption can account
for the speed with which children acquire names for objects, it poses an obstacle for learning terms for parts, substances, colors,
texture, and other properties of objects. With only the whole-object assumption to guide their interpretation of novel words,
babies would be limited to learning only object labels. One of the functions of the mutual exclusivity assumption discussed next
is that it can override the whole-object assumption, freeing children to learn a greater variety of terms.
evidence they treated the term as a common name. Katz et al. (1974) manipulated two sources of information about whether
something might be a proper name or not. The first was grammatical form class, in particular, whether the novel term took an
article or not, e.g., 'a dax' vs. 'Dax'. The second was the type of object that was labeled, in particular, animate-like things which
are appropriate referents of proper names, in this case dolls, versus inanimate objects unlikely to be treated as unique
individuals, in this case blocks. Their results for girls suggested that when both the grammatical form class and the conceptual
domain were appropriate, girls treated the novel term as a proper name. That is, when a doll was called 'Dax', girls interpreted
Dax as a name for that doll and not the other one. When the doll was called 'a dax', girls treated the term as applying to both
dolls. No matter whether the block was called by common or proper name, girls treated the new word as referring to both
blocks. A partial replication of this study suggested that 18-month-old girls would also distinguish between proper and common
names for dolls. These results did not hold up for the boys, however. Two-year-old boys were at chance for all object selections,
even when hearing a proper name for a doll.
Gelman and Taylor (1984) replicated the Katz et al. (1974) study with some improvements and modifications. They addressed
the concern that evidence for treating a term as a common noun was undifferentiated from chance performance. By adding other
distractors to the response set, they avoided this problem. Gelman and Taylor (1984) also used unfamiliar rather than the
familiar objects that Katz et al. had used. Testing somewhat older children (2- to 3-year-olds), Gelman and Taylor found that boys
as well as girls treated terms as common nouns except in the case where proper names were used to refer to animate-like objects
(stuffed animals).
To return to Hall (1991), then, the question is whether, along with grammatical form class and conceptual domain, mutual
exclusivity is used to help children learn proper names. By overcoming the tendency to treat a novel label as a second term
referring to things of like kind, mutual exclusivity could promote the interpretation of a new term as a proper name. Second
labels for objects should be more likely than first labels to be construed as proper names. To test this, Hall ran a partial
replication of Gelman and Taylor (1984), running only the condition that had resulted in proper name interpretations in the prior
work. That is, two- and three-year-olds were taught proper names for animate objects. They were taught in one of two
conditions: either the animate objects had known names or they did not. Overall, Hall (1991) again replicated Gelman and
Taylor (1984) and Katz et
al. (1974) with children treating the novel term as referring to the individual object labeled at above chance levels. Moreover, as
predicted, children were more likely to treat the novel term as a proper name when they already knew another common name
for the object. Combining the results of Hall (1991) with those of Gelman and Taylor (1984), we can conclude that young
children recruit information from three sources to determine how to interpret a novel term: when the conceptual domain is
appropriate, when the syntactic information is consistent with a proper name interpretation, and when mutual exclusivity can
help override the taxonomic assumption, young children are most likely to treat the term as a proper name.
These studies with parts and substances and proper and common names lead to the general conclusion that some word-learning
constraints can be used to moderate or override others. Mutual exclusivity can override the whole-object assumption leading
children to learn terms for parts, substances, and other attributes of objects. It can also override the taxonomic assumption,
thereby helping children interpret a proper name as a term for a unique individual.
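The override relations just summarized can be illustrated with a small decision sketch. All of the names below are mine, introduced purely for exposition; this is a hypothetical schematization, not a model proposed in any of the studies cited.

```python
# Illustrative sketch of how the word-learning constraints discussed here
# interact (hypothetical function and category names; not from the chapter).

def interpret(novel_term, object_has_known_label, is_animate, takes_article):
    """Default interpretation of a novel term, with mutual exclusivity
    overriding the whole-object and taxonomic defaults."""
    if not object_has_known_label:
        # Whole-object + taxonomic defaults: a label for the kind.
        return "kind of whole object"
    # Mutual exclusivity: a second label is not another kind term.
    if is_animate and not takes_article:
        # Supportive conceptual domain and syntax: a proper name.
        return "proper name for this individual"
    # Otherwise, shift to a part, substance, or property reading.
    return "part, substance, or property"

# Usage: a doll with a known name is called 'Dax' (no article).
print(interpret("Dax", object_has_known_label=True,
                is_animate=True, takes_article=False))
```

The sketch simply encodes the claim in the text that mutual exclusivity acts as a gate on the default assumptions, with syntax and conceptual domain deciding among the remaining readings.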
5.1. Mutual exclusivity as a guide to a word's referent
Another advantage of mutual exclusivity is that it can provide an indirect means of inferring the meaning of a novel word.
Suppose a child hears someone use a novel object label, for example 'Look at the gadget' or 'Please bring me the gadget'. The
child looks around and sees one or more objects with known names, say a ball, and a novel object, say a garlic press. By
mutual exclusivity the child should reason that 'gadget' can't refer to the ball because it is 'a ball', so it must refer to the garlic
press, the only novel object around. In this case the child can fulfill both the whole-object and mutual exclusivity assumptions
and use them to infer the meaning of a novel word. Adults and children alike use mutual exclusivity to figure out which object a
novel label refers to without anyone explicitly pointing to or otherwise indicating the object (Au and Glusman 1990, Dockrell
and Campbell 1986, Golinkoff et al. 1992b, Hutchinson 1986, Markman and Wachtel 1988, Merriman and Bowman 1989).
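The indirect inference described here can be sketched as a small program. The function and lexicon contents below are hypothetical, chosen only to mirror the garlic-press example; they are not materials from any of the studies cited.

```python
# A minimal sketch of mutual-exclusivity-based referent inference
# (hypothetical names; not an implementation from the studies above).

def infer_referent(novel_word, visible_objects, lexicon):
    """Pick the referent of a novel word by ruling out objects that
    already have known labels (the mutual exclusivity assumption)."""
    if novel_word in lexicon.values():
        return None  # the word is not novel after all
    # Rule out every visible object the child can already name.
    unnamed = [obj for obj in visible_objects if obj not in lexicon]
    # The disjunctive step: if exactly one candidate remains, map to it.
    return unnamed[0] if len(unnamed) == 1 else None

# Usage: the child knows 'ball', so 'gadget' must name the garlic press.
lexicon = {"ball": "ball"}
print(infer_referent("gadget", ["ball", "garlic press"], lexicon))
# prints: garlic press
```

Note that the sketch returns no referent when more than one unnamed object is visible, reflecting the fact that mutual exclusivity alone cannot disambiguate among several novel candidates.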
Golinkoff et al. (1992b) went a step further and documented not only that mutual exclusivity provides children with an indirect
means of inferring the meaning of a novel term but that that term then functions as a familiar word in the child's lexicon. They
showed that after 2 ½-year-olds used mutual exclusivity to infer the referent of a novel term, they treated the referent as an
object with a known label which in turn prevented the children from accepting another novel label for that object.
Two-year-olds' success at using mutual exclusivity to infer the referent of a novel term belies the inferential requirements of this
task. The logic of this problem is of the form of a disjunctive syllogism: 'The novel word must refer to either object A or to
object B. It is not A (by mutual exclusivity); therefore, it must be B'. Hutchinson (1986) has found, in fact, that the logic of this
problem may pose difficulties for developmentally delayed children even when they are matched in mental age to normal
children. In her task, normal and developmentally delayed children were given pairs of objects where one object had a known
name and the other did not. The normal children were 2, 2 ½ and 3 years of age and, as expected, children at all ages chose
the unfamiliar object at above chance levels as the referent of a novel label. The performance for the developmentally delayed
children showed a marked developmental difference with children matched in mental age to the 2 ½- and 3-year-olds selecting
the novel object at above chance levels, but with children matched in mental age to the 20- to 24-month-olds failing to do so. These
children did not select the novel object as the referent of the novel word even though normally developing children of 20–24
months did. It was not until the developmentally delayed children reached a mental age of 28 months that they seemed capable of using mutual
exclusivity to infer the appropriate referent of a novel term. One possible reason for this delay is that the logic of the problem is
particularly slow to develop in the delayed children.
5.2. Mutual exclusivity at the time of the naming explosion
There are two concerns that have been raised about whether children around the age of the naming explosion can use mutual
exclusivity to infer the meaning of a novel term. The first is that this use of mutual exclusivity has been documented only in
children from two years of age and up. Studies with younger children are needed. Second, an alternative explanation for these
results has been proposed by Merriman and Bowman (1989). They argued that children might map a novel word to a novel
object in these studies because they are predisposed to fill lexical gaps to find first labels for objects. Thus, when children
encounter an object for which they do not know a label, they should seek out its name. Children might be mapping the novel
label to the novel object because they desire a name for the novel object, not because they reject a second label for the familiar
object.
Wasow and I have addressed both of these concerns in a recent set of studies (Markman and Wasow, in preparation). We asked
whether children at around the age of the naming explosion can use mutual exclusivity unaided by a bias to fill lexical gaps to
infer an appropriate referent of a term. The lexical gap hypothesis states that in the presence of a novel object, children will seek
its name. With no novel object visible, no lexical gap can be created. We ruled out the possibility of children using a lexical gap
strategy by not having a novel object visible at the time of labeling. At the time they heard the novel label children saw only a
familiar object and a bucket which served as a possible location for objects. The prediction from mutual exclusivity is that upon
hearing a novel label children should reject it as a second label for the visible familiar object and search for an appropriate
referent. This search could consist of looking around the room or on the floor, or of wanting to see what is in the bucket.
We found that babies as young as 15 months of age can use mutual exclusivity to reject a second label for an object, and that
this rejection motivates them to search for another potential referent. In this case, not only do the babies need to use the disjunctive logic to
reject the label as a label for object A, they must also infer that it applies to some unknown object B that is not visible at the time. It is
remarkable that this very demanding test of the use of mutual exclusivity is passed by such young babies.
Another less demanding test of the use of mutual exclusivity in young children is to see whether mutual exclusivity simply
causes children to reject second labels for objects, without requiring the children to make any inferences beyond that. The
prediction is simply that second labels for objects should be more difficult to learn than first labels. Although there are a number
of studies that document that young children are capable of learning second labels for objects (Banigan and Mervis 1988, Mervis et
al. 1991, Taylor and Gelman 1989, Tomasello et al. 1988, Waxman and Senghas 1990), all except Mervis et al. (1991) were
conducted for other purposes, and none of them compared second-label learning to first-label learning. The prediction is not that second labels
should be impossible for young children to learn, but rather that they should be harder because children prefer not to have
second labels for things.
Liittschwager and I tested this hypothesis with 18- and 24-month-olds (Liittschwager and Markman 1991). Children were briefly
taught either a first label for a novel object or a second label for an object with a known first label. The results for the 18-
month-olds were as expected: they successfully learned a new first label for an object but failed to learn a second label. The
results for the 24-month-olds were more complicated and reveal another subtlety to the way constraints on word learning work.
Not only did the 24-month-olds
successfully learn a new first label after brief training, but they learned a second label just as well. These young children clearly
violated mutual exclusivity to acquire this second label. We then ran a second study with 24-month-olds to see what would
happen if the task were made somewhat more demanding. Now children were required to learn two new labels rather than just
one: either two new first labels or two new second labels. In contrast to the earlier results, the two-year-olds now showed
evidence of using mutual exclusivity. They succeeded at learning two new first labels but failed to learn the second labels for
objects. This suggests another way in which word-learning constraints function as default assumptions: children may rely on the
default assumptions more heavily when the demands of the task increase.
6. Conclusions
Word-learning constraints were presented as necessary to help children cope with the inductive problem involved in learning a
novel word. An unconstrained, unbiased learning mechanism would be forced to consider too many hypotheses and would be
unable to converge on a candidate meaning in a reasonable amount of time. Some constraints on hypotheses are required to
explain the fast learning actually seen at around 18 months to two years of age. I reviewed evidence for three specific
constraints: the whole-object, taxonomic, and mutual exclusivity assumptions. Recent work reveals that all three assumptions are
available to babies by the time of the naming explosion. Even for young children, however, other sources of information about a
word's meaning may be available. When several sources of information, such as grammatical form class, eye gaze of the speaker,
and all three word-learning constraints converge, word learning should be especially efficient. Yet there remain many interesting
issues to be resolved about the complex and subtle interplay of word-learning constraints with each other and with other sources
of information about a word's meaning.
References
Au, T.K., 1989. Children's use of information in word learning. Unpublished manuscript, Brown University.
Au, T.K. and M. Glusman, 1990. The principle of mutual exclusivity in word learning: To honor or not to honor? Child
Development 61, 1474–1490.
Au, T.K. and E.M. Markman, 1987. Acquiring word meanings via linguistic contrast. Cognitive Development 2, 217–236.
Baldwin, D.A., 1989. Priorities in children's expectations about object label reference: Form over color. Child Development 60,
1291–1306.
Baldwin, D.A., 1991. Infants' contribution to the achievement of joint reference. Child Development 62, 875–890.
Baldwin, D.A., in press. Clarifying the role of shape in children's taxonomic assumption. Journal of Experimental Child
Psychology.
Baldwin, D.A. and E.M. Markman, 1989. Mapping out word-object relations: A first step. Child Development 60, 381–398.
Banigan, R.L. and C.B. Mervis, 1988. Role of adult input in young children's category evolution, II: An experimental study.
Journal of Child Language 15, 493–505.
Bauer, P.J. and J.M. Mandler, 1989. Taxonomies and triads: Conceptual organization in one- to two-year-olds. Cognitive
Psychology 21, 156–184.
Becker, A.H. and B.T. Ward, 1991. Children's use of shape in extending novel labels to animate objects: Identity versus postural
change. Cognitive Development 6, 3–16.
Bloom, P., 1994. Possible names: The role of syntax-semantics mappings in the acquisition of nominals. Lingua 92, 297–329
(this volume).
Bloom, L., K. Lifter and J. Broughton, 1985. The convergence of early cognition and language in the second year of life:
Problems in conceptualization and measurement. In: M. Barrett (ed.), Children's single-word speech, 149–180. New York: Wiley.
Clark, E.V., 1987. The principle of contrast: A constraint on language acquisition. In: B. MacWhinney (ed.), The 20th Annual
Carnegie Symposium on Cognition, 1–33. Hillsdale, NJ: Erlbaum.
Clark, E.V., 1990. On the pragmatics of contrast. Journal of Child Language 17, 417–431.
Clark, E.V., 1991. Acquisitional principles in lexical development. In: S.A. Gelman, J.P. Byrnes (eds.), Perspectives on language
and thought: Interrelations in development, 31–71. Cambridge: Cambridge University Press.
Corrigan, R., 1983. The development of representational skills. In: K.W. Fischer (ed.), Levels and transitions in children's
development, 51–64. San Francisco, CA: Jossey-Bass.
Dockrell, J. and R. Campbell, 1986. Lexical acquisition strategies in the preschool child. In: S. Kuczaj, M. Barrett (eds.), The
development of word meaning, 121–154. Berlin: Springer.
Dromi, E., 1987. Early lexical development. New York: Cambridge University Press.
Echols, C.H., 1990. An influence of labeling on infants' attention to objects and consistency: Implications for word-referent
mappings. Paper presented at the International Conference on Infant Studies, Montreal, Quebec, April 1990.
Echols, C.H., 1991. Infants' attention to objects and consistency in linguistic and non-linguistic context. Paper presented at the
Biennial Meeting of the Society for Research in Child Development, Seattle, WA, April 1991.
Fisher, C., L.R. Gleitman and H. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive
Psychology 23, 331–392.
Fisher, C., D.G. Hall, S. Rakowitz and L. Gleitman, 1994. When it is better to receive than to give: Syntactic and conceptual
constraints on vocabulary growth. Lingua 92, 333–375 (this volume).
Gathercole, V.C., 1989. Contrast: A semantic constraint? Journal of Child Language 16, 685–702.
Gelman, S.A. and M. Taylor, 1984. How two-year-old children interpret proper and common names for unfamiliar objects.
Child Development 55, 1535–1540.
Golinkoff, R.M., C.B. Mervis and K. Hirsh-Pasek, 1992a. Early object labels: The case for lexical principles. Unpublished
manuscript.
Golinkoff, R.M., K. Hirsh-Pasek, L.M. Bailey and N.R. Wenger, 1992b. Young children and adults use lexical principles to
learn new nouns. Developmental Psychology 28, 84–89.
Gould, J.L. and P. Marler, 1984. Ethology and the natural history of learning. In: P. Marler, H.S. Terrace (eds.), The biology of
learning, 47–74. Berlin: Springer.
Hall, D.G., 1991. Acquiring proper nouns for familiar and unfamiliar animate objects: Two-year-olds' word-learning biases.
Child Development 62, 1142–1154.
Halliday, M.A.K., 1975. Learning how to mean. In: E.H. Lenneberg, E. Lenneberg (eds.), Foundations of language
development: A multidisciplinary approach, Vol. 1, 239–265. New York: Academic Press.
Hutchinson, J.E., 1984. Constraints on children's implicit hypotheses about word meanings. Unpublished doctoral dissertation,
Stanford University, Stanford, CA.
Hutchinson, J.E., 1986. Children's sensitivity to the contrastive use of object category terms. Papers and Reports on Child
Language Development 25, 49–56.
Huttenlocher, J. and P. Smiley, 1987. Early word meanings: The case for object names. Cognitive Psychology 19, 63–89.
Katz, N., E. Baker and J. Macnamara, 1974. What's in a name? On the child's acquisition of proper and common nouns. Child
Development 45, 469–473.
Landau, B., 1994. Where's what and what's where: The language of objects in space. Lingua 92, 259–296 (this volume).
Landau, B., L.B. Smith and S.S. Jones, 1988. The importance of shape in early lexical learning. Cognitive Development 3,
299–321.
Landau, B., L.B. Smith and S. Jones, in press. Syntactic context and the shape bias in children's and adults' lexical learning.
Journal of Memory and Language.
Liittschwager, J.C. and E.M. Markman, 1991. Mutual exclusivity as a default assumption in second label learning. Paper
presented at the biennial meetings of the Society for Research in Child Development, Seattle, April 1991.
Lock, A., 1980. The guided reinvention of language. London: Academic Press.
Markman, E.M., 1989. Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press, Bradford
Books.
Markman, E.M., 1992. Constraints on word learning: Speculations about their nature, origins, and domain specificity. In: M.R.
Gunnar, M.P. Maratsos (eds.), Modularity and constraints in language and cognition: The Minnesota Symposia on Child
Psychology, 59–101. Hillsdale, NJ: Erlbaum.
Markman, E.M. and J.E. Hutchinson, 1984. Children's sensitivity to constraints on word meaning: Taxonomic vs. thematic
relations. Cognitive Psychology 16, 1–27.
Markman, E.M. and G.F. Wachtel, 1988. Children's use of mutual exclusivity to constrain the meanings of words. Cognitive
Psychology 20, 121–157.
Markman, E.M. and J. Wasow, in preparation. Very young children's use of mutual exclusivity.
Markow, D.B. and S.R. Waxman, 1992. The influence of labels on 12-month-olds' object category formation. Paper presented at
the International Conference on Infant Studies, Miami, Florida, May 1992.
Marler, P. and H.S. Terrace (eds.), 1984. The biology of learning. Berlin: Springer.
McShane, J., 1979. The development of naming. Linguistics 17, 879–905.
Merriman, W.E. and L.L. Bowman, 1989. The mutual exclusivity bias in children's word learning. Monographs of the Society
for Research in Child Development 54(3/4), Serial No. 220.
Mervis, C.B., and L.M. Long, 1987. Words refer to whole objects: Young children's interpretation of the referent of a novel
word. Paper presented at the biennial meeting of the Society for Research in Child Development, Baltimore, MD.
Mervis, C.B., R.M. Golinkoff and J. Bertrand, 1991. A refutation of the principle of mutual exclusivity. Paper presented at the
Society for Research in Child Development, Seattle, WA.
Naigles, L.G., H. Gleitman and L.R. Gleitman, 1992. Children acquire word meaning components from syntactic evidence. In:
E. Dromi (ed.), Language and cognition: A developmental perspective. Norwood, NJ: Ablex.
Nelson, K., 1973. Structure and strategy in learning to talk. Monographs of the Society for Research in Child Development
38(1/2), Serial No. 149.
Nelson, K., 1988. Constraints on word learning? Cognitive Development 3, 221-246.
Nelson, K. and J. Lucariello, 1985. The development of meaning in first words. In: M. Barrett (ed.), Children's single-word
speech, 59-86. Chichester: Wiley.
Quine, W.V.O., 1960. Word and object. Cambridge, MA: MIT Press.
Roberts, K. and M. Jacob, 1991. Linguistic versus attentional influences on nonlinguistic categorization in 15-month-old infants.
Cognitive Development 6, 355-375.
Rozin, P. and J. Schull, 1988. The adaptive-evolutionary point of view in experimental psychology. In: R.C. Atkinson,
R.J. Herrnstein, G. Lindzey, R.D. Luce (eds.), Perception and motivation (2nd ed.), Vol. 1, 503-546. New York: Wiley.
Shettleworth, S.J., 1984. Natural history and evolution of learning in nonhuman mammals. In: P. Marler, H.S. Terrace (eds.),
The biology of learning, 419-433. New York: Springer.
Snyder, L.S., E. Bates and I. Bretherton, 1981. Content and context in early lexical development. Journal of Child Language 8,
565-582.
Soja, N.N., S. Carey and E.S. Spelke, 1991. Ontological categories guide young children's inductions of word meaning: Object
terms and substance terms. Cognition 38, 179-211.
Taylor, M. and S.A. Gelman, 1989. Incorporating new words into the lexicon: Preliminary evidence for language hierarchies in
two-year-old children. Child Development 59, 411-419.
Tomasello, M., 1992. The social bases of language acquisition. Social Development 1, 67-87.
Tomasello, M. and M.J. Farrar, 1986. Object permanence and relational words: A lexical training study. Journal of Child
Language 13, 495-505.
Tomasello, M., S. Mannle and L. Werdenschlag, 1988. The effect of previously learned words on the child's acquisition of
words for similar referents. Journal of Child Language 15, 505-515.
Ward, T.B., E. Vela, M.L. Peery, S. Lewis, N.K. Bauer and K. Klint, 1989. What makes a vibble a vibble: A developmental
study of category generalization. Child Development 60, 214-224.
Waxman, S.R., 1994. The development of an appreciation of specific linkages between linguistic and conceptual organization.
Lingua 92, 229-257 (this volume).
Waxman, S.R. and M.T. Balaban, 1992. The influence of words vs. tones on 9-month-old infants' object categorization. Paper
presented at the International Conference on Infant Studies, Miami, Florida, May 1992.
Waxman, S.R. and R. Gelman, 1986. Preschoolers' use of superordinate relations in classification. Cognitive Development 1,
139-156.
Waxman, S.R. and L. Heim, 1991. Nouns highlight category relations in 13-month-old infants. Poster presented at the Society
for Research in Child Development, Seattle, WA, April 1991.
Waxman, S.R. and T.D. Kosowski, 1990. Nouns mark category relations: Toddlers' and preschoolers' word-learning biases.
Child Development 61, 1461-1473.
Waxman, S.R. and A. Senghas, 1990. Relations among word meanings in early lexical development. Paper presented at the
International Conference for Infancy Studies, Montreal, Canada, April 1990.
Woodward, A.L., 1992. The role of the whole object assumption in early word learning. Unpublished doctoral dissertation,
Stanford University, Stanford, CA.
Woodward, A.L. and E.M. Markman, 1991. Constraints on learning as default assumptions: Comments on Merriman and
Bowman's 'The mutual exclusivity bias in children's word learning'. Developmental Review 14, 57-77.
1. Introduction
Humans are uniquely endowed with the capacity to build complex, flexible, and creative linguistic and conceptual systems.
Infants' and toddlers' remarkable achievements in each of these arenas have engaged researchers for decades. Yet in recent years,
it is the relation between linguistic and conceptual development that has come to occupy center stage. Some of the most exciting
current work has been designed to explore the relation between early linguistic and conceptual development in the young child's
acquisition of the lexicon.
This new, integrative approach has brought into sharp focus a fascinating puzzle. We know that infants acquire their native
language naturally at a
* This work was supported in part by grant #HD30410 from NIH. Thanks to Marie Balaban, D. Geoffrey Hall, Elizabeth
Shipley, and an anonymous reviewer for their comments on a previous version.
remarkable pace (Carey 1978, Dromi 1987, Gopnik and Meltzoff 1987, Nelson 1983). We also know that even before learning
the words to express them, infants appreciate many different kinds of conceptual relations among objects, including category
relations, thematic relations, causal relations, and event-related or associative groupings (Bornstein 1984, Leslie and Keeble
1987, Mandler et al. 1987, Younger and Cohen 1986). These early linguistic and conceptual achievements set the stage for what
has been described as 'the inductive problem of word learning' (Goodman 1983, Quine 1960, Carey 1990). The problem is that,
in principle, the richness and flexibility of infants' conceptual abilities should complicate the task of mapping new words to their
meanings.
To understand why this is the case, consider a typical word learning scenario, in which an adult introduces a child to a novel
object (say, a flamingo) and offers a novel label ('a flamingo'). Let us assume that both the child and adult are focusing attention
on the same object or scene (Baldwin and Markman 1989, Tomasello 1988). If children have the conceptual ability to appreciate
so many different kinds of relations involving that object, and if each of these is a potential candidate for the new word's
meaning, then how do infants select from among these many possible meanings when determining what the new word is
intended to convey? How do infants so rapidly learn that a given word (e.g., flamingo) may apply to a particular whole object
and may be extended to other members of that object category (e.g., other flamingos), but not to salient properties of the object
(e.g., its long neck or unusual color), to salient actions in which it is engaged (e.g., feeding its young), or to salient thematic
relations (e.g., a flamingo and sand)? If children had to rule out these and countless other logically possible candidate meanings,
word learning would be a formidable task indeed.
1.1. Solving the inductive problem
Yet despite the logical difficulty of the task, young children rapidly and successfully map novel words and meanings. This
observation has led several researchers, working from several different paradigms, to suggest that children come to the task of
word learning equipped with certain implicit biases or expectations which lead them to favor some types of conceptual relations
over others when ascribing meaning to a new word (Chomsky 1986, Landau and Gleitman 1985, Pinker 1984, Markman 1989,
Waxman 1990, 1991). The claim is that these expectations reduce the logical difficulty of word learning
by narrowing the range of candidate meanings a child will consider for any given new word.
Several such implicit biases have been proposed. For example, research in several different laboratories has converged on the
finding that children expect that the first word applied to a novel object will refer to the whole object and other members of its
basic-level kind, rather than to its parts or other salient aspects (Markman and Wachtel 1988, Taylor and Gelman 1988, Hall et
al. 1993, Markman 1989). Further evidence reveals that children, like adults, expect that different words will contrast in
meaning (Clark 1987, Golinkoff et al. 1992, Markman 1984, 1989; Merriman and Bowman 1989, Waxman and Senghas 1992).
A third type of bias or predisposition will serve as the focal point of this article. There is now considerable evidence that
children use the grammatical form of a novel word (e.g., count noun, proper noun, adjective) as a guide to determining its
meaning (Brown 1957, Katz et al. 1974, Landau and Gleitman 1985, Naigles 1990, Gleitman et al. 1987, Hall and Waxman
1993). For example, by two to three years of age, English-speaking children expect objects and object categories (e.g.,
flamingos, birds, animals) to be marked by count nouns (Markman and Hutchinson 1984, Waxman and Gelman 1986, Waxman
and Kosowski 1990, Waxman and Hall 1993); they expect substances (e.g., wood, gel) to be marked by mass nouns (Dickinson
1988, Soja et al. 1991); and they expect object properties (e.g., size, color) to be marked by modifiers (Gelman and Markman
1985, Hall et al. 1993, Taylor and Gelman 1988, Waxman 1990).
Notice that any linkage between grammatical form class and meaning requires that the word learner have (a) the linguistic
capacity to distinguish among the relevant syntactic categories (e.g., count noun vs. mass noun vs. adjective) in her language,
and (b) the conceptual or perceptual ability to appreciate the various kinds of relations among objects.
In sum, recent work has established that young children appreciate linkages between particular types of words (e.g., count
nouns vs. adjectives) and particular types of conceptual relations. These linkages help to explain how children so rapidly map
novel words appropriately to their meanings. For, although children appreciate myriad kinds of conceptual relations, only some
of these relations become lexicalized. Children do not sample randomly among these many possible relations when determining
the meaning of a new word. Instead, particular kinds of conceptual relations are favored in the context of learning particular
kinds of words.
Fig. 1
In an early study, we examined the impact of introducing novel nouns in a superordinate level categorization task (Waxman and
Gelman 1986). The experimenter introduced preschool children to three 'very picky' puppets and then displayed three typical
members (e.g., a dog, a horse, a cat) of a superordinate category (e.g., animal) to indicate the type of thing each puppet would
like. She then asked children to sort additional items for each puppet.
Children in the Instance condition, who sorted the additional pictures (various members of the classes animals, clothing, and
food) with no further instructions, performed only slightly better than would be expected by chance. This is consistent with
traditional reports that children have difficulty establishing superordinate relations (Inhelder and Piaget 1964, Rosch et al. 1976).
In contrast, children in the Novel Label condition, who encountered the same typical instances, but were also introduced to a
novel Japanese label for each superordinate class (e.g., 'These are the dobutsus, these are the gohans'), formed superordinate
classes very successfully. Simply introducing them to novel labels led these children to classify as successfully as other children
who had been given familiar English superordinate labels for the classes (e.g., 'These are animals, these are clothes'). Clearly,
novel count nouns effectively oriented preschool children toward object categories and licensed the induction of superordinate
level categories. Data from Markman and Hutchinson (1984) have revealed that count nouns also highlight basic level object
categories for 3- and 4-year-old children.
This result has linked one particular linguistic form class, count nouns, to object categories at the basic and superordinate levels.
This intriguing finding raised two important questions, both of which concern the specificity of the linkage: First, do novel count
nouns draw attention to object categories at all hierarchical levels, or is this effect specific to the basic and superordinate levels?
Second, are object categories highlighted in the context of word learning in general, or is this focus specific to learning novel
nouns?
To address these questions, I systematically compared the effect of introducing either novel nouns or novel adjectives in a
multiple-level classification task (Waxman 1990). Each child in this study classified pictures of objects from three contrastive
classes at three different hierarchical levels (subordinate, basic and superordinate) within the two different natural object
hierarchies (animals and food) depicted in figure 1. As in Waxman and Gelman (1986), the experimenter introduced three 'very
picky' puppets and revealed three typical members of each class to indicate the type of thing each puppet would like. Children in
the No Word condition sorted with no further clues. Children in the Novel Noun condition were introduced to a novel noun in
conjunction with the photographs from each class (e.g., These are the akas; these are the dobus). Children in the Novel Adjective
condition also heard novel words, but the words were presented within an adjectival syntactic context (e.g., 'These are the ak-ish
ones, these are the ones that are dob-ish').
The children in this experiment were very sensitive to the linguistic context in which the novel words were introduced. Novel
nouns facilitated object categorization at the superordinate, but not the subordinate level.1 In the Novel Adjective condition, this
pattern was completely reversed. Unlike nouns, novel adjectives supported the formation of subordinate level object categories,
but exerted no demonstrable effect at either the basic or superordinate levels. Thus, each of these different linguistic forms
facilitated object categorization at particular hierarchical levels.
An interesting parallel to this phenomenon in children has been documented across a wide variety of adult languages, both
spoken and signed. According to ethnobiological data, count nouns typically mark objects and object categories at the basic and
superordinate levels while adjectives tend to mark subordinate level distinctions (Berlin et al. 1973, Newport and Bellugi 1978).
Although these correlations between linguistic form and object categories at particular hierarchical levels are not perfect, they do
suggest that a relation between naming and object categorization may exist throughout the lifespan. (See Waxman, 1991, for a
more thorough discussion of this literature and its relevance to acquisition.)
The developmental finding that novel nouns and adjectives each produced systematic, but distinct, patterns of results at distinct
hierarchical levels reveals that three-year-olds are not only sensitive to the distinctions between these two linguistic forms, but
also consider linguistic form as relevant to establishing meaning. This important finding constitutes strong support for the
hypothesis that by three years of age, children appreciate powerful and precise linkages between word learning and conceptual
organization.
Notice, however, that the data from preschool-aged children cannot address crucial questions concerning the development of
these linkages in infants and toddlers. (See Nelson, 1988, for an extended discussion of this point.) Neither does the existing
evidence address questions concerning the universality of such linkages across languages. These questions become
1 In fact, although novel nouns facilitated classification at the superordinate level, they made classification at the
subordinate level more difficult. Children in the Novel Noun condition classified less successfully at the subordinate level
than did their agemates in the No Word condition. This very interesting result has spurred a whole independent line of
research (see Waxman et al. 1991) which suggests that children's interpretations of novel words are mediated by their
existing lexical and conceptual information.
especially engaging when they are considered in light of the normative pattern of early lexical development (see Gopnik and Choi, 1990, for a
suggestion that this pattern may not obtain in the acquisition of Korean). The milestones of early lexical acquisition have been
well-documented. Infants typically produce their first words at approximately 12 months of age and continue to add new words
to their productive vocabularies at a gradual pace. However, at approximately 17-20 months, both the pace and character of
lexical acquisition change dramatically. Infants exhibit a sudden burst in vocabulary development (Benedict 1979, Carey and
Bartlett 1978, Goldfield and Reznick 1990). Because most of the words acquired at this period and at this pace are basic level
count nouns (Dromi 1987, McShane 1980, Gentner 1982), this period has been dubbed the naming explosion. The naming
explosion draws to a close as infants begin to produce combinatorial speech, typically around their second birthdays.
Clearly, any thorough account of the early development of an appreciation of linkages between word learning and conceptual
organization must be compatible with these milestones in lexical development. Bearing this in mind, three broad alternative
accounts concerning the development of this appreciation warrant consideration.
On the first account, infants require sufficient experience with their native language to make the appropriate induction
regarding the relation between linguistic form and meaning. If this account is correct, then infants who have yet to commence
the naming explosion should evidence no labeling effects. Instead, novel words should influence object categorization for
infants only after the onset of the naming explosion.
The second account posits that the specific linkages that we have observed in preschool children are available even at the very
onset of lexical acquisition. This alternative requires that preverbal infants expect (a) that there are distinct linguistic forms and
(b) that these distinctions are relevant to establishing meaning (cf. Pinker 1984, Grimshaw 1981). If this account is correct,
then novel words should influence preverbal infants in just the same way as they influence older infants and preschool children.
That is, even infants who have yet to commence the naming explosion should expect that object categories will be marked by
count nouns and that object properties will be marked by modifiers.
The third alternative account strikes a balance between those outlined above. On this account, infants begin the process of
lexical acquisition equipped with a rudimentary general expectation that becomes further refined over development and with
experience with the particular language to which they are exposed. Initially, infants will interpret words (independent of
their linguistic form) as referring to objects and object categories. This alternative is plausible because prior to about two years
of age, infants do not yet distinguish among linguistic form classes in their language production or comprehension (Bloom 1990,
Gordon 1985, McPherson 1991, Prasada 1993, Valian 1986). Later, at around 2 years of age, when infants do begin to
distinguish among the linguistic forms, they discover the finer correlations made in their native language between
particular linguistic forms and meaning.
There are actually two variants of this account. One possibility is that infants begin with an abstract expectation that particular
linguistic forms will mark particular kinds of conceptual relations; however, because they have not yet learned how these
linguistic distinctions are marked in their own language, they (mistakenly) interpret adjectives as they do nouns. Another
possibility is that infants begin with an expectation that words in general will mark object kinds; they only later learn that this
linkage is true for count nouns, but not for other grammatical categories (e.g., adjectives).
These two variants are quite difficult to disentangle empirically. For in either case, the patterns of performance should be the
same: prior to the onset of the naming explosion, infants should interpret all novel words, independent of their linguistic form, as
referring to objects and object
categories. This pattern would suggest that infants embark upon the process of lexical acquisition with a rudimentary linkage
between words and object categories that will become increasingly specific as a function of their experience with the particular
syntactic distinctions drawn in their language.
To adjudicate among these broad alternative accounts, I have initiated a detailed examination of the influence of words of
various linguistic forms on infants' and toddlers' object categorization. I have also begun to examine young children learning
languages other than English.
To foreshadow, the results of these two complementary sets of experiments converge to provide initial support for the third
alternative account. Infants at 12 months begin the process of lexical acquisition with a general expectation that words
(independent of their linguistic form class) will refer to objects and object categories. This initial, rudimentary linkage becomes
increasingly specific in the second year of life, perhaps as a function of their own language experience.
A remaining question is whether our effects are attributable to the introduction of labels per se, or to the arousing effects of motherese in general.
4.1. Evidence from toddlers: Forced choice procedures
Two-year-old children are at an important developmental crossroad. They have just completed the naming explosion and have
entered a phase of rapid syntactic and semantic development. To examine the influence of linguistic form class on object
categorization during this very active period of development, we compared 2-, 3- and 4-year-olds' performance in a match-to-
sample task. Children read through a picture book with an experimenter. On each page, there were 5 pictures: a target (e.g., a
cow), two taxonomic alternatives (objects from the same superordinate class as the target, e.g., a fox and a zebra), and two
thematic alternatives (objects that were thematically related to the target, e.g., a barn and milk) (Waxman and Kosowski 1990).
Children participated in one of three conditions. In the No Word condition, the experimenter pointed to the target and said, 'See
this? Can you find another one?' In the Novel Noun condition, she said, for example, 'See this fopin? Can you find another
fopin?' In the Novel Adjective condition, she said, for example, 'See this fop-ish one? Can you find another one that is fop-ish?'
The child and experimenter read through the book two times. On the second reading, the experimenter reminded the children of
their first choices and asked them to select another from the remaining (3) alternatives. In this way, we were able to examine the
conditions under which children make consistently taxonomic choices.
We reasoned that if children are sensitive to a specific link between nouns and superordinate relations, then children in the
Novel Noun condition should be more likely than children in the Novel Adjective and No Word conditions to select the
superordinate category members on a page, and not the thematic alternatives. The results with the 3- and 4-year-olds supported
this prediction entirely. Only in the Novel Noun condition did children consistently select taxonomic alternatives. In both the
Novel Adjective and No Word conditions, children performed at chance. Thus, superordinate relations gained priority only in the
context of novel nouns, not in the context of word learning in general. Moreover, the effect of the novel noun was powerful
enough to guide both a first and second set of choices, even in the presence of a clear thematic alternative.
Two-year-olds' performance was very similar to that of the older preschoolers. Two-year-olds in the Novel Noun condition were
more likely than
those in the Novel Adjective and No Word conditions to select superordinate category members. However, when comparing
performance in each condition to chance, one slight developmental difference emerged: 2-year-olds in the Novel Noun condition
selected taxonomic alternatives more often than would be expected by chance; those in the No Word condition selected
taxonomic alternatives less often than would be expected by chance; those in the Novel Adjective condition were intermediate.
As predicted, 2-year-olds in this condition selected taxonomic alternatives less often than did children in the Novel Noun
condition; however, their mean rate of taxonomic selections was greater than would be predicted by chance.
This difference in the 2-year-olds' interpretation of novel adjectives, though it is a slim one, provides an important clue to the
development of an appreciation of linkages between word learning and conceptual relations. Although the linkage between
nouns and object categories is clearly evident by two years of age, toddlers at this age also revealed some inclination to interpret
adjectives in a similar fashion. This suggests that 2-year-old children may overextend the linkage between count nouns and
object categories to include new words from other linguistic form classes. This possibility is consistent with the hypothesis that
infants embark upon the task of word learning with an assumption that words (not specifically nouns) highlight object
categories. If this is the case, then the tendency to interpret adjectives, like nouns, as referring to object categories should be
even more pronounced in younger subjects.
4.2. Evidence from infants: Novelty-preference procedures
With this question in mind, we designed a procedure to examine the impact of novel words on object categorization in 12- and
13-month-old infants (Markow and Waxman 1992, 1993). Infants at this age are acutely interested in human language, but
produce very little, if any, of their own. To accommodate the very active nature of the infants, we developed an object
manipulation task, analogous to standard novelty-preference procedures (see also Ruff 1986, Ross 1980, Oakes et al. 1991).
In the familiarization phase, the experimenter offered the child four toys from a given category (e.g., four different animals) one
at a time, in random order, for 30 secs each. This was immediately followed by the test phase in which the experimenter
presented both (a) a new member of the familiar category (e.g., another animal) and (b) an object from a novel contrasting
category (e.g., a fruit). In both phases, infants manipulated the objects freely. Each infant
completed this procedure four times, with four different sets of objects: 2 basic level sets (cows vs. horses; cars vs. planes) and 2
superordinate level sets (animals vs. vehicles; tools vs. animals).
Infants were assigned to one of three conditions, which differed only in the experimenter's comments during the Familiarization
phase. See figure 2. In the Novel Noun condition, the experimenter labeled objects during the familiarization phase (e.g., 'See
the auto'). In the Novel Adjective condition, she introduced the novel word in an adjectival context (e.g., 'See the aut-ish one').
In the No Word condition, she drew attention to each object but offered no label (e.g., 'See this'). The test phase was identical
for infants in all three conditions. The experimenter introduced the test pair (e.g., cow vs. horse), saying, 'See what I have'. No
object labels were introduced in the test phase.
Because infants at this age do not yet distinguish among linguistic form classes such as noun and adjective in their own language
production or comprehension (Bloom 1990, Gordon 1985, McPherson 1991, Prasada 1993, Valian 1986), it is unlikely that they
would consider linguistic form as relevant to establishing meaning. We therefore hypothesized that for infants at this
developmental moment, object categories would be highlighted in word learning in general, not by nouns in particular. We
predicted that infants in both the Novel Noun and Novel Adjective conditions would categorize more readily than would infants
in the No Word condition. More specifically, we predicted that infants hearing novel words (be they nouns or adjectives) would
show (1) a greater decrease in attention to the objects over the familiarization phase, and (2) a stronger preference for the novel
object in the test phase than would infants in the No Word condition.
The results of the experiment were consistent with these predictions. Consider first the data from the familiarization phase,
depicted in figure 3. We calculated individual contrast scores to test the prediction that infants would show a linear decrease in
attention across the four familiarization trials. At the basic level (figure 3a), infants in all three conditions showed this linear
trend. This is consistent with arguments concerning the primacy of the basic level. However, on the superordinate level trials
(figure 3b), only infants hearing novel words (in the Novel Noun and Novel Adjective conditions) showed a decrease in
attention.
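The linear-trend analysis described above can be sketched as follows. This is a minimal illustration, not the analysis code from the study: the weights are the standard orthogonal-polynomial linear contrast for four equally spaced trials, and the looking times are hypothetical values chosen for the example.

```python
# Sketch of a linear-trend contrast score over four familiarization trials.
# Weights (-3, -1, 1, 3) are the standard linear contrast for four equally
# spaced trials; the looking times below are hypothetical, not data from
# the study.

LINEAR_WEIGHTS = (-3, -1, 1, 3)

def linear_contrast(looking_times):
    """Return the linear contrast score for four trial looking times.

    A negative score indicates decreasing attention across the trials,
    i.e., habituation to the familiarization category.
    """
    if len(looking_times) != len(LINEAR_WEIGHTS):
        raise ValueError("expected one looking time per trial")
    return sum(w * t for w, t in zip(LINEAR_WEIGHTS, looking_times))

# Hypothetical infant whose attention declines over the four trials,
# versus one whose attention stays roughly constant.
declining = [20.0, 16.0, 13.0, 9.0]   # seconds of manipulation per trial
flat = [15.0, 14.5, 15.5, 15.0]

print(linear_contrast(declining))  # -36.0: a clear linear decrease
print(linear_contrast(flat))       # near zero: no trend
```

In the study, such per-infant scores would then be tested against zero across the sample; the sketch shows only the per-infant computation.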
During the test trials, the effects of novel nouns and adjectives were also quite comparable. Figure 4 displays the proportion of
attention the infants devoted to the novel test object. At the basic level, infants in both the Novel Noun and Novel Adjective
conditions showed a reliable novelty preference; those in the No Word condition showed no such preference. At the
superordinate level, only infants in the Novel Noun condition showed this preference.
Fig. 2.
Fig. 3.
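The test-phase measure, the proportion of attention devoted to the novel object, can be sketched as follows; the function name and the looking times are hypothetical, introduced only for illustration.

```python
def novelty_preference(time_novel, time_familiar):
    """Proportion of total test-phase attention devoted to the novel object.

    Values reliably above 0.5 indicate a novelty preference, suggesting
    that the infant treated the familiarization objects as one category.
    """
    total = time_novel + time_familiar
    if total <= 0:
        raise ValueError("no attention recorded")
    return time_novel / total

# Hypothetical test trial: 12 s on the novel-category object, 6 s on the
# new member of the familiar category.
print(novelty_preference(12.0, 6.0))  # about 0.67: a novelty preference
print(novelty_preference(7.0, 7.0))   # 0.5: chance, no preference
```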
These are very striking results, for they reveal a nascent appreciation of a linkage between words and object categories in infants
who have yet to commence the naming explosion. This finding weakens considerably the first alternative account that prior to
the naming explosion, infants fail to appreciate any linkages between word learning and conceptual organization. Clearly, novel
words do focus infants' attention on object categories. These results also weaken the second alternative account that infants
embark upon the process of word learning with a fully developed appreciation of the specific linkages between types of words
(e.g., nouns and adjectives) and types of meaning. Instead, these data support the third view that 12- and 13-month-old infants
begin the process of word learning with a general expectation that words, be they nouns or adjectives, will refer to object
categories.
How do the pieces of evidence from the infants fit together with the data from the 2-year-olds? Together, the data suggest that
from the earliest stages of lexical acquisition, count nouns focus infants' attention on object categories. Indeed, we have also
obtained converging evidence on this point with 16- and 20-month-old subjects, using an entirely different method (Waxman
and Hall 1993).
Fig. 4.
However, initially, this focus is not specific to count nouns. At 12 and 13 months, both nouns and adjectives focus infants'
attention on object categories. The infants' ability to distinguish between linguistic form classes and to use these
distinctions as a guide to establishing word meaning must undergo important developmental change during the second year. By
the time they are approximately two years of age, infants begin to tease apart the syntactic form classes; by two and a half years,
we begin to get evidence that they treat nouns and adjectives differently with respect to object categorization (Waxman and
Kosowski 1990, Taylor and Gelman 1988).
Put differently, the data suggest that the affinity between count nouns and object categories is evident even in preverbal infants,
but the specificity of this
affinity increases over development. This pattern fits nicely with some anecdotal evidence concerning early word learning. One
interesting observation has been made by several researchers: Prior to the naming explosion, infants seem to interpret most
words, independent of syntactic form, as referring to objects and categories of objects. This is illustrated by the oft-cited
anecdote regarding infants' initial interpretation of adjectives like hot. In the earliest stages of lexical acquisition, when children
hear, 'Don't touch that. It's hot', they often interpret hot as referring to an object (e.g. a stove), rather than to a salient property of
that object. Indeed, all of the data documenting that children use syntactic form to affix meaning to a new word come from
children who have at least embarked upon the naming explosion (Hall 1992, Hall et al. 1993, Katz et al. 1974, Markman and
Wachtel 1988, Soja et al. 1991, Waxman 1990, Waxman and Kosowski 1990).
Based on the data reviewed thus far, I have suggested that the appreciation of a linkage between count nouns and object
categories undergoes no developmental change: it appears to emerge early, requiring little, if any, experience with the language.
In contrast, an appreciation of the specific linkages between other grammatical categories (e.g., adjectives, mass nouns, verbs) and
meaning emerges later in development and may depend upon language experience.
Notice, however, that this suggestion is based almost exclusively on English-speaking subjects. This is a serious limitation, for it
is important to determine whether the patterns observed in our English-speaking samples are universal to human development.
(See Slobin, 1985, for excellent discussions of the necessity of cross-linguistic work in establishing theories of acquisition.)
To date, our sample includes unilingual speakers from two different language communities. The French-speaking preschool children came from Montreal, Canada. All of the children in this sample were members of families for whom French was the language spoken at home. Moreover, these children were enrolled in French-speaking preschool programs. The Spanish-speaking children came from Buenos Aires, Argentina. Despite the similarities between these two Indo-European languages, there
are variations in their grammars that bear on the questions at hand. For example, in Spanish and French, as opposed to English,
each object or class of objects has associated with it a grammatical gender. Therefore, the words (e.g., nouns, adjectives, determiners) that refer to these objects carry gender markings as well. One possibility is that the gender markings associated with the
various terms would influence the children's interpretations of the novel words. Briefly stated, we found that this was not the
case (Waxman et al., in preparation).
Another difference between these languages was of greater potential relevance. In Spanish and French, nouns are typically dropped when they are recoverable from context. If I have six mugs before me, in English, I distinguish them linguistically by pairing the noun 'mug' with an adjective (e.g., 'the big mug' or 'the big one'). In Spanish, such constructions are ungrammatical. Instead, the noun is dropped, leaving the determiner and adjective (e.g., 'el grande') to refer to the intended mug.
This construction is also common (although not obligatory) in French, where one might ask for 'la petite' to refer to the smallest
mug. In such instances, adjectives have referential status and convey nominal information. This grammatical difference in the
referential status of adjectives may have consequences for children's interpretations of novel words. Perhaps in Spanish, novel adjectives, like nouns, will highlight category relations; perhaps the influence of novel adjectives is less distinct from that of novel nouns.
To address this hypothesis, we adapted the five-item forced-choice method (Waxman and Kosowski 1990) to test 2- to 4-year-
old unilingual speakers of French and Spanish (Waxman et al., in preparation). For French-speaking preschoolers, the results
were identical to those obtained in English: Children in the Novel Adjective and No Word conditions demonstrated no particular
preferences; only those in the Novel Noun condition chose predominantly taxonomically related items. These data support the
view that the specific linkage between nouns and object categories in English is evident in French as well.
However, our results with the Spanish-speaking children were different: Like English- and French-speaking children, Spanish-
speaking preschoolers
in the Novel Noun condition exhibited a strong preference for taxonomically related items; those in the No Word condition
showed no particular preference for taxonomic, thematic, or gender-related matches. The essentially random performance in this
condition replicates the results from our other two language samples. However, unlike their English- and French-speaking
counterparts, Spanish-speaking children in the Novel Adjective conditions did display a systematic inclination toward the
taxonomically related items. In Spanish, then, adjectives also seem to focus young children's attention on superordinate category
relations. This finding has now been replicated twice with two independent groups of Argentine preschool children (Waxman et
al., in preparation).
This observed difference in Spanish-speaking children's interpretation of novel adjectives cannot be attributed to any procedural
differences between the Spanish and English protocols, for the procedures employed were identical in all languages. Neither can
the differences be attributed to the stimuli themselves, for when we tested a group of English-speaking children using the
picture book designed for the Spanish-speakers, the data were identical to the original English findings (Waxman and Kosowski
1990).
This difference, then, may indeed be due to cross-linguistic differences in the referential status of adjectives. In English,
adjectives do not (as a rule) convey object reference. Although 2-year-old English-speakers are somewhat inclined to interpret
adjectives as referring to objects and classes of objects, this is not the case for 3- and 4-year-olds. In Spanish, where adjectives can, and in some constructions must, convey nominal information, experience with the language may lead to a different outcome. Here, even 3- and
4-year-olds often interpret adjectives as referring to objects and classes of objects.
Thus, the role of adjectives appears different in Spanish than in French or in English. And it appears to differ in a predictable
way, given the grammar of the adult languages. There are several possible explanations for this difference. First, it is possible
that in Spanish, the grammatical distinction between nouns and adjectives develops over a more protracted period. It is also
possible that the grammatical distinction between these linguistic forms is made early, but that the appreciation of specific
linkages between linguistic form and meaning develops over a more protracted period in Spanish. Additional research is
currently underway to examine these possible explanations.
Let us now integrate these findings from the French- and Spanish-speaking children with those from children, toddlers, and
infants learning English. The results of these complementary lines of research converge to provide initial support for the
hypothesis that from the earliest stages of lexical acquisition,
infants expect that words (independent of their grammatical form) will refer to objects and categories of objects. Later, this
general linkage gives way to more specific pairings between particular grammatical forms and particular types of meaning. The
affinity between count nouns and object categories emerges early and is evident in all three languages we have examined to date.
In contrast, the more specific linkages for adjectives emerge later and may vary, depending upon the language being acquired.
This account of the child's emerging appreciation of linkages between linguistic and conceptual organization is consistent with
other major milestones in lexical acquisition. It gains further plausibility by virtue of the fact that it is also consistent with cross-
linguistic evidence concerning the linguistic categories noun and predicate (including, e.g., adjectives, verbs).
'... language gets along without a full Adjective class. Some [languages] express all adjectival concepts through intransitive verbs; others express some through nouns and some through verbs; and others invoke further means' (Dixon 1982).
Thus, the syntactic category adjective differs widely across languages. Some languages (like English and the Australian
language Dyirbal) have extensive and elaborate adjective systems; others (like Igbo and the Bantu languages) have very few
adjectives (Dixon 1982). Further, adjectives and other predicates appear to be acquired later than nouns. Moreover, there is some question as to whether there is anything analogous to the naming explosion for the acquisition of predicates (Gopnik 1988).
Finally, members of the predicate system are both semantically and syntactically dependent upon nouns.
tone (Waxman and Balaban 1992). Thus, by 9 months of age, labels facilitate categorization of objects. Moreover, this labeling
effect appears to be tied to language, rather than to auditory stimulation, in general. (But see Roberts and Jacob 1991, for a
different view.)
However, at this point in development, the data do not support the claim that infants make systematic distinctions among words
from various form classes. The data from our laboratory reveal that at 12 months, infants tend to interpret most words,
independent of their syntactic status, as referring to objects or categories of objects. Therefore, prior to the onset of the naming
explosion, there appears to be a general (and possibly universal) linkage between words (not specifically count nouns) and object
categories.
As the naming explosion draws to a close, and as infants begin to distinguish among the linguistic form classes (e.g., count
nouns, mass nouns, adjectives, verbs) in their own language production and comprehension (Bloom 1990, Gordon 1985,
McPherson 1991, Prasada 1993, Valian 1986), infants probably begin to consider syntactic form class as relevant to determining
a novel word's meaning. By two to three years of age, children begin to reveal an appreciation of specific linkages between
particular linguistic forms and particular types of meaning.
For example, English-speaking children expect that object categories will be marked linguistically by count nouns (Brown 1957,
Markman and Wachtel 1988, Waxman 1990, Waxman and Kosowski 1990, Taylor and Gelman 1989, Waxman and Senghas
1991), that substances will be marked by mass nouns (Dickinson 1988, Soja et al. 1991), that individuals will be marked by
proper nouns (Katz et al. 1974, Gelman and Taylor 1984, Hall 1992), and that various properties (e.g., size, color, temperament)
will be marked by modifiers (Hall et al. 1993, Markman and Wachtel 1988, Waxman 1990, Taylor and Gelman 1988).
It is interesting to note that even at this point, when children are clearly capable of using syntactic information as a cue to
meaning, they do not do so invariably. Instead, the tendency to use syntactic information is modified considerably by the child's
existing lexical and conceptual knowledge (Au 1990, Banigan and Mervis 1988, Callanan 1985, Chi 1983, Mervis 1984, Mervis and Mervis 1988, Waxman et al. 1991, Hall et al. 1993). Children's interpretation of a novel word depends, at least in part,
upon whether or not they already have an existing label for the referent object. If the object is familiar (that is, if children have
already acquired a count noun label for the object), then they use syntactic information as a guide in interpreting the meaning of
subsequent words applied to that object. For example, if a child is
taught a new noun for a familiar object (e.g., a dog), the child exhibits a strong tendency to interpret the word as referring to an
object category that is subordinate to (e.g., collie), superordinate to (e.g., mammal), or overlapping with (e.g., household pet) the
familiar basic level category (Taylor and Gelman 1989, Waxman and Senghas 1992). For a new adjective, children tend to
interpret the word as referring to a salient property, substance or part of the object (Hall et al. 1993, Markman and Wachtel
1988).
However, if the object is unfamiliar (that is, if children have not yet acquired a count noun for the object), they tend to rely
upon an earlier pattern of behavior; they tend to interpret any word applied to that object (be it a count noun, proper noun, or
adjective), as referring to an object category, typically at the basic level (Hall 1992; Hall et al. 1993, Markman and Wachtel
1988). Thus, children are attentive to syntactic form in ascribing meaning only after a count noun has been assigned to that
object and to other members of its kind.
Of course, linkages like the ones described here cannot tell the entire developmental story, for children do not learn meaning on
the basis of syntactic context alone. Additional research with preverbal infants and with children learning diverse languages will
further clarify when these various linkages between linguistic and conceptual development emerge, how they are modified by
linguistic input, and how they are modulated within the context of the child's existing fund of knowledge.
References
Anglin, J.M., 1977. Word, object, and conceptual development. New York: Norton.
Au, T.K., 1990. Children's use of information in word learning. Journal of Child Language 17, 393-416.
Baldwin, D.A. and E.M. Markman, 1989. Establishing word-object relations: A first step. Child Development 60, 381-398.
Banigan, R.L. and C.B. Mervis, 1988. Role of adult input in young children's category evolution, II: An experimental study. Journal of Child Language 15, 493-504.
Bauer, P.J. and J.M. Mandler, 1989. Taxonomies and triads: Conceptual organization in one- and two-year-olds. Cognitive Psychology 4, 100-110.
Benedict, H., 1979. Early lexical development: Comprehension and production. Journal of Child Language 6, 183-200.
Berlin, B., D. Breedlove and P. Raven, 1973. General principles of classification and nomenclature in folk biology. American Anthropologist 75, 214-242.
Bloom, P., 1990. Syntactic distinctions in child language. Journal of Child Language 17, 343-355.
Bornstein, M.H., 1984. A descriptive taxonomy of psychological categories used by infants. In: C. Sophian (ed.), Origins of cognitive skills, 303-338. Hillsdale, NJ: Erlbaum.
Brown, R., 1957. Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology 55, 1-5.
Brown, R., 1958. Words and things. Glencoe, IL: The Free Press.
Bruner, J.S., J.J. Goodnow and G.A. Austin, 1956. A study of thinking. New York: Wiley.
Callanan, M.A., 1985. How parents label objects for young children: The role of input in the acquisition of category hierarchies. Child Development 56, 508-523.
Carey, S., 1978. The child as word learner. In: M. Halle, J. Bresnan, G.A. Miller (eds.), Linguistic theory and psychological reality. Cambridge, MA: MIT Press.
Carey, S., 1990. On some relations between the description and the explanation of developmental change. In: G. Butterworth, P. Bryant (eds.), Causes of development: Interdisciplinary perspectives, 135-157. New York: Harvester Wheatsheaf.
Carey, S. and E. Bartlett, 1978. Acquiring a single new word. Papers and Reports on Child Language Development (Department of Linguistics, Stanford University) 15, 17-29.
Chi, M.T.H., 1983. Knowledge-derived categorization in young children. In: D.R. Rogers, J.A. Sloboda (eds.), The acquisition of symbolic skill, 327-332. New York: Plenum Press.
Chomsky, N., 1986. Knowledge of language: Its nature, origin, and use. Westport, CT: Praeger.
Clark, E.V., 1987. The principle of contrast: A constraint on language acquisition. In: B. MacWhinney (ed.), Mechanisms of language acquisition: The 20th annual Carnegie Symposium on Cognition, 1-33. Hillsdale, NJ: Erlbaum.
Dickinson, D.K., 1988. Learning names for materials: Factors constraining and limiting hypotheses about word meaning. Cognitive Development 3, 15-35.
Dixon, R.M.W., 1982. Where have all the adjectives gone? Berlin: Mouton.
Dromi, E., 1987. Early lexical development. Cambridge: Cambridge University Press.
Echols, C., 1992. Developmental changes in attention to labeled events during the transition to language. Paper presented at the International Conference on Infancy Studies, Miami, FL.
Fernald, A., 1992. Human maternal vocalisations to infants as biologically relevant signals: An evolutionary perspective. In: J.H. Barkow, L. Cosmides, J. Tooby (eds.), The adapted mind: Evolutionary psychology and the generation of culture, 391-428. New York: Oxford University Press.
Gelman, S.A. and E.M. Markman, 1985. Implicit contrast in adjectives vs. nouns: Implications for word-learning in preschoolers. Journal of Child Language 12, 125-143.
Gelman, S.A. and M. Taylor, 1984. How two-year-old children interpret proper and common nouns for unfamiliar objects. Child Development 55, 1535-1540.
Gentner, D., 1985. Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In: S. Kuczaj (ed.), Language development: Language, thought, and culture, 301-334. Hillsdale, NJ: Erlbaum.
Gleitman, L.R., H. Gleitman, B. Landau and E. Wanner, 1987. Where learning begins: Initial representations for language learning. In: F. Newmeyer (ed.), The Cambridge Linguistic Survey, Vol. III, 150-193. New York: Cambridge University Press.
Goldfield, B.A. and J.S. Reznick, 1990. Early lexical acquisition: Rate, content, and the vocabulary spurt. Journal of Child Language 17, 171-183.
Golinkoff, R., K. Hirsh-Pasek, L.M. Bailey and R.N. Wenger, under review. Young children and adults use lexical principles to learn new nouns.
Goodman, N., 1983. Fact, fiction, and forecast. Cambridge, MA: Harvard University Press.
Gopnik, A., 1988. Three types of early word: The emergence of social words, names and cognitive-relational words in the one-word stage and their relation to cognitive development. First Language 8, 49-70.
Gopnik, A. and S. Choi, 1990. Do linguistic differences lead to cognitive differences? A cross-linguistic study of semantics and cognitive development. First Language 10, 199-215.
Gopnik, A. and A. Meltzoff, 1987. The development of categorization in the second year and its relation to other cognitive and linguistic developments. Child Development 58, 1523-1531.
Gordon, P., 1985. Evaluating the semantic categories hypothesis: The case of the count/mass distinction. Cognition 20, 209-242.
Grimshaw, J., 1981. Form, function, and the language acquisition device. In: C.L. Baker, J. McCarthy (eds.), The logical problem of language acquisition. Cambridge, MA: MIT Press.
Hall, D.G., 1991. Acquiring proper names for familiar and unfamiliar animate objects: Two-year-olds' word-learning biases. Child Development 62(5), 1142-1154.
Hall, D.G., 1992. Basic level kinds and individuation. Unpublished manuscript, MRC Cognitive Development Unit, London.
Hall, D.G., in press. How children learn common nouns and proper names. In: J. Macnamara, G. Reyes (eds.), The logical foundations of cognition. Oxford: Oxford University Press.
Hall, D.G. and S.R. Waxman, 1993. Assumptions about word meaning: Individuation and basic-level kinds. Child Development 64, 1550-1570.
Hall, D.G., S.R. Waxman and W.R. Hurwitz, 1993. The development of sensitivity to syntactic cues: Evidence from preschoolers learning property terms for familiar and unfamiliar objects. Child Development 64, 1651-1664.
Huttenlocher, J. and F. Lui, 1979. The semantic organization of some simple nouns and verbs. Journal of Verbal Learning and Verbal Behavior 18, 141-162.
Inhelder, B. and J. Piaget, 1964. The early growth of logic in the child. New York: Norton.
Kaplan, T., K. Fox, D. Scheuneman and L. Jenkins, 1991. Cross-modal facilitation of infant visual fixation: Temporal and intensity effects. Infant Behavior and Development 14, 83-109.
Katz, N., E. Baker and J. Macnamara, 1974. What's in a name? A study of how children learn common and proper names. Child Development 45, 469-473.
Landau, B. and L.R. Gleitman, 1985. Language and experience: Evidence from the blind child. Cambridge, MA: Harvard University Press.
Landau, B., L.B. Smith and S.S. Jones, 1988. The importance of shape in early lexical learning. Cognitive Development 3, 299-321.
Leslie, A.M. and S. Keeble, 1987. Do six-month-old infants perceive causality? Cognition 25, 265-288.
Macnamara, J., 1986. A border dispute: The place of logic in psychology. Cambridge, MA: MIT Press.
Mandler, J.M., 1988. How to build a baby: On the development of an accessible representational system. Cognitive Development 3, 113-136.
Mandler, J.M. and P.J. Bauer, 1988. The cradle of categorization: Is the basic level basic? Cognitive Development 3, 237-264.
Mandler, J.M., P.J. Bauer and L. McDonough, 1991. Separating the sheep from the goats: Differentiating global categories. Cognitive Psychology 23(2), 263-299.
Mandler, J.M., R. Fivush and J.S. Reznick, 1987. The development of contextual categories. Cognitive Development 2, 339-354.
Markman, E.M., 1984. The acquisition and hierarchical organization of categories by children. In: C. Sophian (ed.), Origins of cognitive skills, 371-406. Hillsdale, NJ: Erlbaum.
Markman, E.M., 1989. Categorization and naming in children. Cambridge, MA: MIT Press.
Markman, E.M. and J.E. Hutchinson, 1984. Children's sensitivity to constraints on word meaning: Taxonomic vs. thematic relations. Cognitive Psychology 16, 1-27.
Markman, E.M. and G.F. Wachtel, 1988. Children's use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology 20, 121-157.
Markow, D.B. and S.R. Waxman, 1992. The influence of labels on 12-month-olds' category formation. Paper presented at the Eighth International Conference on Infant Studies, Miami, FL.
Markow, D.B. and S.R. Waxman, 1993. The impact of introducing novel nouns versus novel adjectives in 12-month-olds' categorization. Paper presented at the Meeting of the Society for Research in Child Development, New Orleans, LA.
McPherson, L., 1991. A little goes a long way: Evidence for a perceptual basis of learning the noun categories COUNT and MASS. Journal of Child Language 18, 315-338.
McShane, J., 1980. Learning to talk. Cambridge: Cambridge University Press.
Mendelson, M.J. and M.M. Haith, 1976. The relation between audition and vision in the human newborn. Monographs of the Society for Research in Child Development 41(4), 1-72.
Merriman, W.E. and L.L. Bowman, 1989. The mutual exclusivity bias in children's word learning. Monographs of the Society for Research in Child Development, Serial No. 220, 54(3/4), 1-123.
Mervis, C.B., 1984. Early lexical development: The contributions of mother and child. In: C. Sophian (ed.), Origins of cognitive skills, 339-370. Hillsdale, NJ: Erlbaum.
Mervis, C.B., 1987. Child-basic categories and early lexical development. In: U. Neisser (ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization, 201-233. Cambridge: Cambridge University Press.
Mervis, C.B. and M.A. Crisafi, 1982. Order of acquisition of subordinate-, basic-, and superordinate-level categories. Child Development 53, 258-266.
Mervis, C.B. and C.A. Mervis, 1988. Role of adult input in young children's category evolution: I. An observational study. Journal of Child Language 15(2), 257-272.
Naigles, L., 1990. Children use syntax to learn verb meanings. Journal of Child Language 17, 357-374.
Nelson, K., 1973. Structure and strategy in learning to talk. Monographs of the Society for Research in Child Development, Serial No. 149, 38(1/2), 1-136.
Nelson, K., 1983. The conceptual basis for language. In: T. Seiler, W. Wannenmacher (eds.), Concept development and the development of word meaning, 173-190. Berlin: Springer.
Nelson, K., 1988. Constraints on word learning? Cognitive Development 3, 221-246.
Newport, E.L. and U. Bellugi, 1978. Linguistic expression of category levels in a visual-gestural language: A flower is a flower is a flower. In: E. Rosch, B.B. Lloyd (eds.), Cognition and categorization, 49-71. Hillsdale, NJ: Erlbaum.
Oakes, L.M., K.L. Madole and L.B. Cohen, 1991. Infants' object examining: Habituation and categorization. Cognitive Development 6(4), 377-392.
Paden, L., 1975. The effects of variations of auditory stimulation (music) and interspersed stimulus procedures on visual attending behavior in infants. Monographs of the Society for Research in Child Development 39(158), 29-41.
Petitto, L.A., 1987. On the autonomy of language and gesture: Evidence from the acquisition of personal pronouns in American Sign Language. Cognition 27(1), 1-52.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Prasada, S., 1993. Young children's linguistic and non-linguistic knowledge of solid substances. Cognitive Development 8, 83-104.
Quine, W.V., 1960. Word and object. Cambridge, MA: MIT Press.
Roberts, K. and M. Jacob, 1991. Linguistic vs. attentional influences on nonlinguistic categorization in 15-month-old infants. Cognitive Development 6(4), 355-375.
Rosch, E., C.B. Mervis, W.D. Gray, D.M. Johnson and P. Boyes-Braem, 1976. Basic objects in natural categories. Cognitive Psychology 8, 382-439.
Ross, G.S., 1980. Categorization in 1- to 2-year-olds. Developmental Psychology 16, 391-396.
Ruff, H.A., 1986. Components of attention during infants' manipulative exploration. Child Development 57(1), 105-114.
Self, P.A., 1975. Control of infants' visual attending by auditory and interspersed stimulation. Monographs of the Society for Research in Child Development 39(158), 16-28.
Slobin, D.I., 1985. The crosslinguistic study of language. Hillsdale, NJ: Erlbaum.
Soja, N., S. Carey and E. Spelke, 1991. Ontological categories guide young children's inductions about word meaning: Object terms and substance terms. Cognition 38, 179-211.
Talmy, L., 1985. Lexicalization patterns: Semantic structure in lexical forms. In: T. Shopen (ed.), Language typology and syntactic description, Vol. 3, 57-149. Cambridge: Cambridge University Press.
Taylor, M. and S.A. Gelman, 1988. Adjectives and nouns: Children's strategies for learning new words. Child Development 59, 411-419.
Taylor, M. and S.A. Gelman, 1989. Incorporating new words into the lexicon: Preliminary evidence for language hierarchies in two-year-old children. Child Development 60, 625-636.
Tomasello, M., 1988. The role of joint attention in early language development. Language Sciences 11, 69-88.
Valian, V.V., 1986. Syntactic categories in the speech of young children. Developmental Psychology 22, 562-579.
Vygotsky, L., 1962. Thought and language. Cambridge, MA: MIT Press.
Waxman, S.R., 1990. Linguistic biases and the establishment of conceptual hierarchies: Evidence from preschool children. Cognitive Development 5, 123-150.
Waxman, S.R., 1991. Convergences between semantic and conceptual organization in the preschool years. In: S.A. Gelman, J.P. Byrnes (eds.), Perspectives on language and thought, 107-145. Cambridge: Cambridge University Press.
Waxman, S.R. and M.T. Balaban, 1992. The influence of words vs. tones on infants' categorization. Paper presented at the Eighth International Conference on Infant Studies, Miami, FL.
Waxman, S.R. and R. Gelman, 1986. Preschoolers' use of superordinate relations in classification and language. Cognitive Development 1, 139-156.
Waxman, S.R. and D.G. Hall, 1993. The development of a linkage between count nouns and object categories: Evidence from 15- to 21-month-old infants. Child Development 64, 1224-1241.
Waxman, S.R. and L. Heim, 1991. Nouns highlight category relations in 13-month-olds. Paper presented at the annual meeting of the Society for Research in Child Development, Seattle, WA.
Waxman, S.R. and T.D. Kosowski, 1990. Nouns mark category relations: Toddlers' and preschoolers' word-learning biases. Child Development 61, 1461-1473.
Waxman, S.R. and A. Senghas, 1992. Relations among word meanings in early lexical development. Developmental Psychology 28(5), 862-873.
Waxman, S.R., E.F. Shipley and B. Shepperson, 1991. Establishing new subcategories: The role of category labels and existing knowledge. Child Development 62, 127-138.
Waxman, S.R., A. Senghas, D. Ross and L. Benveniste, in preparation. Word-learning biases in French- and Spanish-speaking children.
Younger, B.A. and L.B. Cohen, 1983. Infant perception of correlations among attributes. Child Development 54, 858-867.
Younger, B.A. and L.B. Cohen, 1986. Developmental changes in infants' perception of correlations among attributes. Child Development 57(3), 804-815.
what they perceive and know. Of chief importance is their perception and knowledge of objects, the motions they undergo, and
the locations they occupy.
But how does perceiving or knowing about objects help in learning language? As Quine (1960) argued, the novice analyst
hearing an utterance might be free to conjecture any of an infinite number of hypotheses about its meaning, each compatible
with what he observes. Such rampant indeterminacy would, of course, prevent learners from converging on the correct meaning
of a word. However, human infants do not seem to find themselves mired in such an impossible learning situation; quite the
contrary, they manage to map many words onto coherent meanings in a relatively short period of time. Accounting for this
phenomenon in terms of constraints on learning, both linguistic and non-linguistic, is currently a major challenge for both
linguists and psychologists.
One source of constraints is the human spatial-cognitive system. Recent non-linguistic investigations have shown that the spatial
system contains two separate components: one dedicated to representing objects (the 'what system') and the other to representing
locations (the 'where system'; Ungerleider and Mishkin 1982). The two components exhibit quite different properties, perhaps
not surprising if one evolved to represent objects, the other, places. What is intriguing, however, is that broad properties of each component appear to converge with the differences in the ways that languages express objects and locations (Landau and Jackendoff 1993).
These convergences could impose significant constraints on learning. For if the design of the 'what' and 'where' systems engages
only certain geometric representations, then only these will be available for mapping onto language. Learners possessing such a
representational system would not be subject to Quine's radical indeterminacy; rather, they would approach language learning
equipped with strong biases to consider only certain properties of the world as 'relevant' to naming objects vs. places.
In arguing for such constraints, several lines of evidence are required. First, there must be differences in the way that languages
universally encode the two ontological categories of object and place. Second, these linguistic universals should show properties
that are similar to those found in non-linguistic representations of 'what' and 'where'. Finally, there must be evidence that
children learning names for objects and places generalize on the basis of different geometric properties in the two cases: that, essentially, the problem of learning names for objects and places is bifurcated, with each domain drawing on quite different kinds of representations.
In the sections that follow, I first provide evidence for such a bifurcation in young children. This phenomenon will set the stage
for asking in detail about the nature of 'what' and 'where' representations in languages and language learning. I then explore in
detail the spatial representations underlying object naming and those underlying the language of places. Finally, I discuss
questions about the possible contributions of spatial representation vs. language itself to these different representations.
Before turning to the findings, however, the reader might wonder how learners know in the first place that a particular part of an
utterance expresses the name of an object vs. a place. Our research is neutral on how learners first come to know the mapping
between forms and these broad ontological distinctions. In fact, our empirical research begins at the point (age 2 ½ years) when
young children can already use syntactic context to create a coarse division of the hypothesis space. By this time, they can use
the syntactic and morphological properties of count and mass nouns, adjectives, and verbs to disambiguate among various
readings in English. For example, they can use count/mass syntax to distinguish between object and substance readings (Brown
1957; P. Bloom, this volume); noun/adjective syntax to distinguish between object and property readings (Landau et al. 1992a);
and number of verb arguments to disambiguate between agentive vs. non-agentive readings (Gleitman 1990, Fisher et al. this
volume). In our research, we use children's knowledge of syntax and morphology as a wedge in asking about the
representational differences between object and place.
I assume that such broad mappings between form and meaning are achieved quite early in language learning. For example, pre-
linguistic children can represent major ontological categories, such as OBJECT, PLACE, and ACTION: Infants know that
objects continue to exist over spatial and temporal displacements (see Carey, this volume; Spelke 1990) and that objects occupy
spatial locations which are constant over one's own movements (Landau and Spelke 1988, McKenzie et al. 1984, Rieser 1979).
Objects, static locations, and trajectories are mapped to language in the child's earliest vocabulary, including words like in, on,
up and down (Brown 1973, L. Bloom 1973).
Children also can represent lexical and phrasal categories such as N, NP, P, PP, V, VP. These categories might be identified in
the speech stream by correspondences between certain properties of the wave form and major lexical, phrasal, or clausal units
(i.e. by 'prosodic bootstrapping', see Gleitman et al. 1988, Kelly, this volume). Another possibility is that nouns are discovered
first, when the child hears a single word in isolation (usually an
object name) together with the sight of an object (Gillette and Gleitman, forthcoming). Given only a small group of nouns, the
remaining categories in the syntactic configuration may be deducible from internal linguistic principles (Grimshaw, this
volume).
In either case, it seems reasonable to assume that children also approach language learning with certain expectations about the
canonical mappings between the major ontological categories and the syntactic categories (Grimshaw 1981). For example,
OBJECT is mapped canonically onto N, and PLACE (more properly, spatial-locative functions) onto Preposition, PP, or other
argument-taking categories.1
Once the learner has come to know how objects and places are syntactically expressed in his or her language (in English, by
count nouns and PPs respectively), he or she can infer that a novel lexical item falls into the domain of either object or place.
Now, from the different structural properties of the 'what' and 'where' system, the relevant geometric generalizations should
automatically fall out.
Given this view, the syntactic properties of a novel word can provide the learner with quite coarse semantic distinctions (e.g. the
major ontological categories and broad semantic classes). The remainder of the 'learning' of a word must be discovered from
sources outside of syntax2 (see Grimshaw, this
1 The encoding of spatial location functions varies across languages: English encodes them canonically as prepositions
and PPs, while other languages incorporate the location into the verb (Talmy 1985). In this paper, I focus on English
prepositions, which have been analyzed in detail by linguists, computer scientists, and psychologists (Miller and Johnson-
Laird 1976, Herskovits 1986, Jackendoff 1983, Talmy 1983). However, the principles proposed in this paper are meant to
be universally represented in expressions of spatial location (cf. Talmy 1983), regardless of their syntactic realization. The
strength of analyzing English spatial prepositions lies in the fact that these elements (unlike, e.g., many spatial verbs)
encode only object location functions, hence are an especially clear example of how languages represent places.
2 These semantic-syntactic mappings will be richer in some domains than in others, and we can expect corresponding
differences in the extent to which learners will be able to use syntactic context to learn details about the meanings of new
words. For example, in the case of verbs, we might expect relatively rich correspondences carving out sets of meaning-
related words (Pinker 1989, Levin and Rappaport Hovav 1991), and such correspondences can be expected to aid the learning of new
verbs (Landau and Gleitman 1985, Fisher et al. 1991, Fisher et al., this volume). In contrast, the interpretation of novel
nouns should be helped relatively little beyond distinguishing between count vs. mass readings. For example, a learner
would not be able to tell, from the syntactic context alone, much more than that the word describes an 'individuated' entity
(e.g. a concrete object; see P. Bloom, this volume). The same sparse mapping seems true for spatial prepositions. In
English, places map canonically onto PPs (Jackendoff 1983). If the learner understands this mapping principle, she might
be able to interpret a new PP as describing a place, but it will only be through analysis of the distribution of figure and
reference object geometries that she will be able to induce the correct geometric meaning of a given preposition. The
present enterprise is motivated by the possibility that there are principles determining domain-related meanings (e.g.,
representations for objects vs. places) that could internally differentiate broad syntactic classes such as count noun or
preposition.
volume; Fisher et al., this volume; Pinker, this volume). Our interest is in how the spatial-cognitive system might further pare
down the hypothesis space following the broad division into object vs. place.
figure 1 for objects and standard location). Each time subjects observed an object being placed on the second box, they were
asked either 'Is this a corp?' or 'Is this acorp your box?' (matching their initial instructions). We asked whether subjects would
attend to different properties of the array in the two contexts.
Fig. 1. Objects and array used in spatial preposition experiment. In this experiment, a 'standard' object was placed in a 'standard'
position (as shown) on the upper surface of a box. Subjects were told either "This is a corp" (count noun condition) or "This is
acorp my box" (preposition condition). Then each of the objects was placed in each of several positions on and around the box,
and subjects were asked either "Is this a corp?" or "Is this acorp your box?" (with the test question matching the initial
instruction). The responses showed that in the count noun condition, subjects accepted only the standard shape and rejected
shapes that differed from the standard. In contrast, in the preposition condition, subjects accepted all shapes, apparently
considering object shape irrelevant to decisions about location.
The results showed that both children and adults attended to the object's shape in the count noun condition, accepting the
original shape but rejecting all shape changes. In the preposition condition, however, subjects ignored shape, accepting all three
shapes equally. In complementary fashion, subjects ignored the object's position in the count noun context, but attended to it
closely in the preposition condition. Children accepted all positions on the upper surface of the box, while adults were more
conservative, primarily accepting the standard position.
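The complementary pattern above can be sketched as a pair of generalization rules. This is a minimal illustrative model, not the authors' analysis; the feature dictionary and its keys are assumptions introduced only for the sketch.

```python
# Hedged sketch: how syntactic context ("Is this a corp?" vs. "Is this
# acorp your box?") might select which property of the array governs
# generalization. The feature names ("shape", "position") are
# illustrative assumptions, not the study's coding scheme.

def accepts(syntactic_context, standard, test):
    """Generalize a novel word from a standard exemplar to a test item.

    In the count noun context, acceptance tracks the figure object's
    shape and ignores its position; in the preposition context, it
    tracks position and ignores shape.
    """
    if syntactic_context == "count noun":      # "Is this a corp?"
        return test["shape"] == standard["shape"]
    if syntactic_context == "preposition":     # "Is this acorp your box?"
        return test["position"] == standard["position"]
    raise ValueError(syntactic_context)

standard = {"shape": "H", "position": "on-top"}

# A shape change in the same position: rejected as a 'corp',
# accepted as 'acorp the box'.
print(accepts("count noun", standard, {"shape": "L", "position": "on-top"}))   # False
print(accepts("preposition", standard, {"shape": "L", "position": "on-top"}))  # True
```

The point of the sketch is only that the same stimulus yields opposite judgments depending on which ontological category the syntax picks out.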
This pattern of results shows a rather extreme dissociation: presence of detailed shape information vs. no shape information at all in 3-year-
olds who are generalizing a novel preposition. But such an extreme dissociation is not a necessary property of the geometries of
object vs. location terms. There are terms in English and other languages which preserve restricted elements of an object's
shape. For example, the terms across and along both require that the figure object be roughly 'linear', that is, that the object
have a principal axis that can be located with respect to the axis of the reference object (Talmy 1983; see section 4). Therefore,
in a subsequent experiment, we asked whether children could represent objects in this intermediate fashion: as a geometric kind
that does not represent exact shape, but preserves just one critical component, the object's principal axis.
We now used the same method but changed the objects' shapes and locations (see figure 2 for shapes and 'standard' location).
The standard object was now a 7" straight rod, and the test objects included a replica of the standard, a squiggly rod that was the
same extent as the standard, and a 2" × 2" × 1" block. The standard was placed perpendicular to the box's main axis when we
introduced it, and the test positions included this same position, one slightly to the left, one parallel to the box's main axis, and
one diagonal to it.
The overall results showed the same complementary pattern for attention to shape vs. position as we had found in the first
experiment: Subjects in the count noun context attended to shape but not position, while subjects in the preposition context did
the converse. However, it is the details of the shape results that are of particular interest, for we observed two different kinds of
shape representations for objects in the two syntactic contexts.
In the count noun context, children and adults preserved the precise shape of the figure object when generalizing: They accepted
the standard and rejected both the squiggle and the block. In the preposition condition, some 3-year-olds ignored shape
altogether as they had in the previous experiment. But other 3-year-olds, and most 5-year-olds and adults accepted the standard
and the squiggle while rejecting the block. That is, they accepted both objects that had intersected the box, even though these
were quite different in precise shape and hence had not both been accepted as instances of the same named object category in
the count noun condition. What these children did was to treat both the standard and the squiggle alike as linear objects in the
preposition condition, attending to the one property of shape, the extent of its main axis, that was apparently important in
describing its location.
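The intermediate 'linear object' schematization can be sketched as follows. This is a minimal sketch assuming a 2-D point-set representation of each object; the aspect-ratio threshold of 3.0 is an illustrative assumption, not a value from the study.

```python
# Hedged sketch of schematizing a figure object to its principal axis,
# as some children appeared to do when generalizing the novel
# preposition: exact contour is discarded, but 'linearity' is kept.

def is_linear(points, ratio=3.0):
    """Schematize an object as 'linear' when one extent of its bounding
    box clearly dominates the other, i.e. when it has a salient
    principal axis (cf. the discussion of across and along).
    The 3.0 ratio is an assumption for illustration."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    long_side = max(width, height)
    short_side = max(min(width, height), 1e-9)  # avoid division by zero
    return long_side / short_side >= ratio

straight_rod = [(x, 0) for x in range(8)]   # like the 7" straight rod
squiggle = [(x, x % 2) for x in range(8)]   # same extent, wiggly contour
block = [(0, 0), (2, 0), (0, 2), (2, 2)]    # like the 2" x 2" block

# Both rods share a salient principal axis; the block does not.
print([is_linear(o) for o in (straight_rod, squiggle, block)])  # [True, True, False]
```

On this schematization the standard and the squiggle fall together, mirroring the children who accepted both while rejecting the block.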
In sum, we found that when young children were asked to generalize a name for a novel object, they engaged a full description
of the object's shape.
Fig. 2. Objects and array used in second spatial preposition experiment. This experiment was identical to the first, except for the
shapes of the 'standard' and test objects, and the positions that were tested (see text). As in the first experiment, subjects in the
count noun condition attended to shape, accepting only the 'standard' (the straight rod). In the preposition condition, subjects
accepted the two 'linear' objects (objects whose principal axis could span the box) and rejected the block. This suggests that
subjects in the preposition condition schematized the test objects, focussing only on the existence of a principal axis of a
particular extent. The results of the two experiments show that detailed object shape matters for the extension of object count
nouns, but that schematic object shape matters for the extension of spatial prepositions.
In contrast, when asked to generalize a new place term, they either ignored the object's shape altogether, representing it
essentially as a 'blob', or did something in between, representing just one component of shape, as if the target object were a 'linear
thing'.
Why did young children generalize so differently in the context of a count noun vs. a preposition? Clearly, we cannot appeal to
an unbiased Quinean similarity space which dictates what will be 'salient' in any scene, for these were identical in the two
situations. The only difference between the conditions was the syntactic context in which the novel word was introduced.
Children must have assumed that the novel count noun referred to the object being placed on the box (in this case, an object of a
specific shape, to be explored in section 3). And they must have assumed that the novel preposition referred to the place where
the object was located. Syntactic context thus directed
children's attention to two different ontological categories, object and place, from which followed quite different kinds of
geometric representations.
These findings raise many questions. As for objects: Do children always generalize novel object names on the basis of shape
and if so, what does this suggest about the role of object perception in constraining the child's initial hypotheses about object
names? As for places: Is the lack of shape information in locational terms a universal feature of languages and is failure to
attend to shape for place terms a universal feature of language learners? Finally, do these differences between object and place
representation reflect facts about the way languages are organized? Or, as I've already hinted, do they partly reflect the way the
human spatial-cognitive system operates independently of language?
Common sense suggests that object shape might be particularly critical in the extension of object names. For adults who have
full-blown concepts and a mature lexicon, the representations of objects named by count nouns such as chair, apple, and dog
must in part be linked to the shapes of these objects (Jackendoff 1987, Jackendoff and Landau 1991). Object shape remains
constant over space and time, is an excellent predictor of object kind, and shape perception is undoubtedly a part of human
native endowment. Shape has been shown to be a critical property of object representation at the basic level (Rosch et al. 1976);
it is the principal basis for children's spontaneous over-generalizations (Clark 1973); and it may be a privileged property in
extending object names to depictions in representational art. For example, the well-known sculptures by Claes Oldenburg
include 'The Clothespin' (a 45-foot metal object) and 'Good Humours' (floppy, furry objects); in both cases, similarity to the real
thing is solely in their shape.4 In these cases, similarity of shape licenses the use of the real object's name, despite the fact that
the representational object is not really an X (see Jackendoff, 1992, for related discussion).
Yet shape is definitely a perceptual property. If object names are extended on the basis of perceptual properties only as weighted
by theories, then what is the status of shape? Is shape somehow 'special' as a brute-force preferred property? Or are there always
effects of higher-level knowledge in determining whether and to what extent shape is important?
In our studies of object naming, we have discovered that the answers to these questions are complex. As the following evidence
shows, shape is indeed 'special' as a property often critical to the extension of object names among young children. Considering
the reasons for this suggests that constant object shape may be intimately related to our notions of object kind. However, the
results also show that learners' use of object shape recruits a system of shape perception that is itself quite complex and subtle.
Moreover, the system can
4 Jackendoff (1983, 1992) has suggested that an object's name can be extended to many possible referents X, including
'depiction of X'. This would account for the naturalness of referring to the sculpture as 'clothespin' even though we all
know it isn't really one. Jackendoff suggests that the distinction between real referents and depictions may be critical for
binding theory. I thank Paul Bloom for pointing out the relevance of Jackendoff's treatment, and Robert Remez for
providing an extensive list of Oldenburg's work that exemplifies this problem.
be affected by top-down influences from a rather early age. Both of these facts render the Quinean learner even less plausible,
and suggest great detail and complexity in the representational system underlying object naming in childhood.
3.1. Rigid objects and the shape bias
When a child hears an object labelled with a novel count noun, he faces at least a circumscribed version of Quine's problem: He
must decide which of the properties he perceives are relevant to generalizing the new word. Which properties of the 'whole
object' (Markman, this volume) will the learner select as the basis for generalization?
In a first series of experiments, Landau et al. (1988) investigated 2- and 3-year-olds' and adults' representation of concrete 3-
dimensional objects. We asked whether children and adults would tend to generalize a new count noun on the basis of an
object's shape, its size, or its texture, and whether any such biases were due to generalized perceptual preferences rather than the
task of learning a new word.
We showed subjects a single object (for example, a 2" blue wooden U-shaped thing) and labelled it by saying, 'See this? This is a
dax'. This object, the standard, was placed aside but still in view, and subjects were shown a series of objects, one at a time, which
were either size, texture, or shape changes from the standard. These changes ranged from relatively minor or moderate changes
tested in a first experiment (e.g., ½" larger for the minor size change or a slightly bent leg for the shape change) to either
moderate or quite extreme changes tested in a second experiment (e.g., 24" for the extreme size change and a completely rotated
leg for the shape change; see figure 3 for examples5). For each object, subjects were asked 'Is this a dax?'
The results are shown schematically in figure 3. Subjects of all ages accepted the size and texture changes at ceiling levels, but
rejected the shape changes. For the minor and moderate magnitudes of change in the first experiment, adults accepted all size
and texture changes but rejected all shape changes. Two-and three-year-olds showed the same pattern but probabilistically
rather than categorically.
5 Note that varying the magnitude of the change for each dimension is critical to determining whether it is the dimension
itself that children attend to, or simply the magnitude of the change. For example, a child might reject a given shape
change while accepting a given size change just because the shape change is quite radical but the size change is rather
minor. Unless the magnitude of changes is varied over dimensions, we cannot dissociate these two possibilities (see
Landau et al., 1988, for discussion).
Fig. 3. The shape bias in early lexical learning. When children or adults hear
a novel object (the 'standard') named with a novel count noun, they tend
to generalize that noun on the basis of the object's shape,
and not on its size or texture.
For the moderate and extreme magnitudes of change in the second experiment, adults again accepted all size and texture
changes and rejected all shape changes. The adults' pattern here was quite rigid: They rejected even a very minor shape change
while accepting at ceiling a size change that was more than 10 times the size of the standard. Three-year-olds showed the same
pattern as adults, again in a weaker form, while the two-year-olds did not show any particular preferences, accepting all objects
at a relatively high level. This pattern of results for the 2-year-olds was due in part to the 'yes/no' procedure, which often elicits
blanket 'yes' or 'no' in this age group. Using a forced-choice procedure, 2-year-olds did tend to accept size and texture changes
and reject shape changes, as did the 3-year-olds (see Landau et al., 1988, Experiment 2, forced-choice procedure).
Thus we found that children and adults alike extend the reference of a count noun to new objects on the basis of shape. But they
reject extension to objects with even quite minor shape changes. This pattern, which we called
the 'shape bias', appears by age 2 in a relatively fragile form, is stable by age 3, and reaches adult strength by age 5 (Landau et
al. 1992a).
Does the shape bias reflect a general, indiscriminate perceptual preference for grouping same-shaped objects together? One
might suppose that children generalize a count noun on the basis of shape just because the objects with the same shape as the
standard look more like the standard than objects with the same size or the same texture. Evidence for this interpretation would
be a shape bias across a variety of object matching tasks, whether they involve an object's name or not.
To assess this possibility, Landau et al. (1988) presented 2- and 3-year-olds with the same standard as before, but did not name
the object. Children's attention was drawn to the standard by pointing and saying 'See this?' Then the object was placed aside as
before, and the test objects were brought out in pairs. Children were now asked 'Which one goes with this?' (requesting a match
to the standard). The expression 'goes with' is, of course, ambiguous; that is, one object could 'go with' another on many
different bases. Previous experiments have shown that children who are asked this question might group on a number of
different bases, including various dimensions of perceptual similarity, magnitude of similarity, or thematic relationship with
another object (Smith 1989, Markman and Hutchinson 1984, Inhelder and Piaget 1964). The question was whether children
would still group on the basis of shape given that there are many possible bases for similarity grouping.
Although children had shown a shape bias in the count noun condition, this time they showed either a weak shape bias or none
at all. Two-year-olds were as likely to pick the most extreme size change (which had the same shape and texture as the
standard) as the most extreme shape change (which was identical to the standard in size and texture). Three-year-olds showed a
weak shape bias, nowhere near as strong as in the count noun condition.
Thus, by age 3, the shape bias is elicited reliably and strongly in the context of object naming. During the early years of
language learning, the perceptual space has already become 'biased', stretched towards emphasizing different dimensions in
different contexts. The context of object naming draws on a perceptual space where similarity of object shape is critical.
Further research has confirmed that the shape bias is extremely robust in the context of object naming, that is, under count noun
instructions. It appears in 3-year-olds and adults when the objects are three-dimensional (Landau et al. 1988, Au and Markman
1987), or two-dimensional (Landau et al. 1992b, 1993; Baldwin 1993); when the object's texture or coloration is so salient that
children are drawn to those dimensions if asked for non-linguistic
similarity judgments (Smith et al. 1992); and when cues to animacy (eyes) also pull children's attention to the texture or
substance of objects (Jones et al. 1991). A shape bias has been replicated in other labs and with school age children (Becker and
Ward 1991).
In contrast, the shape bias is weakened or completely suppressed in other syntactic contexts, even though the domain in question
still concerns objects (Landau et al. 1992a, Smith et al. 1992). For example, we found that when objects were introduced as
members of a more inclusive category (i.e., 'This is a kind of dax'), young children and adults showed only a weak preference
for same shape, and no strong preference for either size or texture. This makes sense, as members of more inclusive object
categories are often less similar in shape (and other dimensions) than members of their subsets. (For example, members of the
animal category are less similar to each other than members of the dog category; and dogs are less similar to each other than
Dalmatians). We also found that when objects were introduced to learners using an adjectival context ('This is a daxy one'), they
showed no preference at all for same shape but, rather, a strong preference for same surface texture, which is encoded by
adjectives in English and other languages (Schachter 1985; see also Waxman, this volume).
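These frame effects can be summarized as a schematic lookup. The frame strings come from the studies, but the mapping itself is my illustrative summary, not a model the authors propose.

```python
# Hedged sketch: syntactic frame appears to select which dimension
# children weight when generalizing a novel word about an object.
# The dictionary below is a schematic summary of the findings; it is
# an assumption for illustration, not the studies' analysis.

FRAME_TO_BIAS = {
    "This is a dax":         "shape",    # count noun: strong shape bias
    "This is a kind of dax": None,       # superordinate: weak/no preference
    "This is a daxy one":    "texture",  # adjective: surface-texture bias
}

def preferred_dimension(frame):
    """Return the dimension a learner is biased toward, or None when
    no strong preference is expected."""
    return FRAME_TO_BIAS.get(frame)

print(preferred_dimension("This is a dax"))       # shape
print(preferred_dimension("This is a daxy one"))  # texture
```

The table form makes the contrast explicit: identical objects, identical scenes, yet different frames route the learner to different dimensions.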
The results of both syntactic manipulations confirm the notion that the shape bias is confined to the count noun context. This
provides further evidence for the importance of syntactic context in directing young children's interpretations of novel words to
different parts of the object representation system.
3.2. Artifacts, natural kinds, and shape: Influences of perception and knowledge
The existence of a shape bias should certainly aid the child in learning new object names and generalizing them to new
instances. However, if the bias were too rigid, children would fail to generalize in cases where they should. Although many
artifacts are rigid, others regularly undergo transformations of shape (changes in their configuration) because they are jointed (e.g.
tools, vehicles). Still other objects, especially natural kinds, undergo changes of configuration as they move, grow, are squashed,
etc.6
6 I omit from this discussion the importance of perspective transformations, to which young language learners are surely
sensitive. Evidence shows that infants are sensitive to shape constancy (see Banks and Salapatek 1983). I assume that 2-
year-olds will generalize a novel name to different perspective transformations of an object (although not necessarily
'accidental' perspectives, i.e. those that obtain from unusual viewing perspectives).
Does a bias for same shape mean that shape is preserved in a static 'snapshot'-like fashion, corresponding to the configuration
shown by an object in just one pose? Or is the bias for same shape responsive to the different kinds of shape transformations a
given object might normally undergo?
The objects used by Landau et al. (1988) all were made of wood and most had hard straight edges and sharp angles, suggesting
the rigidity characteristic of artifacts. Thus the strong shape bias among children might have been due to their understanding that
objects possessing these properties could not undergo shape transformations. Such an understanding could stem from
perceptually-based or knowledge-based causes. Perceptual processes might specify the kinds of transformations normally
expected from objects possessing properties such as straight edges and sharp angles. Knowledge about different object kinds
might also affect judgments; for example, knowing that artifacts normally do not undergo massive shape transformations might
lead the child to reject shape changes. And these perception-driven and knowledge-based causes might well interact.
To determine whether perceptual causes could modulate children's judgments about named objects, Landau et al. (1992b, 1993)
first sought to vary suggested malleability by purely perceptual means. Using the same overall method as in previous studies, we
first created outline drawings of the original shapes, including the standard plus the three shape changes and three new size
changes (see figure 4A for the shape changes). Three-year-olds and adults who were tested using these stimuli showed a strong
shape bias, just as before.
In a second series, we tried to suggest malleability by drawing the same shapes but with curved contours (figure 4B). Contour
curvature generally correlates with malleability in the physical world, and the question was whether this change would suggest a
greater potential for transformation to the children. In a third series, we added another cue to malleability by imposing wrinkles
on the curved contours (figure 4C).
In the fourth and final series, we added a single perceptual property that powerfully suggests a particular ontological kind.
Following Jones et al. (1991), we drew eyes on the wrinkled curved drawings (figure 4D). Although eyes are a perceptual
property, they clearly are powerful in suggesting animate beings. In fact, Jones et al. found that when eyes were placed on the
original wooden objects, 3-year-olds generalized on the basis of same shape
Fig. 4A-D. Drawings used in studies of rigidity, malleability, object kind, and naming. The first object in each row represents
the 'standard' object that was named with a count noun. Subjects in each of four experiments saw either A, B, C, or D. Test
items included the corresponding shape changes in each row. Objects in the first row possess straight edges and sharp corners.
Those in the second, third, and fourth rows vary the basic shape so as to suggest malleability by curves, wrinkles, and eyes,
respectively. Subjects' judgments of which shape changes belong in the named category varied considerably in accord with
suggested malleability.
and texture, as if an animate object's 'stuff' mattered in generalizing its name.7
7 This finding suggests that young children are sensitive to the importance of texture (or 'stuff') of which animates are
made, but also suggests that children are as rigid in their preservation of shape for animates as for artifacts. The difference
in findings between Jones et al. and Landau et al. (1993) suggests the important role of material as it interacts with object
shape in determining relative malleability.
Subjects showed a strong shape bias for the drawn versions of the original stimuli (4A), the ones with straight edges and sharp
corners, but this restriction to the original shape was progressively weakened with the other manipulations. With the curved
contours, subjects accepted some shape changes, and with the wrinkled and curved contours, they accepted slightly more.
most powerful change occurred with the combination of wrinkled and curved contours topped by eyes; in this case, subjects
accepted all of the shape changes. There were no age differences in the overall pattern: 3-year-olds showed the same
distribution of choices as adults.
These findings make two points. First, they show that the shape bias is not a bias for accepting only objects having precisely the
same static configuration. Children as young as 3 years old appreciate that an object's shape is defined with respect to the range
of shape transformations it can undergo; over such transformations, the 'same shape' will show changes in its configuration.
Although the range of acceptable configurations broadens with suggested malleability, it still is constrained within a clearly
delimited range of possibilities. We think it unlikely that children would have accepted a radical topological transformation (e.g.
an irregular, roundish shape) as an instance of a dax.
The second point is that perceptual information (such as curviness or wrinkles or eyes) can play a rich role in engaging
children's and adults' knowledge of different object kinds and their potential for shape transformation. This information may
concern the object's shape itself (i.e., its contours) or it may concern other properties that signal certain categories (e.g., eyes).
Although 3-year-olds do not have mature biological theories of concepts such as 'animal', 'mammal', or 'primate' (Johnson et al.
1992), perceptual information can engage knowledge of different object kinds, allowing young children to make predictions
about what an object will look like over space and time. These predictions serve as an important foundation for the extension of
object names.
3.3. Summary: Object shape and object name
The foregoing evidence suggests that shape, properly defined, is critical in young children's extensions of novel object names.
Sensitivity to 'same shape' is clearly complex, with the range of possible shape transformations varying as a function of
properties such as malleability and animacy; these properties can be suggested by perceptual information. However, other
properties such as size, texture, or material do not appear to play the same role in extension of object names. What underlies the
critical role of shape in object naming?
One factor is that shape appears to be a privileged dimension of object representation in the human spatial-cognitive system:
Objects are recognized largely on the basis of their shape (see, e.g. Biederman 1987). Shape makes sense as a privileged
dimension, because of the spatial constraints under which objects (as opposed to substances) operate. Normally, under internal
or external motion, objects remain rigid (locally, in the case of jointed or semi-malleable objects; globally in the case of non-
jointed objects). Infants are sensitive to the intermodal consequences of rigid vs. non-rigid motions (Gibson and Walker 1984)
and sensitivity to rigidity appears to guide infants' perception of object unity (Spelke 1990).8
One important consequence of rigidity is that an object's shape remains constant over motion through space: Objects typically do
not deform randomly as they move. In contrast, substances do not obey such a constraint; to the extent that they are arranged
into a shape when static, this shape changes radically over motion through space, both locally and globally. Indeed, the special
status of shape does not generalize to substances: When we observe paint or sand, shape is not salient at all; rather, it is the
'stuff' that is most immediately apprehended.9 The distinction between objects and substances exists quite early in development,
certainly prior to productive control of count/mass syntax (Soja et al. 1991; see Carey, this volume); and this no doubt serves as
one important constraint on word learning.
But how does such a shape-based object recognition system become linked to object naming and to a shape bias in the child?
One possibility is that children assume (natively) that object names are linked to object kinds; when they are told 'This is an X',
children may assume that the name of that thing refers to the kind of object it is (see also Markman, this volume; Carey, this
volume). Given the fact that object kinds in the world often are differentiated by object shape, the child will often hear 'This is
an X' referring to some kind that is highly correlated with the shape of the object they are observing. In such an event, a bias to
represent objects in terms of their shapes will lead the child to link two pairs of relationships: object name and object kind on the one hand, object name and object shape on the other hand. Such a linking could occur on just one or two exposures (in effect, a kind of 'triggering' in Fodor's (1981) sense), although the evidence reviewed above does suggest that the shape bias becomes sharper over time. In any case, once this link is established, children hearing an object's name would generalize that name on the basis of object shape.
8 To be specific, Spelke (1990) proposes that infants' object perception is guided by four principles: cohesion, boundedness, rigidity, and no action at a distance. Together, these principles specify which surfaces belong to the same object and which belong to separate objects. Spelke further proposes that these principles of object perception are deeply related to basic constraints on the motions of physical bodies: we perceive objects according to certain perceptual principles such as rigidity because bodies move as they do, i.e. they do not deform as they move.
9 We obviously can construe what we see in terms of shape, but then we would describe what we see as a 'splotch of paint' or a 'pile of sand', drawing attention to the individuated entity, 'splotch' or 'pile' (see P. Bloom, this volume, for discussion of related issues).
The reliance on object shape in extending object names does not entail that shape is definitional for object kind, nor that it
exhausts the possible cues to kind. Obviously, objects sharing the same shape are sometimes members of different kinds (e.g.
whales and fish), which means that sometimes a shape bias will lead to incorrect extension of an object name. However, it is
significant that in these cases, children's errors may be long-lived and require specific tutoring before they are corrected. Shape
may remain at the top of a preference hierarchy even for adults, although additional computational resources and brute-force
knowledge of the world will allow adults to use other sources of information more easily than young children, thus leading to
more complex modulations of any shape bias.
The strong reliance on object shape in object naming points to an intricate system of representation which is specialized for the
'whats': object recognition, object naming. This specialization does not appear to be engaged in the system underlying the
language of places.
Given that spatial locational terms are acquired early and with little difficulty, the question arises how learners come to know
just what spatial properties of the world are relevant to these terms. Having just seen that object shape is a privileged property
for learning object names, one might suppose that learners show a shape bias even when learning names for places. For
example, learners might assume that something can be in only those objects that are shaped like a box or bottle. Yet cross-
linguistic study of place expressions reveals that locational language does not, in general, preserve the details of object shape.
What, then, is the structure of spatial locational terms that renders them so easy to learn? And how precisely do young learners
acquire this structure?
4.1. The structure of place expressions
In English, the standard expression for place is the prepositional phrase (PP). The central element in the place expression is the
spatial preposition: words like on, in, off, and around. Since an object's location is often defined in terms of other objects, the
spatial preposition usually is joined by at least two NP arguments, one representing the object being located (the 'figure'
according to Talmy 1983) and the other representing the reference object (the 'ground'). For example, in the sentence 'The cat is
on the mat', the preposition on is a place-function that defines a region (the surface) of the reference object (the mat) where the
figure object (the cat) is located.
Each preposition has a characteristic geometry that specifies the relevant region of the reference object (e.g. within the object's
boundaries for in; in contact with its surface for on). The preposition's geometry also specifies the kinds of geometric entities
that can play the role of figure and reference object (Talmy 1983, Jackendoff 1983, Miller and Johnson-Laird 1976, Herskovits
1986). For example, the geometry of in roughly specifies a region of 'containment'. This region then imposes selection
restrictions on the arguments of the preposition: In order to qualify as an appropriate reference object for the term in, an object
must be potentially construable as 'container-like', e.g. in a bowl, in a carton, or even in a tree, where the tree is an enclosing
volume; but not *in a point (since a point, by definition, cannot contain anything). There are no geometric restrictions at all on
the figure object for this term: that is, any object of any geometric class can be located in a suitable 'container'.
Note that the 'suitability' of a reference object is almost completely dependent on coercion by the preposition: Things that are not
really containers
can easily be suitable arguments for in. This fact has led many to emphasize that spatial language draws on 'schematic'
representations: what matters is one's mental representation of an object, and not necessarily its physical properties (Talmy
1983, Herskovits 1986).
Given the geometric restrictions on prepositions, violations (usually, grossly unsuitable arguments) result in anomaly. For
example, since in requires an enclosing area or volume, a sentence such as (1a) is permissible while (1b) is not.
(1a) The fly is in the circle.
(1b) *The fly is in the point.
As a more complicated case, the English words across, along, and around seem to require that the figure object be construable
as having a principal linear axis. For example,
(2a) The snake lay across (along) the road.
is fine, because the snake has a clear principal axis which can either intersect (for across) or lie parallel to (for along) the road.
However,
(2b) *The ball lay across the road.
is anomalous, except on the reading that the ball lies on some path leading from the observer's viewpoint to some point on the
other side of the road.
(2c) ?The ball lay along the road.
similarly is somewhat odd, although it can be repaired easily by substituting alongside of, an expression that picks out a small
section of the road's edge as the reference object. Now the ball can be construed as having an axis that lies parallel to this
section.
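As a rough illustration (my own sketch, not the chapter's formalism), the selection restrictions behind examples (1) and (2) can be modeled as a small lookup pairing each preposition with the geometric construals it demands of its arguments. The category labels and the `acceptable` function are hypothetical simplifications:

```python
# Illustrative sketch only: each preposition names the construals it requires
# of its figure and reference arguments (None = no restriction at all).
RESTRICTIONS = {
    "in":     {"figure": None, "reference": {"area", "volume"}},   # enclosure
    "on":     {"figure": None, "reference": {"surface"}},          # contact
    "across": {"figure": {"linear-axis"}, "reference": {"linear-axis"}},
    "along":  {"figure": {"linear-axis"}, "reference": {"linear-axis"}},
}

def acceptable(prep, figure, reference):
    """True if both arguments can be construed as having the geometry
    the preposition requires (figure/reference are sets of construals)."""
    req = RESTRICTIONS[prep]
    fig_ok = req["figure"] is None or bool(req["figure"] & figure)
    ref_ok = req["reference"] is None or bool(req["reference"] & reference)
    return fig_ok and ref_ok

# (1a) 'The fly is in the circle' vs. (1b) '*The fly is in the point'
print(acceptable("in", {"blob"}, {"area"}))    # True
print(acceptable("in", {"blob"}, {"point"}))   # False

# (2a) 'The snake lay across the road' vs. (2b) '*The ball lay across the road'
print(acceptable("across", {"linear-axis"}, {"linear-axis"}))  # True
print(acceptable("across", {"blob"}, {"linear-axis"}))         # False
```

Note that the sets of construals, not raw object categories, do the work here: coercion (a tree construed as an enclosing volume) amounts to adding a construal to the reference set.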
Cross-linguistic studies of spatial meanings have revealed universals in the geometries underlying regions, figure and reference
objects (Talmy 1983, Jackendoff and Landau 1991, Landau and Jackendoff 1993). Most important for the present discussion are
the geometries of the figure and reference object.
Unlike the representations underlying object names, the figure and reference objects tend to be represented in quite sparse
geometric terms. Across languages, they typically are represented as points, areas, surfaces, or volumes,
at most preserving quite restricted aspects of object shape such as the three orthogonal sets of axes corresponding to up/down,
front/back, and right/left (Talmy 1983, Landau and Jackendoff 1993). Objects are not represented by precise metric descriptions
of shape, even though shape is critical to the representation of objects as category members (but see section 4.4 for some
possible exceptions).
For example, as we noted above, the preposition in must be used with a reference object that can be potentially construed as a
'container', e.g. in a bowl, in a carton, or even in a tree, where the tree is an enclosing volume. Although what is selected by the
preposition is a geometric volume common to all three acceptable objects, there clearly are enormous shape differences among
bowls, cartons, and trees which are critical in assigning these objects to different named categories. Similarly, on selects for a
reference object that can be thought of as a surface, e.g. on my nose, on the field, on the earth, but not *on the hole. Again, each
of these can function as a surface, though their individual shape parameters differ greatly.
Other than volume and surface, the main elements of shape that are preserved concern the object's three sets of axes. Some
terms require that the reference object be construed as having at least one directed axis, either vertical or horizontal (up, down,
above, below). Others call for the reference object to be construed as having two orthogonal axes in the horizontal plane (in
front of, behind, beside). Both of these sets of terms leave the geometry of the figure object completely unspecified, while the
reference object can vary indefinitely in shape as long as its axes are preserved. Terms like across and along are among the most
geometrically complex prepositions in English, as they require a linear axis for both figure and reference objects.
Drawing on these geometries, only a relatively small set of region types are expressed. At the most general level, all prepositions
appear to describe regions that preserve non-metric properties; for example, there is no spatial preposition that describes regions
extending just 48 degrees to the right of an object or 2 inches around its border. Rather, prepositions represent properties such as
containment (in, with the reference object a volume), support or attachment (on, with the reference object a surface), intersection
(across, with reference object an axis), co-linearity (along, the reference object again an axis), and relative proximity (near, in
front of, the reference object either a point or a set of axes).
Landau and Jackendoff (1993) suggest that these properties may factor into several levels of direction and distance. The
direction component relies on analysis of the reference object into the three sets of orthogonal axes
described above (up/down, front/back, right/left). The distance component chunks continuous space into a small set of discrete
categories which vary somewhat from language to language. In English there are four such levels: (a) internal to the reference
object (e.g. in, inside), (b) in contact with the object (e.g., on), (c) proximal to it (e.g. near, in front of), (d) distal from it (e.g.
far and beyond).
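The distance component can be made concrete with a minimal sketch (my own, not Landau and Jackendoff's formalism) of the four English levels as a coarse classifier. The function name and the numeric proximity cutoff are assumptions; the proposal itself claims only discrete, non-metric categories:

```python
# Hypothetical sketch of the four English 'distance' levels.
def distance_level(inside, in_contact, gap=0.0, proximal_cutoff=1.0):
    """Classify a figure-reference configuration into one of four levels."""
    if inside:                  # figure within the reference object
        return "internal"       # e.g. in, inside
    if in_contact:              # figure touching the reference object
        return "contact"        # e.g. on
    if gap <= proximal_cutoff:  # hypothetical cutoff for 'nearness'
        return "proximal"       # e.g. near, in front of
    return "distal"             # e.g. far, beyond

print(distance_level(inside=True, in_contact=False))   # internal
print(distance_level(inside=False, in_contact=True))   # contact
print(distance_level(False, False, gap=0.5))           # proximal
print(distance_level(False, False, gap=10.0))          # distal
```

Since the levels vary somewhat from language to language, a cross-linguistic version would differ mainly in how many such categories the classifier returns, not in becoming metric.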
Thus, the geometric system for representing figure and reference objects is rather tight and rather 'sparse'. What do these
restrictive geometric descriptions portend for the language learner? On the one hand, strict limitations in the range of object
geometries might be a positive force in learning: If the representations are limited, there is less to learn. On the other hand, the
object geometries that are relevant to prepositional meanings might create a serious problem for the learner, for these geometries
are qualitatively different from those used when naming objects. Although detailed object shape is relevant when learning an
object's name, most or all of an object's shape is irrelevant when learning the meaning of a spatial preposition.
4.2. Lack of object shape in spatial term learning
Given the salience of object shape in object naming, one might suppose that children would require considerable time to learn to
ignore object shape in the context of prepositions. For example, children might find terms drawing on richer shape representations (like across or along) easier to learn than terms drawing on sparser ones. Or, children's earliest use of terms with sparse shape descriptions
might occur only with reference objects of a certain shape: The word in might be used only with objects that are canonical
'containers': fully closed, opaque-sided objects such as boxes and drawers.
To my knowledge, there are no reports describing such patterns among young children learning spatial prepositions. What
evidence does exist on early usage suggests the opposite. For example, experimental studies on the cross-linguistic order of
acquisition of spatial prepositions suggest that terms requiring objects with very little geometric restriction (e.g. near, on) are
acquired before those requiring objects with more complex restrictions (e.g. in front of, across; see Johnston and Slobin 1978).
Similarly, evidence from spontaneous speech suggests that children represent objects schematically (that is, preserving little shape information) when using them as the arguments of prepositions. Landau et al. (1990) analyzed young children's
spontaneous uses of prepositional phrases headed by in or on, asking what kinds of objects served as the reference objects for
these
prepositions. The children ranged from 21 to 40 months, with mean length of utterance (in morphemes) ranging from 1.0 to 2.8 (younger group) or from 3.0 to 5.5 (older group).
We found that not even the youngest children confined their production to reference objects considered 'canonical' containers
(for in) or horizontal surfaces providing gravitational support (for on). For PPs headed by in, children in the younger group
produced reference objects that were enclosures of some type; but these included ones that were solid-sided and fully closeable
(e.g., house, closet, plane, bottle), partial volumes open on at least one side (e.g., cup, basket), bounded volumes with non-solid
sides (e.g., crib, cage), and partially bounded volumes open on two or more sides (e.g. chair, stroller). Older children also used
reference objects conceived as solid objects affording embedding (e.g. a tick in the ear), homogeneous substances conceived as
a mass (e.g. in the woods, rain, hair), and apertures (e.g., window, door).
The same generality held for reference objects used with on. The children used reference objects that included 2-dimensional
horizontal planar surfaces (e.g. paper, floor), horizontal surfaces that were parts of 3-dimensional objects (e.g. train, bike,
chair), and outer surfaces of objects that were not necessarily horizontal (e.g. baby, dolly, fingers, face). All of these could
furnish a base for support, but not only gravitational support.
Such patterns of generalization suggest that young children learning their first prepositions can represent many objects properly
as affording either 'containment' (in a highly abstract sense) or support/attachment.
Converging evidence from cross-linguistic studies shows that young children learning other languages also can represent object
classes that abstract away from exact shape as in English; but they can also schematize objects along dimensions that are
somewhat different from those relevant to English spatial prepositions. Bowerman and Choi (reported in Bowerman 1991) have
noted that while English has separate terms for putting objects in, on, or together, Korean collapses these together under the
verb kkita, and opposes them to relationships English classifies as taking out, off, or apart, collapsing these under the verb
ppayta. While these verbs collapse distinctions that English makes, they also cross-cut the English spatial terms with a
dimension of 'tight fit/loose fit'. For example, while English distinguishes putting a ring on a finger from putting a button in a pocket from putting beads together, Korean collapses them together as joining relations of tight fit. Other verbs are used for
relationships of loose fit, such as cup on table, apple in bowl, or tables together. Bowerman and Choi have reported striking
evidence that children learning Korean appear to observe this distinction in their earliest speech,
suggesting that they can represent objects in terms of certain force dynamic notions (i.e., degree of fit) as well as object
geometries such as volume or supporting surface.
Finally, recall Landau and Stecker's (1990) findings (see section 2). Three-year-olds who were shown an object being placed on
the top right-hand corner of a box and told 'This is acorp my box' generalized the term acorp to objects of very different shapes
as long as they occupied a location on the upper surface of the box. In contrast, children who were told 'This is a corp'
generalized to objects of the same shape, regardless of position. The results of the preposition condition concur with the findings
from spontaneous speech in showing that young children ignore detailed object shape when acquiring a new preposition.
As a whole, these results suggest that there may not be much learning required by the child in order to ignore the full shape
specification of an object. Children might be able to represent objects schematically (in terms of certain limited geometric
properties) but also more fully (in terms of a complete geometric description of shape, for the purposes of object naming).
Critically, however, the schematic representations are engaged only when learning place terms.
4.3. Axes in spatial term learning
If children ignore object shape when learning place terms, when and how do they come to represent an object in terms of its
axes, as would be critical for learning terms such as up, down, top, bottom, in front of, behind, and across? Much of the
literature on acquisition order for spatial prepositions (and comparable terms across languages) suggests that the axial system is
acquired rather late. For example, Johnston and Slobin (1985) reported that terms such as in front of and behind are mastered
substantially later than terms such as in or on. However, the conditions for application of in front of and behind are rather
complex, involving more than an axial representation of the reference object. For example, one must understand the particular
pattern of deictic usage of one's language: for instance, which portion of the object's axis is mapped onto each term (see, e.g. H.
Clark 1973, Herskovits 1986).
If we confine ourselves to the question of when children are able to represent objects in terms of their axes (regardless of whether the particular mapping for individual spatial terms is correct), we find that children from age 2 on can represent the axial system.
One piece of evidence comes from Tanz (1980), who asked children from 2 ½ to 5 years of age to place objects
'in front of', 'in back of' and 'at the side of' reference objects. Tanz found a great many errors through age 4. However, despite
the errors, 96% of the placements were made at the 'cardinal' directions: in line with the two axes that divide the reference
object, front from back and side from side.
Additional evidence comes from some current work in our laboratory on the nature of young children's regions mapped to in
front of. In three studies, we have shown 3-year-olds, 5-year-olds, and adults an array with a single reference object, and have
asked them to judge when a small object is 'in front of' it. In one study, the reference object is U-shaped: essentially, a largish
version of the 'dax' we have used in our object naming studies. This object has a clear vertical axis, and when it is placed
horizontally on a table, it has a clear principal axis that runs from front to back, at least for adults. In another study, the reference
object is round, lacking clear axes. In the third study, the reference object is still round, but now is adorned with plastic eyes and
a small tail, rendering clear the location of this object's principal axis.
Adults, not surprisingly, tend to judge one object 'in front of' the reference object when it falls within a quadrant surrounding the
half-axis closest to the viewer. Five-year-olds do the same. Three-year-olds, who might be least likely to understand an object's axial system, perform randomly when judging what is in front of the round unmarked object. However, they follow the same
pattern as older children and adults when judging what is in front of either the U-shaped object or the round object with eyes
and tail. When they perform differently, they follow the reference object's projected axis more closely than either five-year-olds
or adults. For example, some of them judge that only objects falling exactly on the relevant half axis are in front of the reference
object.
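The adult and 5-year-old pattern just described, accepting objects that fall within the quadrant surrounding the front half-axis, can be sketched geometrically. This is an illustrative reconstruction; the coordinate convention (viewer toward the bottom, at 270 degrees) and the 90-degree quadrant width are assumptions:

```python
import math

def in_front_of(point, reference=(0.0, 0.0), front_axis_deg=270.0):
    """True if `point` lies within 45 degrees of the front half-axis."""
    dx = point[0] - reference[0]
    dy = point[1] - reference[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # smallest angular difference between the point's bearing and the axis
    diff = abs((angle - front_axis_deg + 180.0) % 360.0 - 180.0)
    return diff <= 45.0

print(in_front_of((0, -2)))   # on the front half-axis itself -> True
print(in_front_of((1, -2)))   # inside the quadrant -> True
print(in_front_of((2, 0)))    # off to the side -> False
```

On this sketch, the stricter pattern shown by some 3-year-olds would correspond to shrinking the tolerance toward the half-axis itself (a threshold near 0 degrees rather than 45).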
A last piece of evidence for early representation of object axes comes from an experiment by Landau (1991) on the
understanding of spatial part terms (top, bottom, front, back, and side) in 3- to 4-year-olds, one of whom was congenitally blind.
Children were shown novel objects, and were specifically told that one region (indicated by the experimenter) was either the
top, bottom, front, back, or side of the object. Children now had to identify the regions corresponding to the remaining four
terms. Correct responding would have required the children to identify the three major axes, and anchor them with the single
term provided by the experimenter.
In order to determine whether the terms corresponded to only the canonical position of an object, children were shown regions
in each of two conditions: In the canonical condition, the region was designated as if the object was upright: for example, top was the uppermost region, bottom the lowermost. In the non-canonical condition, the regions were designated as if the object was not upright: for example, now top was the lowermost region, or the region corresponding to the canonical side.
Children performed quite well on this task, identifying the queried regions well above chance. The blind child performed
similarly to the sighted children who viewed the objects. The children performed above chance in both the canonical and non-
canonical conditions, indicating that they could parse the object into its three orthogonal axes, and that the regions corresponding
to each half-axis were defined in an object-centered fashion. That is, the top was at the end opposite the bottom, regardless of the overall orientation of the object.
In sum, children from about age 2 appear to be capable of representing an object in terms of its axes. This representation, sparse as it is, may contain the richest geometric properties necessary for learning place terms. Although the axes are not physically
present, young children possess object representations that contain axes; and these axes can be engaged during the learning of
place terms.
4.4. Some challenges: How much shape, and when?
Although the foregoing description of spatial terms relies largely on analysis of English spatial terms, a number of investigators
have concluded that the geometric properties on which I have focussed also capture place terms in other languages (Talmy 1983,
Landau and Jackendoff 1993). There is, however, some disagreement on this issue, and this paper would not be complete
without mentioning some dissenting views that raise interesting empirical questions.
A number of investigators have suggested that there exist languages whose place terms incorporate much richer object shape
than I have described. Specifically, Bowerman (1991), Levinson (1992) and Brown (1993) argue that Tzeltal (and related Mayan
languages) exhibits a large class of predicates (several hundred) that include shape as a crucial feature. According to Brown,
Tzeltal has but one preposition (roughly meaning 'at'). Most detail of the relationship between figure and reference object is
expressed by two systems. One draws on body-part terms using a possessive construction: The figure object is located with
respect to a named part of the reference object, for example, 'The frying pan is at the waterpot's butt', where the 'butt' is the
bottom of the waterpot (independent of its orientation). This system also is used in English: We can say, for example, 'Mary was
at the foot of the table' or 'The ball was at the bottom of the tree'. Although the number of usable body parts is larger in Tzeltal
than in English, neither case seems to genuinely
depend on analysis of the object's shape: The butt of the waterpot is where it is in virtue of the object's axial system, not its
rotund shape.
The second system uses locational predicates that appear to describe the location of objects with particular shapes. For example,
according to Brown, pachal describes a wide-mouthed container canonically 'sitting', waxal describes a tall oblong-shaped
object canonically 'standing', lechel describes a wide flat object lying flat, pakal describes a blob-type object with a clear flat
surface lying 'face-down'. Again, English does share some of these restrictions in its verb system: Stand usually is restricted to
objects in their vertical orientation, lie to objects in their horizontal orientation, and numerous denominal verbs (to bottle, to
house, to jar) incorporate specific object categories.
Is the Tzeltal system an exception to the sparse representation of shape in place naming? It is difficult to tell, since the items in
question are verbs, lexical items that can incorporate all kinds of information. But even Tzeltal locational verbs do not appear to
incorporate object-shape distinctions as fine as the vocabulary of object nouns in that language or any other. In fact, the Tzeltal
shape distinctions seem to fit with the kinds of distinctions made by classifier systems in other languages (Allan 1973).
classifiers group objects in terms of such parameters as 'lump', 'flat', and 'long-thin-thing'. But classifiers do not group objects in
terms of specific basic-level (shape-based) object categories: this is what object names are for. It remains an interesting empirical question whether children learning Tzeltal readily incorporate the so-called richer elements of shape in their place
expressions.
infinite detail in the observable world to the finite vocabulary. Of course, critical in this filtering is removing just the right kind of detail, and this detail differs for named objects vs. located objects.
In the case of objects, count nouns group together sets of objects that cohere in a number of respects. Sameness in shape
(properly defined) seems to be an important dimension for such categories, while color or texture is not. However, even
members of a category named by a basic level noun, though similar in shape, are not identical in shape: Members of the
category dog include things as diverse as Dalmatians and Pekinese, Great Danes and Chihuahuas.10 Many of these distinctions
are captured at the subordinate level (Waxman, this volume), but even there, category names cover a multitude of variation.
Thus, language filters out a good deal of the perceptual variability among objects, as it must if words are to represent categories.
Although we can perceive and remember the differences among members of a named category, the design of language does not
allow representation in its basic vocabulary for all these differences. Our perceptual representations of objects seem to preserve
a good deal more detail than language encodes. A picture is worth a thousand words.
This filtering is even more dramatic in the case of prepositions. Recall that cross-linguistic studies have shown that a very small
number of geometric properties are preserved in the representation of figure and ground objects (Talmy 1983). In English, the
geometries boil down to treating the figure as a 'blob' or 'line' and the reference object as a 'point', 'volume', 'surface', 'line' or set
of axes (as well as a few others, see Landau and Jackendoff 1993). As demonstrated earlier, this sparse treatment of objects in
the spatial domain does not cause difficulties for young children learning spatial prepositions or spatial verbs (see Bowerman
1991). The filtering of object properties in the place domain is more extreme than it is in the object domain.
Could the extreme filtering for prepositions be due to the fact that prepositions are closed class (function words)? That is, all closed-class items, by virtue of their small number, must categorize perceptual and conceptual space to a great degree. Perhaps
the need for stretching a very small number of words to cover a large number of potential distinctions inevitably gives rise to
such drastic filtering.
10 We find it interesting that children can be resistant to labelling the more unusual looking of these with the word dog.
The daughter of a friend insisted for a long time that a Pekinese was a cat, presumably in virtue of its flat face.
This is unlikely to be the whole story. The fundamental question is not why spatial prepositions filter out detail at all, for all
linguistic categories filter as they must, given the fact that their memberships are finite. Rather, the fundamental question is why
spatial prepositions filter as they do, preserving only certain shape-based qualities. If the requirement were simply to filter, without further specification, spatial prepositions could represent objects by gross variation in size, brightness, texture, or any of
the other infinite possible properties.
A telling comparison involves the noun classifiers found in many languages such as Japanese, Chinese, and American Sign
Language. Noun classifiers are morphemes that 'classify' or describe the noun they accompany, and in many languages they are
obligatory in certain contexts (such as counting). They are closed class, are relatively few in number, and express just a few
distinctions among objects. Universally, these distinctions include properties such as shape and size of objects (Allan 1973), for
example, 'long and straight thing', 'small round thing', 'flat thing'. These properties appear quite similar to the kinds of geometric
properties described in conjunction with figure and reference object and might seem to suggest that it is the closed class nature
of the prepositions and not their pertinence to location that forces them to draw on the properties they do.
However, there are a number of differences between the properties engaged by noun classifiers vs. spatial prepositions. First,
classifiers do not describe only object geometries. They also often mark animacy, texture or substance, physical integrity (e.g.,
broken), arrangement (e.g. pleats, loops), and other properties that do not seem to appear in expressions of pure location (see
Allan 1977). Second, they can be much more specific than preposition geometries, for example, marking certain function-
related categories (e.g. in Mandarin Chinese, 'handled things'; in ASL, 'tools'), basic level or subordinate categories (e.g. in
Mandarin, 'book', 'rifle'), and even unique honorific object categories such as 'horse' (Erbaugh 1986). This suggests that there is
actually a fair amount of latitude in how closed class elements can abstract away from object representations. The particular
geometric object properties preserved in the spatial preposition meanings do not follow necessarily (or solely) from the
prepositions' status as closed class elements.
So, although languages must filter out detail from spatial and perceptual representations, such filtering cannot by itself account
for why objects being named are represented in terms of their detailed shapes while objects playing the role of figure or
reference object are represented in terms of geometries such as point, blob, volume, or axes.
Neurophysiological research on monkeys (Ungerleider and Mishkin 1982) shows that damage to the inferior temporal cortex often impairs an animal's ability to identify an object (often by its shape) in
order to obtain food. In contrast, damage to the posterior parietal cortex appears to impair the animal's ability to identify the
location an object occupies, again, in order to obtain food. In the former case, object recognition is impaired while object
location is spared; in the latter case, the ability to locate an object is impaired while object recognition is spared.
This general dissociation of 'what' and 'where' has also received support from human psychophysical studies and from studies of
brain damage in humans. The psychophysical studies indicate that two separate streams of visual processing exist from lower
visual levels up to higher levels. These streams appear to segregate those properties relevant to object identification (primarily
shape and color) from those relevant to locating objects (primarily motion, depth, and location; see Livingstone and Hubel
1989). The human clinical evidence also shows dissociation of object and place through individual cases in which brain damage
can selectively impair either object recognition while sparing object localization or the converse (Farah et al. 1988).
What do these findings imply for our studies of spatial language? We conjecture that the two kinds of object representation we
have uncovered in studies of language learning may be rooted in the larger design of the spatial representational system. The
reason that detailed shape matters for object naming is that the count nouns naming objects draw on representations in the 'what'
system. And the reason that only sparse shape (point, line, volume, etc.) matters for object location is that spatial prepositions draw
on representations in the 'where' system, a system that does not represent object shape in detail. We speculate that the relatively
simple shape specifications observed in the preposition system reveal the extent of detail possible in object descriptions within
the 'where' system. That is, the degree of filtering seen for objects when they play the role of figure or ground is rooted in the
fact that the spatial locational system simply does not, by itself, represent object shapes in detailed fashion.
This hypothesis is compatible with the view put forth by Talmy (1983) in his explanation of why prepositions and other closed
class elements have the sparse structure that they do. Talmy proposes that the structure of the closed class reflects much of the
fundamental structure of cognition; for example, that the cognitive system itself represents location in relatively qualitative (non-
metric) terms. The present proposal further points to a reason why certain members of the closed class, the spatial prepositions,
abstract away from shape as they do. We propose that, in part, they do
reflect the organization of spatial cognition, and we have provided converging evidence from non-linguistic studies that this
characterization of the spatial locational system is correct. It is an empirical question whether similar correspondences can be
found within other closed class sets such as markers of time, aspect, number, etc.
This view suggests a reason for the fact that children find it easy to preserve shape when learning an object's name, but preserve
shape only schematically when learning a term for an object's place. They do so because object and place are represented
separately and distinctly, engaging different schematizations of the world.
At the most general level, this view emphasizes the role of observational context, for what the child observes under different
linguistic conditions does seem to guide his inferences about the meaning of a new word. But in detail, the evidence and
arguments presented in this paper should serve as a sobering reminder that to note the importance of observational context is not
to solve the problems of exactly how this context is engaged during language learning. Our challenge in solving that problem is
to discover in detail how learners do represent the world for the purposes of language learning. The study of spatial language
appears to provide fertile ground for meeting these challenges.
References
Allan, K., 1977. Classifiers. Language 53(2), 285-311.
Armstrong, S., L.R. Gleitman and H. Gleitman, 1983. What some concepts might not be. Cognition 13(3), 263-308.
Au, T. and E. Markman, 1987. Acquiring word meanings via linguistic contrast. Cognitive Development 2, 217-236.
Baldwin, D., 1992. Clarifying the role of shape in children's taxonomic assumption. Journal of Experimental Child Psychology
54, 392-416.
Banks, M. and P. Salapatek, 1983. Infant visual perception. In: M. Haith, J. Campos (eds.), Infancy and developmental
psychobiology, Vol. 2 of P.H. Mussen (ed.), Handbook of child psychology, 435-572. New York: Wiley.
Becker, A. and T. Ward, 1991. Children's use of shape and texture with objects and substances. Paper presented at the Society
for Research in Child Development, Seattle, Washington.
Biederman, I., 1987. Recognition-by-components: A theory of human image understanding. Psychological Review 94(2),
115-147.
Bloom, L., 1973. One word at a time. The Hague: Mouton.
Bloom, P., 1994. Possible names: The role of syntax-semantics mappings in the acquisition of nominals. Lingua 92, 297-329
(this volume).
Bowerman, M., 1991. The origins of children's spatial semantic categories: Cognitive vs. linguistic determinants. In: J.J.
Gumperz, S.C. Levinson (eds.), Rethinking linguistic relativity. Cambridge: Cambridge University Press.
Brown, P., 1993. The role of shape in the acquisition of Tzeltal (Mayan) locatives. 25th Annual Child Language Research
Forum, Stanford University, April.
Brown, R., 1957. Linguistic determinism and parts of speech. Journal of Abnormal and Social Psychology 55, 1-5.
Brown, R., 1973. A first language. Cambridge, MA: Harvard University Press.
Carey, S., 1982. Semantic development: The state of the art. In: E. Wanner, L.R. Gleitman (eds.), Language acquisition: The
state of the art, 347-389. New York: Cambridge University Press.
Carey, S., 1985. Conceptual change in childhood. Cambridge, MA: Bradford Books/MIT Press.
Carey, S., 1994. Does learning a language require the child to reconceptualize the world? Lingua 92, 143-167 (this volume).
Clark, E.V., 1973. What's in a word? On the child's acquisition of semantics in his first language. In: T.E. Moore (ed.),
Cognitive development and the acquisition of language, 65-110. New York: Academic Press.
Clark, H., 1973. Space, time, semantics, and the child. In: T.E. Moore (ed.), Cognitive development and the acquisition of
language, 27-64. New York: Academic Press.
Choi, S. and M. Bowerman, 1991. Learning to express motion events in English and Korean: The influence of language-specific
lexicalization patterns. Cognition 41, 83-122.
Erbaugh, M., 1986. Taking stock: The development of Chinese noun classifiers historically and in young children. In: C. Craig
(ed.), Noun classes and categorization, 399-436. Amsterdam: Benjamins.
Farah, M., 1988. Is visual memory really visual? Overlooked evidence from neuropsychology. Psychological Review 95(3),
307-317.
Farah, M., K. Hammond, D. Levine and R. Calvanio, 1988. Visual and spatial mental imagery: Dissociable systems of
representation. Cognitive Psychology 20, 439-462.
Feldman, H., S. Goldin-Meadow and L.R. Gleitman, 1978. Beyond Herodotus: The creation of language by linguistically
deprived deaf children. In: A. Locke (ed.), Action, symbol, and gesture: The emergence of language. New York: Academic
Press.
Fisher, C., H. Gleitman and L.R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology
23(3), 331-392.
Fisher, C., D.G. Hall, S. Rakowitz and L. Gleitman, 1994. When it is better to receive than to give: Syntactic and conceptual
constraints on vocabulary growth. Lingua 92, 333-375 (this volume).
Fodor, J., 1981. The present status of the innateness controversy. In: Representations. Cambridge, MA: MIT Press.
Gentner, D., 1982. Why nouns are learned before verbs: Linguistic relativity vs. natural partitioning. In: S. Kuczaj (ed.),
Language development: Language, culture, and cognition, 301-334. Hillsdale, NJ: Erlbaum.
Gibson, E. and A. Walker, 1984. Development of knowledge of visual-tactual affordances of substance. Child Development 55,
453-460.
Gillette, J. and L.R. Gleitman, forthcoming. Observation and noun verb learning.
Gleitman, L.R., 1990. The structural sources of verb meanings. Language Acquisition 1, 3-55.
Gleitman, L.R., H. Gleitman and E. Wanner, 1988. Where learning begins: Initial representations for language learning. In: F.
Newmeyer (ed.), Linguistics: The Cambridge Survey. New York: Cambridge University Press.
Grimshaw, J., 1981. Form, function, and the language acquisition device. In: C. Baker, J. McCarthy (eds.), The logical problem
of language acquisition, 183-210. Cambridge, MA: MIT Press.
Grimshaw, J., 1994. Lexical reconciliation. Lingua 92, 411-429 (this volume).
Herskovits, A., 1986. Language and spatial cognition: An interdisciplinary study of the prepositions in English. Cambridge:
Cambridge University Press.
Inhelder, B. and J. Piaget, 1964. The early growth of logic in the child. New York: Norton, 1969.
Jackendoff, R., 1983. Semantics and cognition. Cambridge, MA: MIT Press.
Jackendoff, R., 1987. On beyond zebra: The relation of linguistic and visual information. Cognition 26, 89-114.
Jackendoff, R., 1992. Mme. Tussaud meets binding theory. Natural Language and Linguistic Theory 10, 1-31.
Jackendoff, R. and B. Landau, 1991. Spatial language and spatial cognition. In: D.J. Napoli (ed.), A Swarthmore Festschrift for
Lila Gleitman, 145-170. Hillsdale, NJ: Erlbaum.
Johnston, J.R. and D.I. Slobin, 1978. The development of locative expressions in English, Serbo-Croatian, and Turkish. Journal
of Child Language 6, 529-545.
Johnson, K.E., C.B. Mervis and J.S. Boster, 1992. Developmental changes within the structure of the mammal domain.
Developmental Psychology 28(1), 74-83.
Jones, S., L. Smith and B. Landau, 1991. Object properties and knowledge in early lexical learning. Child Development 62,
499-516.
Keil, F., 1989. Concepts, kinds, and conceptual development. Cambridge, MA: MIT Press.
Kelly, M.H. and S. Martin, 1994. Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in
perception, cognition, and language. Lingua 92, 105-140 (this volume).
Landau, B., 1982. Will the real grandmother please stand up? The psychological reality of dual meaning representations. Journal
of Psycholinguistic Research 11(1), 47-62.
Landau, B., 1991. Spatial representations of objects in the blind child. Cognition 38, 145-178.
Landau, B. and L. Gleitman, 1985. Language and experience: Evidence from the blind child. Cambridge, MA: Harvard
University Press.
Landau, B. and R. Jackendoff, 1993. 'What' and 'Where' in spatial language and spatial cognition. Behavioral and Brain Sciences
16, 217-266.
Landau, B. and E. Spelke, 1988. Geometrical complexity and object search in infancy. Developmental Psychology 24(4),
512-521.
Landau, B. and D. Stecker, 1990. Objects and places: Geometric and syntactic representation in early lexical learning. Cognitive
Development 5, 287-312.
Landau, B., L.B. Smith and S. Jones, 1988. The importance of shape in early lexical learning. Cognitive Development 3,
299-321.
Landau, B., L. Smith and S. Jones, 1992a. Syntactic context and the shape bias in children's and adults' lexical learning. Journal
of Memory and Language 31, 807-825.
Landau, B., L. Sorich and D. Stecker, 1990. Geometric concepts in young children's uses of in and on. International Conference
on Infant Studies, Montreal.
Landau, B., M. Leyton, E. Lynch and C. Moore, 1992b. Rigidity, malleability, object kind, and object naming. Psychonomics
Society Meeting, St. Louis, April.
Landau, B., M. Leyton, E. Lynch and C. Moore, 1993. Perception, object kind, and naming. Manuscript.
Levin, B. and M. Rappaport Hovav, 1991. Wiping the slate clean: A lexical semantic exploration. Cognition 41, 123-152.
Levinson, S., 1992. Vision, shape, and linguistic description: Tzeltal body-part terminology and object description. Working
paper No. 12, Cognitive Anthropology Research Group, Max Planck Institute for Psycholinguistics.
Livingstone, M. and D. Hubel, 1989. Segregation of form, color, movement, and depth: Anatomy, physiology, and perception.
Science 240, 740-749.
Lynch, K., 1960. The image of the city. Cambridge, MA: MIT Press.
Markman, E.M., 1994. Constraints on word meaning in early language acquisition. Lingua 92, 199-227 (this volume).
Markman, E. and J. Hutchinson, 1984. Children's sensitivity to constraints on word meaning: Taxonomic versus thematic
relations. Cognitive Psychology 16(1), 1-27.
McKenzie, B.E., R.H. Day and E. Ihsen, 1984. Localization of events in space: Young infants are not always egocentric. British
Journal of Developmental Psychology 2, 1-9.
Miller, G. and P. Johnson-Laird, 1976. Language and perception. Cambridge, MA: Harvard University Press.
Mulford, R., 1985. First words of the blind child. In: M.D. Smith, J.L. Locke (eds.), The emergent lexicon: The child's
development of linguistic vocabulary. New York: Academic Press.
Murphy, G. and D. Medin, 1985. The role of theories in conceptual coherence. Psychological Review 92, 289-316.
Murphy, G.L. and J.C. Wright, 1984. Changes in conceptual structure with expertise: Differences between real world experts
and novices. Journal of Experimental Psychology: Learning, Memory, and Cognition 10, 144-155.
Pinker, S., 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Pinker, S., 1994. How could a child use verb syntax to learn verb semantics? Lingua 92, 377-410 (this volume).
Quine, W., 1960. Word and object. Cambridge, MA: MIT Press.
Rieser, J.J., 1979. Spatial orientation of six-month-old infants. Child Development 50, 1078-1087.
Rosch, E., C.B. Mervis, W.D. Gray, D.M. Johnson and P. Boyes-Braem, 1976. Basic objects in natural categories. Cognitive
Psychology 8, 382-439.
Rueckl, J., K. Cave and S. Kosslyn, 1988. Why are 'what' and 'where' processed by separate cortical visual systems? A
computational investigation. Journal of Cognitive Neuroscience 1, 171-186.
Schachter, P., 1985. Parts of speech. In: T. Shopen (ed.), Language typology and syntactic description, Vol. 1: Clause structure,
3-61. Cambridge: Cambridge University Press.
Smith, L.B., 1989. A model of perceptual classification in children and adults. Psychological Review 96, 125-144.
Smith, E. and D. Medin, 1981. Categories and concepts. Cambridge, MA: Harvard University Press.
Smith, L.B., S. Jones and B. Landau, 1992. Count nouns, adjectives, and perceptual properties in novel word interpretations.
Developmental Psychology 28, 273-286.
Soja, N., S. Carey and E. Spelke, 1991. Ontological categories guide young children's inductions of word meaning: Object
terms and substance terms. Cognition 38(2), 179-211.
Spelke, E., 1990. Principles of object perception. Cognitive Science 14, 29-56.
Supalla, T., 1988. Structure and acquisition of verbs of motion in American Sign Language. Cambridge, MA: Bradford Books.
Talmy, L., 1983. How language structures space. In: H. Pick, L. Acredolo (eds.), Spatial orientation: Theory, research, and
application. New York: Plenum Press.
Talmy, L., 1985. Lexicalization patterns: Semantic structure in lexical forms. In: T. Shopen (ed.), Language typology and
syntactic description, Vol. 3: Grammatical categories and the lexicon, 57-149. Cambridge: Cambridge University Press.
Tanz, C., 1980. Studies in the acquisition of deictic terms. Cambridge: Cambridge University Press.
Ungerleider, L.G. and M. Mishkin, 1982. Two cortical visual systems. In: D.J. Ingle, M.A. Goodale, R.J.W. Mansfield (eds.),
Analysis of visual behavior. Cambridge, MA: MIT Press.
Waxman, S.R., 1994. The development of an appreciation of specific linkages between linguistic and conceptual organization.
Lingua 92, 229-257 (this volume).
Possible names:
The role of syntax-semantics mappings
in the acquisition of nominals*
Paul Bloom
Department of Psychology, University of Arizona, Tucson, AZ 85721, USA
Many scholars have posited constraints on how children construe the meanings of new words. These include the restriction that
new words refer to kinds of whole objects (Markman and Hutchinson 1984), that words describing solid objects refer to
individuated objects while words describing non-solid substances refer to portions of substance (Soja et al. 1991), and that count
nouns that name objects are generalized on the basis of shape (Landau et al. 1988). There are theoretical and empirical problems
with these proposals, however. Most importantly, they fail to explain the fact that children rapidly acquire words that violate
these constraints, such as pronouns and proper names, names for substances, and names for non-material entities. The theory
defended here is that children and adults possess mappings from grammatical categories ('count noun', 'mass noun', and 'noun
phrase') to abstract semantic categories; these mappings serve to constrain inferences about word meaning. Evidence from
developmental psychology and linguistic theory is presented that suggests that even very young children possess such mappings
and use them in the course of lexical development. Further issues such as the possibility of developmental change, the precise
nature of these semantic categories, and how children learn words prior to the acquisition of syntax are also discussed. It is
concluded that although these syntax-semantics mappings are not by themselves sufficient to explain children's success at word
learning, they play a crucial role in lexical development. As such, only a theory that posits a deep relationship between syntax
and semantics can explain the acquisition and representation of word meanings.
'Language must have its perfectly exclusive pigeon-holes and will tolerate no flying vagrants. Any concept that asks for
expression must submit to the classificatory rules of the game ... It is almost as though at some period in the past
the unconscious mind of the race had made a hasty inventory of experience, committed itself to a premature classification that
allowed of no revision, and saddled the inheritors of its language with a science that they no longer quite believed in nor had
the strength to overthrow. Dogma, rigidly prescribed by tradition, stiffens into formalism. Linguistic categories make up a
system of surviving dogma – dogma of the unconscious.'
Edward Sapir, Language (1921: 99-100)
* I am grateful to Felice Bedford, Susan Carey, Frank Keil, Ellen Markman, Janet Nicol, Mary Peterson, Nancy Soja, an
anonymous reviewer, and especially Karen Wynn for their very helpful comments on earlier versions of this paper. This
work was supported by an NIH Biomedical Research Support Grant.
1. Introduction
One of the deepest mysteries in the study of language development is how children learn the meanings of words. A child's
vocabulary grows at an extraordinary rate (one estimate is that children acquire about nine new words a day from the age of 18
months to six years; Carey 1978), and there is little understanding of how this process takes place. This ignorance is due at least
in part to the fact that there is no consensus on what it is for someone to possess 'the meaning of a word' (for discussion, see
Carey 1982, Lakoff 1987, Premack 1990). In general, no theory of acquisition can be complete without some understanding of
the nature of what must be acquired.
A further difficulty concerns the nature of the learning problem itself. Word learning is standardly viewed as an inductive
process and, as Goodman (1983) has stressed, there is an infinity of logically possible generalizations that one can make on the
basis of a finite set of instances. To take a specific example, consider an adult pointing to Fido and saying to a child 'Look at the
dog'. Imagine that somehow the child is capable of determining what the word is intended to describe, i.e., it describes Fido, not
the child, or the finger, or the act of pointing, etc. Imagine further that the child can segment the utterance into words and can
determine that the relevant word is 'dog', not 'look', 'at', or 'the'. Still, there are countless possible meanings of this novel word. It
could refer to the basic-level kind (dogs), but it could also refer to a subordinate kind (poodle), or a superordinate kind (animal),
or to the individual (Fido). It could refer to the color of the entity being pointed to (brown), to its shape (oblong), or its size
(large). It could refer to a part of the dog (tail). It could refer to the front half of the dog, to dogs until the year 2000 and then to
pigs, to all dogs and all pencils, to all dogs and also to Richard Nixon. To modify an example from Quine (1960), it could even
refer to undetached dog parts.
From a logical standpoint, all of these examples are possible, since they are all consistent with the ostensive act. From a
psychological standpoint, however, some of these possibilities are ludicrous: no child would ever construe the word 'dog' as
referring to just the front half of Fido, or to the category of 'dogs and pencils'. Any theory of word learning must explain why
some word meanings are more natural than others, and how children determine which of the set of natural meanings that a word
could have actually corresponds to its meaning in a given language.
More generally, any successful inductive procedure requires that hypotheses be somehow ordered or ranked (Fodor 1975,
Goodman 1983) and one possibility is that there exist constraints that rule out (or bias against) entire classes of hypotheses. In
the domain of language development, Markman and her colleagues have presented the following two constraints, which are
argued to be special to the domain of word learning (e.g., Markman and Hutchinson 1984, see Markman 1990 for a review):
Whole Object constraint:
'... a novel label is likely to refer to the whole object and not to its parts, substance, or other properties.' (Markman 1990:
59)
Taxonomic constraint:
'... labels refer to objects of the same kind rather than to objects that are thematically related.' (Markman 1990: 59).
Thematically related entities include those that fall into 'spatial, causal, temporal, or other relations' such as a dog and its
bone, a dog and the tree that it is under, a dog and the person who is petting it, and so on. Although children are sensitive
to these sorts of relations in non-linguistic tasks (for instance, they will put a dog and a bone together when asked to sort
objects into different piles), this constraint forces them to attend to taxonomies (such as the kind 'dog') when faced with
the task of inferring the meaning of a new word.
Another proposal, advanced by Soja et al. (1991: 182-183), is that the following two procedures apply in the process of word
learning:
Procedure 1:
Step 1: Test to see if the speaker could be talking about a solid object; if yes,
Step 2: Conclude that the word refers to individual whole objects of the same type as the referent.
Procedure 2:
Step 1: Test to see if the speaker could be talking about a non-solid substance; if yes,
Step 2: Conclude that the word refers to portions of substance of the same type as the referent.
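The two procedures amount to a simple decision rule keyed to the solidity of the referent. The following minimal sketch renders them as such a rule; the `referent` dictionary, its keys, and the returned hypothesis strings are illustrative stand-ins for the child's perceptual judgments, not anything proposed by Soja et al.:

```python
# Illustrative sketch of the two Soja et al.-style procedures as a decision
# rule. The `referent` dictionary and its keys are hypothetical stand-ins
# for the child's perceptual judgment about what the speaker indicates.

def construe_new_word(referent):
    """Return a hypothesis about what a new word applied to `referent` names."""
    if referent.get("solid_object"):
        # Procedure 1: a solid object -> individual whole objects of its type
        return "individual whole objects of the same type as the referent"
    if referent.get("nonsolid_substance"):
        # Procedure 2: a non-solid substance -> portions of the same substance
        return "portions of substance of the same type as the referent"
    # Neither test applies: the procedures make no prediction
    return None

# Example: a word taught for a solid object vs. for a non-solid substance
print(construe_new_word({"solid_object": True}))
print(construe_new_word({"nonsolid_substance": True}))
```

Stated this baldly, the rule also makes the cases discussed below easy to see: nouns like 'wood' (a solid extended by material) and 'puddle' (a non-solid extended by configuration) are exactly where such a rule misfires.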
Finally, Landau et al. (1988, 1992) posit the following bias in lexical acquisition:
Shape bias:
'... the bias to group objects by shape in the presence of a novel count noun ...' (Landau et al. 1992: 87)
There is by now a large body of evidence showing that 2- and 3-year-olds behave in accordance with these posited constraints.
When taught a word for a novel object, children tend to categorize the word as referring to other whole objects of the same
kind; they will not extend the word to entities sharing a 'thematic' relation (Markman and Hutchinson 1984, Waxman and
Gelman 1986) and will not initially interpret it as referring to a part of the object, a property of the object, or the stuff that the
object is made of (Baldwin 1989, Clark 1973, Macnamara 1982, Markman and Wachtel 1988, Soja et al. 1991, Taylor and
Gelman 1988). In contrast, when taught a word for a novel non-solid substance, they will tend to generalize the word on the
basis of the kind of substance (perhaps using texture and color as cues), and ignore properties such as shape and size (Soja 1987,
1992; Soja et al. 1991). Finally, when taught count nouns that describe objects, children will tend to generalize these nouns on
the basis of shape, not color, size, or texture (Baldwin 1989, Landau et al. 1988).
Nevertheless, there are reasons to doubt that these precise constraints are present in the minds of young children. For one thing,
they are false of adult language: all languages have words that do not refer to taxonomies, words that do not refer to whole
objects, and count nouns that name kinds of objects that do not share a common shape. Below it is argued that these
counterexamples are also present in the language of very young children. A further concern is that the sole motivation for
positing these constraints is their role in solving the word learning problem. It would be preferable to derive these constraints
from deeper properties of language and cognition, instead of having to simply stipulate them.
In this paper, I present a theory of where constraints on word meaning
come from. It is often maintained that children possess mappings between syntax and semantics which facilitate grammatical
and lexical development (e.g., Bloom 1990a,b; 1994a,b; Brown 1957, Carey 1982, Gleitman 1990, Grimshaw 1981, Katz et al.
1974, Landau and Gleitman 1985, Landau et al. 1988, 1992; Macnamara 1982, 1986; Macnamara and Reyes 1990, Naigles
1990, Pinker 1984, 1989; Taylor and Gelman 1988, Waxman 1990). It is argued here that the existence of certain syntax-
semantics mappings in the domain of nominals makes it unnecessary to posit special word-learning constraints and that a theory
based on such mappings allows for a better explanation of how young children learn words.1
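The mappings at issue can be pictured schematically as a small lookup table from grammatical category to semantic category. In the sketch below, the grammatical categories are those named above ('count noun', 'mass noun', 'noun phrase'); the semantic glosses are informal paraphrases drawn from the surrounding discussion (whole objects for count nouns, portions of substance for mass nouns, particular individuals for pronouns and proper names), not Bloom's exact formulation:

```python
# Schematic rendering of syntax-semantics mappings for nominals.
# Grammatical categories are from the text; the semantic glosses are
# informal paraphrases used purely for illustration.
NOMINAL_MAPPINGS = {
    "count noun": "kind of individual (e.g., whole objects)",
    "mass noun": "kind of portion (e.g., substances)",
    "noun phrase (pronoun, proper name)": "particular individual",
}

for syntax, semantics in NOMINAL_MAPPINGS.items():
    print(f"{syntax} -> {semantics}")
```

The point of such a table is that the constraint on a new word's meaning comes for free from its grammatical category, rather than from a stipulated word-learning principle.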
'handle'). (In fact, some nouns refer to parts of objects that can never appear as discrete entities, such as 'surface' or 'coating'.)
The largest class of exceptions consists of count nouns like 'nap', 'idea', 'race', and 'dream', which do not refer to material entities at all.
Given that the shape bias is intended to apply only to count nouns that name objects, words like 'nap' do not violate this bias.
But other nouns do, such as names for collections ('army', 'family'), superordinates ('animal', 'weapon') and relationship terms
('brother', 'friend'). In fact, it is far from clear that shape is the crucial dimension even for count nouns that refer to basic-level
discrete whole objects, which are those considered by Landau et al. (1988). As Soja et al. (1991, 1992) argue, even for young
children, shape is not criterial in determining the extension of count nouns like 'dog' or 'skunk'; something can be thought of as a
dog even if it is shaped like a cat (Keil 1989). What Landau et al. have found is that children prefer to generalize novel nouns
referring to discrete material objects on the basis of shape as opposed to size or texture. But this is a far weaker conclusion than
the claim that count nouns, or even count nouns referring to objects, 'correspond to categories whose members have similar
shapes' (Landau et al. 1988: 316). On the contrary, there is only a small number of nouns, such as 'square', 'globe', and
'pyramid', where shape is an essential property of how they are used.
The taxonomic constraint is also false for adults. While it is true that words cannot refer to chains of thematically related
categories, there are nominals that do not refer to taxonomies; these include pronouns and proper names ('he', 'Fred', 'Canada'),
which refer to particular individuals, and do not generalize to other entities.
Of all of the constraints proposed above, only those advanced by Soja et al. (1991) are largely correct with regard to adult
language, but (once again) only when we replace 'word' with 'noun' (as they themselves suggest, p. 203). It is clearly false that
any word used to describe a solid object can be extended to objects of the same type, since the word could be describing a
property of the object ('red'), the state of the object ('resting'), and so on, and the same observation applies to the claim that any
word used to describe a non-solid substance can be extended to that type of substance. But although these procedures work
better when their domain is restricted to nouns, there are still cases where they fail. Some mass nouns, like 'wood' and 'metal',
describe solid objects but are not extended to objects of the same type; they are extended to objects composed of the same
material. The opposite sort of counterexample concerns count nouns like 'pile' and 'puddle'; they describe substances, but are
extended on the basis of configuration, not substance-kind.
The rapid acquisition of such words is clearly problematic for the whole object constraint and the taxonomic constraint. It also
poses a difficulty for the shape bias. As noted by Soja et al. (1991, 1992), it would be surprising indeed if children extended
words like 'uncle' and 'clock' on the basis of shape; if they did, they would be unable to use these words in any way similar to
adults.
Finally consider the Soja et al. (1991) procedures, which state that words describing objects refer to kinds of individual whole
objects and words describing substances refer to portions of substance. Revising their proposal so that it applies only to nouns,
this predicts that children should be unable to acquire solid substance names ('wood') and names for bounded substances
('puddle'). But in fact even 2-year-olds can learn solid substance names (Prasada 1993), and there is evidence from Soja (1992)
suggesting that they can also acquire names for bounded entities (see section 4.2 for discussion). It is worth noting, however,
that these are not children's first guesses as to a word's meaning. For instance, when a word is used to describe a novel bounded
entity, children's first interpretation of its meaning is that it refers to the kind of object, not the stuff that the object is made out
of, and this holds regardless of the syntax in which the word is presented (Au and Markman 1987, Dickinson 1988, Markman
and Wachtel 1988, Soja 1987). Nevertheless, the fact that words like 'wood' are acquired at all militates against the claim that
children possess the procedures posited by Soja et al. (1991).
None of the counterexamples discussed above necessarily refutes the hypothesis that these constraints exist. One possibility,
discussed in detail by Markman (1989), is that although they are present at the start of lexical development, the constraints can
be abandoned or 'overridden' in the course of language development (possibly as the result of the application of other
constraints; see footnote 1). More generally, they can be viewed as default conditions which only apply in the absence of certain
countervailing circumstances, and such circumstances may be present at any stage of lexical development.
Under this interpretation of what constraints are, however, it is unclear whether any degree of understanding on the part of
young children could refute these proposals. For instance, pronouns and proper names are acquired very early, often before
children have learned any other noun that describes people (such as 'person', 'man', and 'woman'). As such, constraints such as
Mutual Exclusivity cannot block the child from interpreting words like 'Fred' and 'she' incorrectly, as names for kinds of whole
objects. In fact, these sorts of errors do not occur (see Macnamara 1982) but a reasonable reply by a
constraint-theorist is that the special status of what pronouns and proper names refer to (i.e., people) causes children to override
the taxonomic assumption in such cases. In general, the fact that these constraints are posited as default conditions makes it
important to provide some theory of what other rules and constraints can override them. Without such a theory, the constraint
proposal runs the risk of begging all of the difficult questions.
In sum, the theories of Markman, Landau et al., and Soja et al. cannot account for most of the words that children acquire. But
the very same induction problem that exists for the acquisition of a word such as 'dog' also exists for words such as 'Fred',
'water', and 'forest', and thus the very same arguments for the necessity of constraints also apply. The goal of a theory of lexical
development is to account for the acquisition of all words, not just names for kinds of objects, and this motivates an effort to
explain the acquisition of object names in the context of a more general theory of word learning.
2.3. Learning issues
Where do constraints on word learning come from? In this regard, it is worth echoing Nelson's (1988) complaint that some of
the theorists who posit these constraints are vague as to whether or not they are presumed to be unlearned. There is a strong
argument, however, that the only claim consistent with the idea of such constraints is that at least some of them are present prior
to the onset of word learning. The motivation for positing constraints in the first place is to explain how children solve the
induction problem and learn words. From this perspective, it would be contradictory to claim that (for example) children learn
that words describe kinds of whole objects, as this would require that they first learn the meanings of some words and then
notice that they tend to refer to whole objects. This would imply that children are able to acquire words without this constraint,
and thus one could not appeal to its existence as an explanation of how children initially solve the word learning problem.
Consider also the specific proposal that children induce that members of certain grammatical categories tend to share certain
meaningful properties, and that this is the origin of some of the constraints. Landau et al. (1988: 317) suggest that 'the
development of a same-shape preference in children may originate in language learning, specifically in the process of learning
count nouns. Many of the words acquired by early language learners do in
fact partition the world according to the shapes of the objects in it; young children very quickly realize this, abstracting a rule
from their early word learning experience that says shape is the critical factor in decisions about the extensions of these nouns.
Then they use this rule when encountering new nouns and new classes, thus immensely simplifying and speeding up the
mapping of the one onto the other'. While possible, this assumes that children are able to learn the correct meanings of count
nouns (and can thus infer that they tend to refer to shape) prior to the onset of the shape bias. Thus although this bias might
facilitate word learning later on, we are still left with the problem of what constrains children's inferences in the first place.
If the constraints are unlearned, what is their precise nature? Do they constitute a subpart of a distinct language acquisition
mechanism that exists solely to facilitate the learning of words? This is certainly conceivable, but it would be preferable to
motivate these constraints on word learning in terms of more general properties of children's linguistic and cognitive
competence. One specific proposal, advanced below, is that these constraints emerge from other properties of children's
knowledge; in particular, from children's grasp of syntax-semantics mappings.
in order to account for words like 'Fred' and 'she' which refer to individuals and are not 'generic' in the sense discussed by Di
Sciullo and Williams.
An alternative is that the relevant distinction is grammatical, and related to how languages use syntactic categories to express
meaningful notions and relations (Bloom 1990a, Jackendoff 1983). In particular, it is not words that have generic reference; it is
categories such as nouns and verbs. Nouns like 'dog' and 'water' are generic in the sense that they can be extended to an
indefinite number of novel instances (an infinity of different dogs, an infinity of different portions of water). Put differently, they
refer to kinds, not to particular individuals or entities.
In contrast, noun phrases (NPs) like 'the big dog' can be conceptualized as referring to individuals, and not to kinds. The
standard examples of this are when nouns combine with quantifiers to become NPs. Thus 'a dog' can pick out a particular
individual that happens to be a dog, 'those big dogs' picks out those dogs that have the property of being big, and so on. (See
Parsons, 1990, for a discussion of how a similar analysis can apply to verbs and VPs.)
The distinction between words and nouns is crucial here, since some words are NPs, not nouns. This allows us to explain the
peculiar status of pronouns and proper names. From the standpoint of grammar, they are lexical NPs (see Bloom 1990b). With
regard to their role in syntactic structure, words like 'Fred' and 'he' behave like phrasal NPs such as 'the dog' and thus cannot
appear with adjectives and determiners. Under the hypothesis that pronouns and proper names are NPs, we can posit the
following mappings:
Mapping 1: NPs refer to individuals.
Mapping 2: Count nouns and mass nouns refer to kinds.
It is likely, however, that Mapping 1 is too strong; there are NPs that appear not to refer at all. In a language like English, these
include expletive pronouns, such as 'it' as in 'it is raining' or 'there' as in 'there is trouble brewing'. One theory holds that such
semantically empty NPs exist because of a requirement in English that all tensed sentences must have overt subjects
(Chomsky 1981), even if these subjects play no meaningful role. In languages like Italian, where overt subjects are not
necessary, one can say the equivalent of 'is raining', while in English, it is necessary to add the meaningless NP 'it' in order to
satisfy the grammatical requirement (but see Bolinger, 1973, for
evidence that even expletives have some semantic content). This class of counterexamples (and many others; see Bloom, under
review) suggests that the mapping from NPs to individuals does not hold in all cases. These exceptions must somehow be
learnable by children through something other than the mapping posited above.
One possibility, first advanced by Nishigauchi and Roeper (1987), is that children must initially acquire a given NP by using the
syntax-semantics mapping (e.g., referential 'it'), and only after having done so, can they understand the same word or string of
words in a semantically empty context (e.g., expletive 'it'). This makes the prediction that across different languages, all words
and phrases that are NP expletives must also be NPs of the semantically well-behaved type (NPs that refer), because otherwise
children would not be able to acquire them. There is some evidence that this is the case (Nishigauchi and Roeper 1987), and one
could make the further prediction that children can only categorize a string of words within an idiom as an NP (e.g., 'the bucket'
as in 'kick the bucket') if they are already capable of construing that string of words as having some referential meaning when it
is outside of the idiom. A theory along these lines is discussed in Bloom (under review); for the purposes here, it should be
noted that a complete account of lexical acquisition must explain how children acquire these sorts of non-referential NPs.2
3.2. Count nouns vs. mass nouns
If we restrict their domain to count nouns, we can collapse the whole object constraint, the taxonomic constraint, and the shape
bias as follows:
Count nouns refer to kinds of whole objects (Markman and Hutchinson 1984) and children are biased to extend them on
the basis of shape (Landau et al. 1988).
This generalization connects with a sizable literature that attempts to discover the semantic basis of the grammatical count/mass
distinction (e.g.,
2 A different sort of puzzle concerns NPs that apparently refer to kinds, as with the subject NPs 'dogs' and 'water' in the
sentences 'Dogs are friendly animals' and 'Water is good to drink'. These NPs can be construed as referring not to actual
dogs or actual portions of water, but to abstract kinds: to the species DOG and the substance WATER. Under one analysis,
they serve as proper names for these kinds, and thus refer to kinds in a very different way than do nouns (Carlson 1977).
In any case, the acquisition and representation of these NPs constitute a further domain of study from the standpoint of
syntax-semantics mappings.
Bach 1986, Bloom 1990a, 1994a; Bloomfield 1933, Gordon 1985, 1988; Jackendoff 1991, Langacker 1987, Levy 1988,
McCawley 1975, McPherson 1991, Mufwene 1984, Quine 1960, Ware 1979, Weinreich 1966, Whorf 1956, Wierzbicka 1985; see
Gathercole 1986 for a review). In general, it is clear that objects tend to be described by count nouns ('a dog', 'five tables') and
non-solid substances tend to be described by mass nouns ('much water', 'less sand'). This holds for languages other than English;
across different language families, names for entities like dogs and tables are always count nouns, and names for entities like
water and sand are always mass nouns (Markman 1985, Mufwene 1984, Wierzbicka 1985). This pattern is unlikely to be a
coincidence, and it might lead one to the hypothesis that entities described by count nouns have perceptually salient boundaries,
and thus are countable, while mass nouns describe everything else.
Nevertheless, the same objections discussed earlier against the whole object constraint have also been made against this theory
of count/mass syntax (e.g., by Ware 1979). One objection is that many count nouns do not describe whole objects; there exist
abstract words like 'nap' and 'joke', as well as collective nouns like 'forest' and 'army'. If one is to retain the notion that the
grammatical count/mass contrast maps systematically onto some cognitive division, the cognition side of the mapping must be
considerably more abstract than 'whole object' and 'non-solid substance'.
As a result of these considerations, many scholars have proposed that the grammatical count/mass distinction maps onto the
semantic contrast between nouns that refer to kinds of individuals vs. those that refer to kinds of non-individuated entities,
which we can view as 'portions' (see Bach 1986, Bloom 1990a, 1994a; Bloom and Kelemen, under review; Gordon 1982, 1985,
1988; Jackendoff 1983, 1990, 1991; Langacker 1987, 1990; Macnamara 1986, Macnamara and Reyes 1990). The cognitive
notion of individuals is related to properties such as countability, indivisibility, and boundedness, and is roughly equivalent to
'discrete bounded entity'. Within the domain of material entities, this usually corresponds to whole objects, though it can also
correspond to entities such as a forest, a puddle, and a pile. Outside of the domain of material entities, an event that takes a
bounded interval of time ('a race', 'a conference') can be construed as an individual, as can a mental state ('a headache', 'a
nightmare') or a period of time ('a day', 'an hour'). (Some speculations about the precise notion of 'individual' will be discussed in
section 6.2.)
What support is there for the claim that count/mass syntax actually corresponds to these aspects of abstract cognition? One
source of support is from linguistic analyses (e.g., Barwise and Cooper 1981, Jackendoff 1991,
Langacker 1987). Unless one assumes that there is some consistent semantic property holding across both material count nouns
and abstract count nouns, it is difficult to provide a consistent theory of quantification. In many important regards, an NP such
as 'a dog' is semantically equivalent to an NP such as 'a nightmare', and one way to capture this is by describing 'a' as having the
semantic role of combining with nouns that refer to kinds of individuals to form an NP that can denote a single individual.
Nouns such as 'dog' and 'nightmare', but not nouns such as 'water' and 'advice', refer to kinds of individuals and thus can be used
with count noun syntax.
There also exists empirical evidence concerning the productive use of these syntax-semantics mappings in adults. In one study,
adults were taught novel words referring to sensations or sounds (Bloom 1994a). The syntax of the word was kept neutral with
regard to its count/mass status; what varied was whether the new word was described as referring to something that occurs in
discrete units of time (temporal individuals) or to something occurring over continuous periods of time (temporal portions). This
had the predicted effect on adult categorization of the new word: Names for temporal individuals tended to be categorized as
count nouns, while names for temporal portions tended to be categorized as mass nouns.
We can now posit the following three mappings:
Mapping 1: NPs refer to individuals
Mapping 2: Count nouns refer to kinds of individuals
Mapping 3: Mass nouns refer to kinds of portions
Before turning to the question of precisely how these three mappings enable children to acquire new words (and how they fare
relative to the sorts of hypotheses advanced by Markman and others), it is necessary to consider whether young children actually
possess this understanding of the relationship between syntax and semantics.
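Purely for illustration, the three mappings can be rendered as a toy lookup from syntactic category to default semantic construal. The category labels and the function name below are my own glosses, not part of the formal proposal:

```python
# Toy rendering of the three proposed syntax-semantics mappings.
# The category labels and function name are illustrative glosses,
# not the notation of the original proposal.

MAPPINGS = {
    "NP": "individual",                  # Mapping 1
    "count noun": "kind of individual",  # Mapping 2
    "mass noun": "kind of portion",      # Mapping 3
}

def default_construal(category):
    """Return the default semantic construal a learner would assign
    to a novel word, given its syntactic categorization (None when
    no mapping applies)."""
    return MAPPINGS.get(category)

print(default_construal("NP"))          # for words like 'Fred', 'she'
print(default_construal("count noun"))  # for words like 'dog', 'nightmare'
print(default_construal("mass noun"))   # for words like 'water', 'advice'
```

On this sketch, a word's grammatical categorization alone yields a substantive hypothesis about its meaning, which is the sense in which the mappings can constrain word learning.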
semantic basis of this distinction until a relatively late age. For instance, Levy (1988: 186) reviews the work of Gordon and
Gathercole and concludes as follows:
'Thus, Gathercole's conclusions are in complete agreement with the conclusions reached by Gordon (1985); namely, that
children first learn the linguistic distinction as a morphosyntactic rather than a semantic distinction.'
Others have reached similar conclusions. Thus Schlesinger (1988: 147), in his discussion of domains in which his semantic
assimilation theory does not apply, states:
'Gordon (1985) and Gathercole (1985) have shown that the count-mass distinction is acquired through formal clues rather
than via the semantic object-substance distinction. The reason seems to be that, in English, there is not a very consistent
correlation between these two distinctions.'
These findings have been taken as strong evidence for a 'distributional theory' of language development. Levy (1988) argues,
following Karmiloff-Smith (1979), that children view the acquisition of grammar as a formal puzzle, 'a problem space per se',
and semantics is irrelevant. This is also Gathercole's (1985) conclusion, but Gordon (1985, 1988) proposes a quite different
view, maintaining that count/mass syntax is based on quantificational semantics from the very start. What young children lack,
according to Gordon, is knowledge of how this semantic contrast maps onto perception. That is, they understand that the
contrast between count nouns and mass nouns corresponds to the distinction between words that refer to kinds of individuals vs.
words that refer to kinds of portions, but they lack an understanding that physical objects are canonical individuals and non-solid
substances are canonical portions. If this were correct, then these mappings would be useless as a source of constraint in word
learning.
The specific studies of Gathercole and Gordon are critically reviewed in considerable detail in Bloom (1990a); it will suffice
here to raise a conceptual point. All of the experiments purportedly showing that children's understanding of count/mass is not
semantic involve studying children's sensitivity to linguistic cues. Thus one finding is that if 3- and 4-year-olds hear, e.g., 'This
is a blicket' they tend to grammatically categorize 'blicket' as a count noun, while if they hear 'This is some blicket', they tend to
grammatically categorize 'blicket' as a mass noun. Further, children will give these syntactic cues priority over referential cues.
If they hear 'This is a blicket' they are
likely to interpret the word as a count noun regardless of whether they are being shown an object or a substance (Gordon 1985).
One interpretation of this result is that children's understanding of count/mass is not semantic. Instead children possess some
generalization of the form: 'Everything following the word "a" is a count noun', and this is distinct from any semantic
understanding, which has to be learned at some later point. This assumes a dichotomy between 'linguistic cues' and 'semantic
cues', where the latter is restricted to information that children receive through perception of the external world. An alternative,
however, is to reject this dichotomy altogether. Semantic information can be conveyed through language; when children hear 'a
blicket' and categorize 'blicket' as a count noun, they may be drawing a semantic inference. Specifically, children might encode
the determiner 'a' as having the semantic potential of interacting with a noun that refers to a kind of individual to pick out a
single individual (because this is what it means), and it follows from this that any noun that follows 'a' must refer to a kind of
individual and thus must be a count noun.
In fact, linguistic cues are a more reliable guide to the semantic status of a novel word than perceptual cues are. This is because a
given percept can be construed in different ways; if a solid object is described as 'blicket', it is quite possible that 'blicket' is
actually a mass noun, because it could refer to the stuff that the object is made out of. But linguistic cues are flawless; every
noun that can co-occur with a quantifier that has the semantic role of individuation has to be a count noun. Given this, the
child's early sensitivity to linguistic information actually supports a semantic theory; it does not refute it.
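The semantic-inference account just described can be caricatured as a chain of inferences driven by the determiner's meaning, rather than as a bare distributional rule. The determiner inventory below is a hypothetical simplification of my own:

```python
# Caricature of the semantic-inference account: the determiner's
# meaning, not a purely formal co-occurrence rule, licenses the
# categorization of a novel noun. The determiner sets here are a
# hypothetical simplification for illustration only.

INDIVIDUATING = {"a", "another", "many"}      # quantify over individuals
NON_INDIVIDUATING = {"much", "some", "less"}  # quantify over portions

def infer_from_determiner(det):
    """Infer a novel noun's semantics, and hence its grammatical
    category, from the determiner it follows."""
    if det in INDIVIDUATING:
        return ("kind of individual", "count noun")
    if det in NON_INDIVIDUATING:
        return ("kind of portion", "mass noun")
    return (None, None)  # neutral frame: no inference licensed

# 'This is a blicket': 'a' picks out a single individual, so 'blicket'
# must refer to a kind of individual and hence be a count noun,
# regardless of whether an object or a substance is being shown.
print(infer_from_determiner("a"))
```

Note that on this construal the child's priority for syntactic over referential cues (Gordon 1985) falls out naturally: the linguistic frame licenses a deductive inference, while the percept supports only a defeasible guess.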
4.2. Arguments for early competence
What positive evidence exists that children possess the requisite syntax-semantics mappings? Gordon (1982, 1985, 1988)
provides a learnability argument: Children must be capable of using semantic information when acquiring the grammatical
count/mass distinction, because a non-semantic distributional analysis would have to sift through several billion possibilities,
and children have productive command of count/mass syntax by about the age of 2-and-a-half. More generally, the argument is
that some semantic categorization would have to be done by children in order for them to so rapidly converge on the correct
linguistic generalizations, because no other source of information serves to distinguish count nouns from mass nouns in the input
they receive (see also Bloom 1994a).
Children's errors provide further evidence that they are exploiting syntax-semantics mappings. Words that are mass nouns in
English but which refer to entities that can be construed as discrete objects, like 'money', 'furniture', and 'bacon', are occasionally
misencoded as count nouns, e.g., young children will sometimes say things such as 'a money' (Bloom 1994a). These errors are
significantly more frequent than errors with more 'canonical' mass nouns, such as 'water' and 'milk', which refer to substances.
This suggests that the categorization of new words as either count or mass is facilitated (and sometimes hampered) by children's
use of mappings from syntax to semantics.
A further source of support is experimental. In a classic study, Brown (1957) showed 3- to 5-year-olds sets of pictures, one that
depicted an object, another that depicted a substance, and told them to either 'show me a sib' or 'show me sib'. Children tended
to point differently as a function of the syntax; when given a count noun they would tend to point to the object; when given a
mass noun, they would tend to point to the substance.
More recently, Soja (1992) found a sensitivity to syntax-semantics mappings as soon as children start to productively use
count/mass syntax in their spontaneous speech. When these 2-year-olds are taught a mass noun that describes a pile of stuff,
they tend to construe it as a name for that kind of stuff (i.e., as having a similar meaning to 'clay'), but when taught a count noun
that describes a pile of stuff, many appear to construe it as referring not to the stuff itself, but to the bounded pile (i.e., as having
a similar meaning to words like 'puddle' or 'pile'). Interestingly, this effect of syntax was limited to the stuff-condition: when
children were taught count nouns and mass nouns describing a novel object, few of the children construed the mass noun as
referring to the stuff that the object was made of (i.e., they would not construe it as having the same meaning as words like
'wood' or 'metal'). Regardless of the count/mass status of the noun, they would tend to interpret it as a name for that kind of
object. An explanation for this asymmetry will be discussed in section 5.
There is less evidence that young children can extend the semantic implications of count/mass syntax to non-material entities,
but there is one relevant study (Bloom 1994a). Here, 3- and 4-year-olds were taught names for perceptually ambiguous stimuli,
which could be construed as either a set of individuals or as an unindividuated portion. In one condition, the stimulus was food,
either lentils or colored pieces of spaghetti, and was the sort of entity that could be easily named with either a count noun or a
mass noun. In another condition, the stimulus was a string of bell sounds from a tape-recorder,
presented one after the other at a very fast rate, which could be construed either as a set of discrete bells or as an
undifferentiated noise and therefore could also be described with either a count noun or a mass noun.
All children were presented with both the 'food' and the 'bell' stimuli. One group was told: 'These are feps; there really are a lot
of feps here' (count noun condition); the other group was told: 'This is fep; there really is a lot of fep here' (mass noun condition).
Then the children who were taught the word as a count noun were told to 'give the puppet a fep' for the food condition and, in
the sound condition, were given a stick and a bell and asked to 'make a fep'. The children who were taught the word as a mass
noun were told to 'give the puppet fep' in the food condition or, in the sound condition, to 'make fep' with the stick and the bell.
If children are sensitive to the semantic properties of count/mass syntax, they should act differently in the count condition than
in the mass condition. When asked for 'a fep', they should tend to give one object or make one sound, and when asked for 'fep',
they should tend to give a handful of objects or make a lot of sounds.
These were the results obtained: Both 3- and 4-year-olds performed significantly above chance on both the food and sound
conditions. This finding provides further support for the hypothesis that there is a semantic basis to count/mass syntax even for
non-material entities, and indicates that an understanding of mappings between syntax and semantics is present in 3- and 4-year-
olds.
Sensitivity to the semantics of the noun/NP contrast is evident at an even earlier age than is an understanding of the semantic
basis of count/mass. The mapping hypothesis is that young children should understand that the grammatical contrast between
words that are nouns and words that are NPs corresponds to the contrast between words that refer to kinds and words that refer
to individuals. In a classic study by Katz et al. (1974), the experimenter taught young children new words by pointing to an
object and saying either 'This is a wug' (count noun context) or 'This is wug' (NP context). Even some 17-month-olds were
sensitive to this grammatical difference; when the word was presented as a noun they tended to construe the word as the name
for a kind, but when it was presented as an NP, they tended to construe it as a name for a particular individual (see Gelman and
Taylor, 1984, for a replication with slightly older children). The findings that children younger than two treat nouns and NPs
differently with regard to how they interact with determiners and adjectives, and that they categorize pronouns and proper names
as NPs (Bloom 1990b) constitute further evidence that children possess some grasp of syntax-semantics mappings.
they tend to initially view the appropriate dimension for generalizations as being made on the basis of shape, and this is viewed
as a better basis for extending the noun usage than properties such as color, size, or texture. (Recall that this does not apply for
names for substances, where properties such as texture and color are more relevant; Soja et al. 1991.) The appeal to count nouns
is relevant only insofar as count nouns are the only linguistic category whose members specifically pick out whole objects in the material
domain, and thus any bias to favor shape is most likely to apply for this set of nouns. But the nature of the bias has to do with
children's understanding of object kinds, not of count nouns.3 Some support for this interpretation comes from the finding that
the bias towards shape appears to shift in the course of development, presumably as the result of the child's expanding
understanding of how different categories of objects might be generalized in different ways (Becker and Ward 1991, Macario
1991).
Finally, consider the procedures of Soja et al. (1991). A solid object is likely to be construed as an individual, and thus a noun
(but not an NP, adjective, or verb) that describes such an object is likely to be a count noun and refer to that kind of object.
Similarly, a substance is likely to be construed as a portion and thus any noun that describes such a substance is likely to be a
mass noun and refer to that kind of portion. Exceptional cases such as 'wood' and 'puddle' are fully consistent with Mappings 2
and 3, but, as discussed above, the mapping from mass nouns to kinds of portions may be difficult for children to exploit in
cases where the portions are solid substances, as it runs afoul of the general cognitive bias to treat objects as individuals. Put
differently, to learn a word like 'wood' the child must construe an object as a unit of stuff, rather than as a single individual, and
this violates the bias to construe discrete physical objects as individuals.
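As a rough schematic (my own gloss, under the simplifying assumption of a single default construal bias), the Soja et al. procedures can be derived from Mappings 2 and 3 plus the bias to construe discrete objects as individuals:

```python
# Rough derivation of the Soja et al. (1991) default procedures from
# Mappings 2 and 3 plus the object-as-individual bias. The labels and
# the `override` flag are my own glosses, not the authors' notation.

def construe_percept(percept, override=False):
    """Default construal: solid objects are individuals, everything
    else is a portion. `override` models the costly move of construing
    an object as a unit of stuff, as learning 'wood' requires."""
    if percept == "solid object" and not override:
        return "individual"
    return "portion"

def predict_noun_category(percept, override=False):
    """Mappings 2/3: a noun naming an individual should be a count
    noun; a noun naming a portion should be a mass noun."""
    if construe_percept(percept, override) == "individual":
        return "count noun"
    return "mass noun"

print(predict_noun_category("solid object"))        # the default guess
print(predict_noun_category("solid object", True))  # 'wood'-type words
print(predict_noun_category("non-solid substance"))
```

The asymmetry reported by Soja (1992) corresponds to the `override` path: nothing extra is needed to learn 'cup', but learning 'wood' requires defeating the default construal, which is why such words are not children's first guesses.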
6. Open Questions
6.1. Is There Developmental Change?
The claim above is that 1- and 2-year-olds possess unlearned mappings from count nouns to individuals and mass nouns to non-
individuated entities,
3 Landau et al. (1988) argue that children's different responses on linguistic and non-linguistic tasks (they treat shape as
more relevant for the former) suggest that the shape bias is special to language. As suggested above, however, an
alternative is that the use of the count noun informs the children that the task has to do with objects (where shape is very
relevant); when the noun is not present the children might just as well assume that the task has to do with properties
(where shape may have less priority). Note incidentally that this non-linguistic construal of the shape bias appears to be
more consistent with the theory of objects and places advanced in Landau and Jackendoff (1993).
as well as the bias to construe whole objects as individuals. Is it necessary to posit this sort of abstract knowledge, or might
children start off with a simpler representation, linking up the count/mass distinction directly to the contrast between bounded
and unbounded physical entities, and only later developing the more abstract adult understanding? It appears that even 2-year-
olds possess a cognitive understanding of 'count noun' that includes bounded substances and is thus not limited to the category
of 'whole object' (Soja 1992), but the question remains of whether this understanding is initially restricted to the material
domain, or whether it can extend to sounds, events, collections, and so on.
In the absence of any evidence for developmental change, one might argue that lack of a child-adult difference should be viewed
as the null hypothesis in psychology (Fodor 1975, Macnamara 1982), an argument that gains force from the fact that we have as
yet no understanding of how a cognitive notion can become 'more abstract'. But cases of representational change do appear to
exist (see Carey 1986, 1988), and so it remains an open question whether syntax-semantics mappings are yet another domain
where children differ from adults.
No decisive evidence exists at this point, but there are three sources of evidence suggesting that the abstract adult-like
understanding is present in very young children.
First, some evidence concerning early possession of the notion of 'individual' emerges from the research of Starkey et al.
(1990), who found that 6- to 8-month-olds possess a unified concept containing both whole objects and temporally bounded
sounds. In one study, infants were exposed to either two sounds or three sounds. Immediately following this, two pictures were
simultaneously shown to the infants, one with two objects and one with three objects. The subjects tended to look longer at the
picture which showed the same number of objects as there were sounds, providing some evidence that infants possess notions of
'two individuals' and of 'three individuals', where 'individual' encompasses both sounds and objects. Along the same lines, Wynn
(1990) discovered that almost immediately after children are able to use the linguistic counting system to count objects, they can
also use it to count sounds and events. These studies suggest that, quite independently of syntax, children do have the
appropriate abstract semantic notion of 'individual'.
Second, as noted above, children appear to be capable, even at a very early age, of productively and appropriately using words
that refer to non-material entities, such as temporal intervals ('day', 'minute'), events ('bath', 'nap'), and abstract entities ('story',
'joke'). If it turns out that (i) they encode these words as falling into the grammatical category of 'count noun' and (ii) they
understand them in the same way that adults do, then this would show that the abstract understanding of the count/mass
distinction is present in 2-year-olds. But there is no strong support at present for either of these claims, at least not for children
younger than three.
Finally, there is the experimental study noted above (Bloom 1994a), which showed that 3- and 4-year-olds are sensitive to the
application of quantificational syntax in a domain of non-material entities (sounds). Once again, however, evidence for this sort
of capacity on the part of younger children does not yet exist. At this point, then, it remains possible that the semantic category
'individual' emerges from some sort of simpler representation, such as 'bounded physical entity'.
6.2. What Is The Nature Of 'Kind Of Individual'?
This brings us to the second concern. Without a substantive theory of the precise nature of 'kind of individual', the sort of
account proposed here runs the risk of being empty. Despite the central role of this notion in semantic theories of quantification
and reference, we have as yet little understanding of how it links up with perception and non-linguistic cognition, and how it
serves to constrain the extent of possible word meanings.
If we assume that count nouns refer to kinds of individuals, it is apparent that the reference of such nouns includes bounded
substances, periods of time, events, mental states, collections of objects, and abstract social constructs. There are several
hypotheses about what all of these referents have in common, and thus what the core of this semantic notion is. Some
suggestions include boundedness, having a single functional role, and proximity or connectedness of parts (see e.g., Hirsch 1982,
Jackendoff 1991, Langacker 1987, 1990).
Consider, for instance, count nouns that name collections, such as 'forest', 'family' and 'army'. Although they describe material
entities, they violate the generalization that a count noun refers to a kind of object. As such, they are problematic for theories
that posit a privileged link between words (or count nouns) and kinds of whole objects. The premise of the mapping theory
sketched out above is that although there exist semantic constraints on what
can be a possible count noun, these are based on the semantic category 'kind of individual', not 'kind of object'. One could thus
describe the acquisition of words such as 'forest' by assuming that a collection of trees is construable by children as being a
single individual and therefore a word that refers to that kind of individual is learnable as a count noun. But what is it about
forests that makes them construable by children and adults as individuals? Why do children readily construe a group of trees as a
possible individual ('a forest'), and yet do not construe all of the leaves of a tree as a single individual (see Chomsky and Walker
1978)?
One tentative proposal is as follows:
Hypothesis about possible individuals:
Something (e.g., an object or set of objects) can be encoded as an 'individual' if we can construe it as playing an
independent causal role in some conceptual domain.
The intuition underlying this is as follows: We view something as an individual only if doing so allows us to better understand
and predict the causal relationships that hold within a given conceptual domain. The strongest example is the case of bounded
objects; these are the canonical example of 'individuals' within the physical domain and are highly privileged in the course of
development. Infants are predisposed to analyze their chaotic sensory input into a world of distinct bounded objects that persist
over time and space (e.g. Spelke 1988). This mode of interpretation is likely to have evolved because construing the
environment in these terms is the best way to make sense of what is going on, and allows for the most predictive power. Any primate that lacked this conceptual scheme would fail to respond to the world in a timely and effective manner (it would not be able to track prey or avoid predators, for example) and would not survive.
The bias to construe objects as individuals is the result of evolution, not learning, and this may hold as well for individuation
within other domains, such as social cognition or naive theories of mind. For instance, humans and other primates might be
predisposed to classify certain social groups as individuals for the purpose of inference and prediction; as such, notions like
'family' and 'social group' might be innate (Hirschfeld 1987, Jackendoff 1992). But for most domains, we have to discover the
relevant individuals in the course of understanding the nature of the domain. One would have a difficult time understanding
geography or politics, for instance, without the
ability to understand entities like 'Canada' and 'France' as discrete individuals that causally interact with each other.
Returning to names for collections, it is clear that one's conceptual framework also determines the specific conditions under which a set of objects can be construed as forming a collection. The adult understanding of 'forest' is of a group of trees growing together in a certain environment: for the word to apply, there have to be a sufficient number of trees and they must be bunched together, but they need not form a precise shape. Nouns such as 'family' have an even looser spatial restriction: they can apply even if the elements that make up the collection bear no non-trivial spatial relationship at all; it is sensible to say 'that family is scattered around the world' (compare the semantic oddness of 'that forest is scattered around the world'). But now consider Donald Judd's sculpture 'untitled' (1928), composed of 10 Plexiglas pieces mounted vertically on a wall, separated from one another by exactly the same distance. In this case, the precise configuration does matter; the intuition would be that two rows of five Plexiglas pieces stacked on a table would be different from 'untitled', and Judd might reasonably view this modification as a destruction of his artwork.
If the notion of 'individual' can be related to notions of intentionality and social interaction, it follows that, in principle, any set
of objects can be construed as a single individual and thus named with a count noun. There is no count noun in English referring
to a single shoe and a single glove (i.e., such that exactly one shoe and one glove would be 'one fizzbit') but if such a pair of
objects was exactly what one needed in order to participate in some kind of religious ceremony, such a name would probably be
learnable by someone trying to make sense of that ceremony. One intriguing example of this again involves artwork; a proper
name (e.g., 'January Angst') might describe six concrete columns surrounded by broken glass. Although this set of objects is an
individual solely by virtue of the artist's intention, this fact is sufficient for adults (and possibly children) to acquire and
understand this new name.
Current research addresses these issues by exploring the circumstances under which people will give a collective interpretation
for a set of discrete objects. One methodology used is to show adults a set of four identical objects and tell them either 'this is a
fendle' (count syntax) or 'the new word for this is: fendle' (neutral syntax). Thus the novel word could either be a collective
noun, like 'forest', and refer to all four objects, or it could be an object name, like 'tree', and refer to a single object. Then the
adults are shown other displays with the same kind of objects used in the training phase, such as a display with one object and
a display with eight
objects, and asked to describe these using the new word. Their responses indicate whether they think the word refers to a
collection or to an individual object. For instance, if they interpret 'fendle' as a collective noun, they should describe eight
objects as either 'one fendle' or 'two fendles' (depending on how crucial numerosity is in their collective interpretation) and
should describe one object as 'part of a fendle'. If they interpret it as an object name, they should describe eight objects as 'eight
fendles' and one object as 'one fendle'.
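The response-coding scheme just described can be summarized as a small decision rule. The sketch below is purely illustrative (the function name and the 'unclear' category are ours); it simply restates the contingencies in the text for displays of one and eight objects:

```python
# Hypothetical sketch of the response-coding rule described above: given
# how a subject describes a display, infer whether they treat the novel
# word as a collective noun or as an object name.

def code_response(display_size, description):
    """Map a subject's description of a display to an interpretation.

    display_size: number of discrete objects shown (e.g. 1 or 8)
    description: the subject's phrase using the novel word
    Returns 'collective', 'object', or 'unclear'.
    """
    if display_size == 1:
        if description == "part of a fendle":
            return "collective"   # one object is only part of the unit
        if description == "one fendle":
            return "object"       # one object is itself a unit
    elif display_size == 8:
        if description in ("one fendle", "two fendles"):
            return "collective"   # groups of objects count as units
        if description == "eight fendles":
            return "object"       # each object counts as a unit
    return "unclear"

assert code_response(8, "two fendles") == "collective"
assert code_response(1, "one fendle") == "object"
```

The point of the sketch is only that the two interpretations make opposite predictions for the one-object and eight-object displays, so the subjects' descriptions diagnose their construal.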
In a pilot study with a group of 36 adults, we tested the effects of syntax (singular count vs. neutral) and 'intentional integrity' on
their interpretation of novel words. This second manipulation went as follows: For half of the trials, the objects were placed in
front of the subject slowly and carefully; for the other half, the objects were casually dumped in front of the subject. The
prediction was that this simple manipulation would have an effect on whether the subjects would construe the novel word as a
collective noun. The mere act of purposefully and intentionally setting out the stimuli in a given configuration should be
sufficient to emphasize to the subject that the set itself is relevant as a single individual.
This prediction was confirmed: When the novel word was presented as a singular count noun, there was a bias towards
interpreting it as referring to the entire collection. The bias increased (though not significantly so) when the objects were placed
in front of the subject, rather than dumped. When the novel word was presented without syntactic support, however, there was
an actual switch in the favored interpretation: In the dumping condition, almost all subjects construed the word as a name for a
single object (like 'tree'), but for the placing condition, the collection interpretation (like 'forest') was strongly favored.
Further research will explore whether young children can also acquire a collective noun through this sort of intentional cue, and
will also focus on the precise nature of adults' and children's construal of the new word. The hope is that by studying the
acquisition and understanding of nominals referring to sets of discrete objects (including collective nouns and names for artwork), we will gain some insight into the nature and development of the notion 'individual' and how it relates to cognition and
perception.
6.3. How can these mappings apply prior to the acquisition of overt syntax?
The proposal here is that the constraints children use when acquiring words are the result of their understanding of the mappings
from syntactic
categories to categories of cognition. But productive command of count/mass syntax comes in at roughly the age of 2-and-a-half,
and children have begun to learn words over a year prior to this. To make matters worse, there exist languages which do not
appear to exploit the count/mass distinction in either syntax or morphology (e.g., Mandarin Chinese), and yet children acquiring
these languages have no difficulty learning words.
Also, many scholars have argued that children use properties of word meaning to determine the syntactic categories that their
very first words belong to (Bloom 1990a, 1994a, under review; Grimshaw 1981, Macnamara 1982, 1986; Pinker 1984), a
proposal that has been dubbed 'semantic bootstrapping'. Thus these syntax-semantics mappings might work in both directions, to
facilitate both lexical and syntactic development. But if this is correct, then very young children must have the capacity to learn
at least some aspects of word meaning in the absence of syntax.
Finally, we know that syntax is not essential for adults. One can learn a word like 'pencil' perfectly well without hearing it used
with count syntax (e.g., without it being preceded by a quantifier like 'a' or 'many'). This proves that overt syntax cannot be
essential for word learning.
Nevertheless, even where there is no specific syntactic information, the existence of the mappings still serves to narrow down the
possible construals of what a word can mean. This is because any novel word must belong to a syntactic category, and as such it
must fall into one of a limited set of semantic classes. For instance, there are no words that refer to chains of thematically related entities (no natural language could have a word that refers both to dogs and to everything that dogs standardly interact with), and the reason for this could be that there is no syntactic category that encodes such a notion. More generally, there may be
no constraints on word meaning per se; there might only be constraints on possible count nouns, possible mass nouns, possible
intransitive verbs, and so on. But since any word has to be either a count noun or a mass noun or an intransitive verb and so on,
any word is thereby constrained as to what it can mean.
For example, Mappings 2 and 3 limit possible word meanings even before children can distinguish the grammatical markings of
count nouns from those of mass nouns, because once children know that the word is a noun, they know it must be either count
or mass, and the mappings limit its meaning to two semantic classes: either it refers to a kind of individual or it refers to a kind
of portion. Of course, the mappings are less effective at this point, since any ambiguities (cases where children cannot tell
whether the adult intends to describe an object or a substance) cannot be resolved through attention to
overt grammatical cues. This may be one explanation for the well-known finding that 3-year-olds show a greater sensitivity to
the constraints than 2-year-olds (e.g., Landau et al. 1988, Markman 1989, Nelson 1988): children can use the constraints to their
fullest only once they have some facility with the grammatical and morphological structure of their language.
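The filtering logic of this argument can be sketched schematically. This is only an illustration of the reasoning, not a model anyone is claimed to run; the mapping table is abbreviated, and the verb entry is an invented placeholder not drawn from the text:

```python
# Illustrative sketch: constraints attach to syntactic categories, not to
# individual words, so even partial knowledge of a word's category narrows
# its possible meanings. The table is abbreviated for illustration (the
# verb entry is an invented placeholder).

SEMANTIC_CLASS = {
    "count noun": "kind of individual",
    "mass noun": "kind of portion",
    "intransitive verb": "kind of event",   # placeholder entry
}

def possible_meanings(candidate_categories):
    """Return the semantic classes a novel word could pick out, given
    the syntactic categories it might belong to."""
    return {SEMANTIC_CLASS[c] for c in candidate_categories}

# A child who knows only that a word is a noun (count or mass) has
# already narrowed its meaning to two semantic classes:
print(sorted(possible_meanings({"count noun", "mass noun"})))
# ['kind of individual', 'kind of portion']
```

Once the child can also tell count from mass morphology, the candidate set shrinks to a single class, which is why the constraints become more effective with grammatical development.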
The view that these mappings provide constraints on children's word meanings even prior to the acquisition of surface syntax
may be counterintuitive, but there is some support for it. In an extended analysis of the production and comprehension abilities
of children in the one-word stage, Huttenlocher and Smiley (1987: 84) state, 'Taken as a group, the object words in the single-
word period form a broad semantic class which contrasts with other semantic classes emerging at the same time. That is, the
pattern of usage of object words contrasts with that of words for events, words for persons, words for temporary states,
greetings, and negation, and so on'. They go on to suggest that this early demarcation of words into these classes provides a
semantic foundation for the later acquisition of syntactic categories. A more radical interpretation is that these children have
already classified these words into the relevant grammatical classes, and all they have left to do is learn how (or if) their
language expresses these classes in the grammar and morphology. (Does their language mark the contrast between count nouns
and mass nouns? Is the morphology different for verbs than for adjectives?) Once they have acquired these surface expressions
of the linguistic categories, children can use the mappings to further facilitate the acquisition of word meanings.
First, children are often exposed to words in the absence of the entities that they describe. An adult, for example, might point to a bowl of cereal and
say 'Do you want milk with that?' even when no milk is present. Consider also words for nonmaterial entities, like 'nap' and
'joke', where the notion of inferring what an adult is pointing to or looking at does not apply. One could also note the success of
blind children at learning words (Landau and Gleitman 1985) in order to appreciate the mystery here. A complete theory must
explain how children grasp the adult's intention to refer, that is, how they somehow make the correct guess as to what adults are talking
about when they use novel names (for research along these lines, see Baldwin 1991). More generally, no theory of the
acquisition of words can be complete without a prior theory of how children can pick up the intended reference of language-
users (Macnamara 1982).
Second, even with the aid of grammar-cognition mappings, there is still an infinity of possible meanings for the new word, and
children are stuck with sorting them out. The count noun 'dog' could refer to dogs, but it could also refer to dogs and pencils, to
dogs until the year 2000 and then to cats, and so on. Knowing that a given word refers to a kind of individual is only a small
part of the word learning puzzle; children must also determine which kind of individual the word refers to, and it is here that the
induction problem runs deep, particularly given how the notion of 'kind of individual' interacts with conceptual systems such as
social cognition (see section 6.2). Crucially, an explanation of how children learn words involves a theory of psychologically
possible kinds (one that includes 'dogs' and 'tails', but excludes 'dogs and pencils'). In sum, while syntax-semantics mappings
may be part of the solution to how children learn new words, they are not sufficient. Not only does a complete account of the
acquisition of word meaning require an explanation of how people understand the intended reference of others, it also requires a
theory of conceptual representation.
8. Concluding comments
If one were to follow the standard course in the study of word learning, and only consider the acquisition of words like 'dog' and
'cup', it would be hard to empirically distinguish the claim that children possess special word learning constraints from the
alternative that they apply syntax-semantics mappings. Both theories avoid the same hard questions (how do children determine what a new word is meant to describe, and what constitutes a psychologically possible kind?), and both can capture the same simple phenomenon: if a child hears a single object described with
a word, he or she will tend to take the word as referring to that kind of object. Within this domain, the advantage of the syntax-
semantics mapping theory is solely theoretical; it only posits aspects of children's psychology (a mapping from count nouns to
kinds of individuals and a bias to view discrete physical objects as individuals) that have independent empirical support and have
been previously proposed in adults for reasons that have nothing to do with lexical acquisition. This is preferable to having to
posit special unlearned constraints that exist solely to help children acquire words and which have no other motivation or
support.
The empirical differences between the two theories become more obvious when we consider words that do not describe whole
objects. The constraints advanced by Markman (1990), Soja et al. (1991), and Landau et al. (1988) do not apply to words like 'Fred', 'she', 'map', 'foot', and 'forest', and these sorts of words are present in the speech of 1- and 2-year-old children. By
shifting the focus to mappings between grammar and abstract cognition, we have a framework in which to deal with the
acquisition of pronouns and proper names, words for substances, words for material entities that are not whole objects (like parts
and collections), and words for abstract entities.
With the notable exception of research on the development of verb meaning (e.g., Gleitman 1990, Pinker 1989), most scholars
have viewed word learning as an independent issue from the nature and development of grammatical knowledge. It is also often
assumed that the theoretically interesting cases of word learning are limited to the acquisition of words for material entities,
usually names for whole objects. This article has presented reasons for abandoning both of these assumptions, and for exploring
how mappings between syntactic categories like 'count noun' and abstract semantic categories like 'kind of individual' facilitate
the acquisition of word meaning.
References
Au, T.K. and M. Glusman, 1990. The principle of mutual exclusivity in word learning: To honor or not to honor? Child Development 61, 1474-1490.
Au, T.K. and E.M. Markman, 1987. Acquiring word meanings via linguistic contrast. Cognitive Development 2, 217-236.
Bach, E., 1986. The algebra of events. Linguistics and Philosophy 9, 5-16.
Baldwin, D.A., 1989. Priorities in children's expectations about object label reference: Form over color. Child Development 60, 1291-1306.
Baldwin, D.A., 1991. Infants' contribution to the achievement of joint reference. Child Development 62, 875-890.
Barwise, J. and R. Cooper, 1981. Generalized quantifiers and natural language. Linguistics and Philosophy 4, 159-219.
Becker, A.H. and T.B. Ward, 1991. Children's use of shape in extending new labels to animate objects: Identity versus postural change. Cognitive Development 6, 3-16.
Benedict, H., 1979. Early lexical development: Comprehension and production. Journal of Child Language 6, 183-200.
Bloom, P., 1990a. Semantic structure and language development. Unpublished doctoral dissertation, MIT.
Bloom, P., 1990b. Syntactic distinctions in child language. Journal of Child Language 17, 343-355.
Bloom, P., 1994a. Semantic competence as an explanation for some transitions in language development. In: Y. Levy (ed.), Other children, other languages: Theoretical issues in language development, 41-75. Hillsdale, NJ: Erlbaum.
Bloom, P., 1994b. Recent controversies in the study of language acquisition. In: M.A. Gernsbacher (ed.), Handbook of psycholinguistics. San Diego, CA: Academic Press.
Bloom, P., under review. Meaning-form mappings and their role in the acquisition of nominals. Language.
Bloom, P. and D. Keleman, under review. Syntactic cues and the acquisition of collective nouns. Cognition.
Bloomfield, L., 1933. Language. New York: Holt.
Bolinger, D., 1973. Ambient it is meaningful too. Journal of Linguistics 9, 261-270.
Brown, R., 1957. Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology 55, 1-5.
Callanan, M.A. and E.M. Markman, 1982. Principles of organization in young children's natural language hierarchies. Child Development 53, 1093-1101.
Carey, S., 1978. The child as word learner. In: M. Halle, J. Bresnan, G.A. Miller (eds.), Linguistic theory and psychological reality, 264-293. Cambridge, MA: MIT Press.
Carey, S., 1982. Semantic development: The state of the art. In: E. Wanner, L.R. Gleitman (eds.), Language acquisition: The state of the art, 347-389. New York: Cambridge University Press.
Carey, S., 1986. Conceptual change in childhood. Cambridge, MA: MIT Press.
Carey, S., 1988. Conceptual differences between children and adults. Mind and Language 3, 167-181.
Carlson, G., 1977. Reference to kinds in English. Unpublished doctoral dissertation, Department of Linguistics, University of Massachusetts, Amherst.
Chomsky, N., 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, N. and E. Walker, 1978. The linguistic and psycholinguistic background. In: E. Walker (ed.), Explorations in the biology of language, 15-26. Cambridge, MA: MIT Press.
Clark, E.V., 1973. What's in a word? On the child's acquisition of semantics in his first language. In: T.E. Moore (ed.), Cognitive development and the acquisition of language, 65-110. New York: Academic Press.
Clark, E.V., 1987. The principle of contrast: A constraint on language acquisition. In: B. MacWhinney (ed.), Mechanisms of language acquisition, 1-33. Hillsdale, NJ: Erlbaum.
Clark, E.V., 1990. On the pragmatics of contrast. Journal of Child Language 17, 417-432.
Di Sciullo, A. and E. Williams, 1987. On the definition of word. Cambridge, MA: MIT Press.
Dickinson, D.K., 1988. Learning names for materials: Factors constraining and limiting hypotheses about word meaning. Cognitive Development 3, 15-35.
Fodor, J., 1975. The language of thought. New York: Crowell.
Gathercole, V.C., 1985. 'He has too many hard questions': The acquisition of the linguistic mass-count distinction in much and many. Journal of Child Language 12, 395-415.
Gathercole, V.C., 1986. Evaluating competing theories with child language data: The case of the count-mass distinction. Linguistics and Philosophy 6, 151-190.
Gathercole, V.C., 1987. The contrastive hypothesis for the acquisition of word meaning: A reconsideration of the theory. Journal of Child Language 14, 493-531.
Gelman, S.A. and M. Taylor, 1984. How two-year-old children interpret proper and common names for unfamiliar objects. Child Development 55, 1535-1540.
Gentner, D., 1982. Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In: S.A. Kuczaj (ed.), Language development, Vol. II: Language, thought, and culture, 301-334. Hillsdale, NJ: Erlbaum.
Gleitman, L.R., 1990. The structural sources of word meaning. Language Acquisition 1, 3-55.
Goodman, N., 1983. Fact, fiction, and forecast. Cambridge, MA: Harvard University Press.
Gopnik, A. and S. Choi, 1990. Do linguistic differences lead to cognitive differences? A cross-linguistic study of semantic and cognitive development. First Language 10, 199-215.
Gordon, P., 1982. The acquisition of syntactic categories: The case of the count/mass distinction. Unpublished doctoral dissertation, MIT.
Gordon, P., 1985. Evaluating the semantic categories hypothesis: The case of the count/mass distinction. Cognition 20, 209-242.
Gordon, P., 1988. Count/mass category acquisition: Distributional distinctions in children's speech. Journal of Child Language 15, 109-128.
Grimshaw, J., 1981. Form, function, and the language acquisition device. In: C.L. Baker, J. McCarthy (eds.), The logical problem of language acquisition, 183-210. Cambridge, MA: MIT Press.
Hirsch, E., 1982. The concept of identity. New York, NY: Oxford University Press.
Huttenlocher, J. and P. Smiley, 1987. Early word meanings: The case of object names. Cognitive Psychology 19, 63-89.
Jackendoff, R., 1983. Semantics and cognition. Cambridge, MA: MIT Press.
Jackendoff, R., 1990. Semantic structures. Cambridge, MA: MIT Press.
Jackendoff, R., 1991. Parts and boundaries. Cognition 41, 9-45.
Jackendoff, R., 1992. Is there a faculty of social cognition? In: R. Jackendoff (ed.), Languages of the mind: Essays on mental representation, 69-81. Cambridge, MA: MIT Press.
Karmiloff-Smith, A., 1979. A functional approach to language acquisition. New York: Cambridge University Press.
Katz, N., E. Baker and J. Macnamara, 1974. What's in a name? A study of how children learn common and proper names. Child Development 45, 469-473.
Keil, F.C., 1989. Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.
Lakoff, G., 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago, IL: Chicago University Press.
Landau, B. and L.R. Gleitman, 1985. Language and experience. Cambridge, MA: Harvard University Press.
Landau, B. and R. Jackendoff, 1993. 'What' and 'where' in spatial language and spatial cognition. Behavioral and Brain Sciences 16, 217-238.
Landau, B., S. Jones and L.B. Smith, 1992. Perception, ontology, and naming in young children: Commentary on Soja, Carey, and Spelke. Cognition 43, 85-91.
Landau, B., L.B. Smith and S. Jones, 1988. The importance of shape in early lexical learning. Cognitive Development 3, 299-321.
Langacker, R.W., 1987. Nouns and verbs. Language 63, 53-94.
Langacker, R.W., 1990. Foundations of cognitive grammar, Volume II: Descriptive application. Unpublished manuscript, UCSD.
Levy, Y., 1988. On the early learning of grammatical systems: Evidence from studies of the acquisition of gender and countability. Journal of Child Language 15, 179-186.
Macario, J., 1991. Young children's use of color in classification: Foods and canonically colored objects. Cognitive Development 6, 17-46.
Macnamara, J., 1982. Names for things: A study of human learning. Cambridge, MA: MIT Press.
Macnamara, J., 1986. A border dispute: The place of logic in psychology. Cambridge, MA: MIT Press.
Macnamara, J. and G. Reyes, 1990. The learning of proper names and count nouns: Foundational and empirical issues. Unpublished manuscript.
Markman, E.M., 1985. Why superordinate categories can be mass nouns. Cognition 19, 31-53.
Markman, E.M., 1989. Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press.
Markman, E.M., 1990. Constraints children place on word meanings. Cognitive Science 14, 57-77.
Markman, E.M. and J.E. Hutchinson, 1984. Children's sensitivity to constraints in word meaning: Taxonomic versus thematic relations. Cognitive Psychology 16, 1-27.
Markman, E.M. and G.F. Wachtel, 1988. Children's use of mutual exclusivity to constrain the meaning of words. Cognitive Psychology 20, 121-157.
McCawley, J., 1975. Lexicography and the count-mass distinction. Proceedings of the first annual conference of the Berkeley Linguistics Society.
McPherson, L., 1991. 'A little' goes a long way: Evidence for a perceptual basis of learning for the noun categories COUNT and MASS. Journal of Child Language 18, 315-338.
Mervis, C.B., R.M. Golinkoff and J. Bertrand, 1991. Young children learn synonyms: A refutation of mutual exclusivity. Poster presented at the Biennial Meeting of the Society for Research in Child Development, Seattle, WA, April 1991.
Mufwene, S., 1984. The count/mass distinction and the English lexicon. 1984 CLS Parasession on Lexical Semantics.
Naigles, L., 1990. Children use syntax to learn verb meanings. Journal of Child Language 17, 357-374.
Nelson, K., 1988. Constraints on word meaning? Cognitive Development 3, 221-246.
Nelson, K., 1990. Comment on Behrend's 'Constraints and development'. Cognitive Development 5, 331-339.
Nelson, K., J. Hampson and L.L. Shaw, 1993. Nouns in early lexicons: Evidence, explanations, and extensions. Journal of Child Language 20, 61-84.
Nishigauchi, T. and T. Roeper, 1987. Deductive parameters and the growth of empty categories. In: T. Roeper, E. Williams (eds.), Parameter-setting and language acquisition, 91-121. Dordrecht: Reidel.
Parsons, T., 1990. Events in the semantics of English. Cambridge, MA: MIT Press.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S., 1989. Learnability and cognition. Cambridge, MA: MIT Press.
Prasada, S., 1993. Learning names for solid substances: Quantifying solid entities as portions. Cognitive Development 8, 83-104.
Premack, D., 1990. Words: What are they, and do animals have them? Cognition 37, 197-212.
Quine, W.V.O., 1960. Word and object. Cambridge, MA: MIT Press.
Sapir, E., 1921. Language. New York: Harcourt, Brace.
Schlesinger, I.M., 1988. The origin of relational categories. In: Y. Levy, I.M. Schlesinger, M.D.S. Braine (eds.), Categories and processes in language acquisition, 121-178. Hillsdale, NJ: Erlbaum.
Shipley, E.F. and B. Shepperson, 1990. Countable entities: Developmental changes. Cognition 34, 109-136.
Soja, N.N., 1987. Ontological constraints on 2-year-olds' induction of word meanings. Unpublished doctoral dissertation, MIT.
Soja, N.N., 1992. Inferences about the meanings of nouns: The relationship between perception and syntax. Cognitive Development 7, 29-45.
Soja, N.N., S. Carey and E.S. Spelke, 1991. Ontological categories guide young children's inductions of word meaning: Object terms and substance terms. Cognition 38, 179-211.
Soja, N.N., S. Carey and E.S. Spelke, 1992. Perception, ontology, and word meaning. Cognition 45, 101-107.
Spelke, E.S., 1988. Where perception ends and thinking begins: The apprehension of objects in infancy. In: A. Yonas (ed.), Minnesota Symposia on Child Psychology, 197-234. Hillsdale, NJ: Erlbaum.
Starkey, P., E.S. Spelke and R. Gelman, 1990. Numerical abstraction by human infants. Cognition 36, 97-127.
Taylor, M. and S. Gelman, 1988. Adjectives and nouns: Children's strategies for learning new words. Child Development 59, 411-419.
Ware, R., 1979. Some bits and pieces. In: F. Pelletier (ed.), Mass terms: Some philosophical problems, 15-29. Dordrecht: Reidel.
Waxman, S., 1990. Linguistic biases and the establishment of conceptual hierarchies: Evidence from preschool children. Cognitive Development 5, 123-150.
Waxman, S. and R. Gelman, 1986. Preschoolers' use of superordinate relations in classifications and language. Cognitive Development 1, 139-156.
Weinreich, U., 1966. Explorations in semantic theory. In: T. Sebeok (ed.), Current trends in linguistics, Vol. 3, 395-477. The Hague: Mouton.
Whorf, B., 1956. Language, thought, and reality. Cambridge, MA: MIT Press.
Wierzbicka, A., 1985. Oats and wheat: The fallacy of arbitrariness. In: J. Haiman (ed.), Iconicity in syntax, 311-342. Amsterdam: Benjamins.
Wynn, K., 1990. Children's understanding of counting. Cognition 36, 155-193.
Section 5
The case of verbs
1. Introduction
Every standard text in psychology (or in education or linguistics for that matter) asserts that children aged 18 months to 6 years
acquire 5 to 10 new words a day. How do they manage to do so?
We concentrate here on a single aspect of the word-learning problem: Granted that children can hypothesize some appropriate
set of concepts, how do they decide which sound segment corresponds to each such concept? For instance, granted that they can
entertain the concepts 'elephant' and 'give', how do they come to select the sound /elephant/ for elephants and /give/ for giving?
This aspect of acquisition is called the mapping problem for the lexicon.1
Solution of the mapping problem has traditionally been assigned to a word-to-world pairing procedure in which the learner lines
up the utterance of a word with the co-occurring extralinguistic contexts. Thus elephant comes to mean 'elephant' just because it
is standardly uttered by caregivers in the presence of elephants.
Gillette and Gleitman (forthcoming) have begun to document just how well the word-to-world pairing procedure works in
practice for simple nouns. In these manipulations, adult subjects watch a video, five or ten minutes in length, of mothers and
their young children (MLU 2) at play, but with the audio turned off. These lengthy situational segments allow the subjects to
pick up whatever clues are available from the pragmatic concomitants of the speech event. They are told that, at the instant
some particular noun is being uttered by the mother, they will hear a beep, their task being to guess what noun that was. For the
nouns most frequently used in these mother/child interchanges, the subjects are almost at ceiling. Usually, even a single
scene/beep pair is enough for the subject to identify the noun the mother was uttering.
These findings imply two things. The first concerns the input situation itself: Evidently, mothers of very young children usually
say nouns just when the objects that these label are the focus of conversation and are being manipulated by the participants.
This makes their recovery from context easy (see Bruner, 1975, and Slobin, 1975, for prior evidence of this here-and-now
property of maternal speech to children). The second concerns 'natural' interpretations of situational information: The observer
seems efficient at
1 In our notation, /slashes/ represent sound, 'single quotes' the concept, "double quotes" the utterance, and italics the word
as an abstract object.
guessing the level of specificity at which the speaker is making reference (elephants rather than animals or puppets), despite the
fact that all of these interpretations fit the observed scenes equally well (pace Quine 1960; for evidence from child word
learning, see Hall 1993; Hall and Waxman 1993).
So far so good: One can learn that the word for 'elephant' is /elephant/ (or /beep/) because it is said in the presence of elephants.
However, when we turn to the acquisition of lexical categories other than the noun, this promising story appears to fall apart.
Subjects cannot correctly guess which verb the mother is saying under the same circumstances: observation of the mother/child
scenes without audio other than the beep. Though the subjects do choose as their guesses the most common maternal verbs of
all (e.g., come and put as opposed to arrive and situate), they fail to select the one that the mother was actually uttering at the
sound of the beep. Their success rate is between 0 and 7%, depending on details of the manipulation.
Why is the observed scene so decisive for nouns and so uninformative for verbs? One factor proposed by Gentner (1978, 1982)
has to do with the concepts that these lexical classes standardly encode, namely the difference between object-reference
concepts and relational concepts (see also Nelson 1974). The reference of many nouns can apparently be extracted by appeal to
principles of object perception and pragmatic inference, but even the homeliest verb meanings express relations among such
concepts. Which such relation the speaker has in mind to convey is rarely accessible from observation alone. Moreover, the
nouns are frequently used in deictic-ostensive contexts to young learners: "This is a ball" (Ninio 1980, Bruner 1983), while
verbs are much rarer in such contexts as "This is hopping".
Another important factor is that the verbs are not uttered even to young learners in a tight time-lock with the events (Tomasello
and Kruger 1992, Lederer et al. 1991). Even when the events and verb utterances are relatively close in time, their seriation
differs, a problem we have called interleaving. For instance, consider a scene in which the child is pushing a car, and then upon
request from his mother carries it over to show to his grandmother, who beams. The serial order of events here is push, go,
show, beam. But the mother actually says "Go show Granny what you're doing, she'll think you push the car so well". Little
problem arises for these adult subjects (or, we presume, for children) in getting the gist of the conversation, but the gist is very
far from explicit identification of the verbs. The problem posed for identification is that the number and order of verb utterance
(go, show, do, think, push) do not line up with the event sequence. Notice, as well, that
certain verbs commonly used by the caregivers are so general (do) or so abstract (think) as to be difficult to relate at all to what's
actually going on (in this latter example, beaming).2
All these complexities bear on an extremely robust finding in the language learning literature: Verbs are very rare in the first
spoken (or comprehended) 50 words of child vocabularies; rather, most items are nouns with a scattering of social items ("bye-
bye") and spatial prepositions (Goldin-Meadow et al. 1976, Nelson 1974, Bowerman 1976, Dromi 1987). This striking
dominance of nouns (above their type frequency in maternal speech) persists until the third year of life (Gentner 1982).
We will contend here that, owing to the kinds of problems just sketched, verbs must be learned by a procedure that differs from
the early noun-learning procedure that pairs isolated words (or beeps) to their real-world contingencies. According to our
hypothesis, verb learners recruit evidence from the syntactic structure in which new verbs appear, and pair this structural
evidence with the information present in the scene. Thus we postulate a sentence-to-world mapping procedure for verbs rather
than the word-to-world procedure that is satisfactory for explaining first nouns (for earlier statements of this position, see
Landau and Gleitman 1985, Fisher et al. 1991, Gleitman 1990). This would begin to explain the noun-before-verb
developmental findings: It takes time to acquire structural knowledge, and nouns but not verbs can be acquired efficiently in the
absence of such knowledge. Moreover, knowledge of the noun meanings is, as we shall see, a prerequisite to extracting the verb
meanings.
We will present here an experiment that assesses children's use of situational and syntactic evidence for solving the mapping
problem. But before doing so, we want to describe informally the ideas behind the syntax-sensitive learning procedure we have
in mind. Fuller discussion is reserved until the experimental findings have been presented.
2 Of course it is easy to think of nouns that are similarly 'abstract', such as liberty, so relative ease of learning via
extralinguistic observation is not theoretically identifiable with the noun/verb distinction. But it is, as a practical matter:
Abstract verbs are common in usage to children (5 of the most frequent verbs in maternal use to children under two years
refer to mental states and acts, want, like, think, know, and see) but all the most frequent nouns in our corpus refer to
visible object classes or names, e.g., Mommy. The more important point is that subjects cannot reconstruct even the
maternal verbs that refer to observable actions (go, eat, catch, etc.) by watching the scene in the presence of evidence (the
beep) of just when they were uttered.
about language learning, it posits that verb learning occurs in the presence of a priorly learned vocabulary of nominal items
(Lenneberg 1967, Gentner 1978, 1982).
The innovation has to do with the way learners are posited to represent the linguistic input that is to be paired with the
extralinguistic input: as a parse tree within which the novel verb occurs. The structured sentence representation can help in
acquiring the novel verb just because it is revealing of the argument-taking aspects of that verb's interpretation. If phrase-
structural knowledge of the exposure language facilitates verb learning, then the developmental priority of nouns begins to be
understandable; and, so does the explosion of verb vocabulary acquisition simultaneous with the appearance of rudimentary
sentences in speech (Lenneberg 1967).
In the experiment that follows we examine an example of this problem and its proposed solution: There are many meaningfully
distinct paired verbs that occur in virtually all and only the same real-world contexts, for example, give and receive, or chase
and flee, lead and follow. When John gives a ball to Mary, Mary receives the ball from John. Movie directors make an art of
distinguishing such notions visually. They can zoom in on the recipient's grateful mien, the giver out of focus or off the screen
completely. Using the word receive rather than give is a linguistic way of making the same distinction. But only for a listener
who understands their meanings. Without a zoom lens, how is a learner to acquire the distinction in the first place?5
If the learner considers the novel verb use within a syntactic structure, and requires an interpretation that is congruent both with
the scene and the
5 For all these perspective-changing verb pairs, distinguishing environmental conditions are not really nonexistent, but are
very rare. For instance, it is reasonable to say The people fled the city but not so reasonable to say The city chased the
people (example from Pinker, personal communication). So in principle one can flee without being chased. The question
is whether these rare dissociating environments play a role in the child's differentiation of these paired verbs. We know
that close to a third of verb uses to young children are in the absence of their referents, not about the here-and-now
(Beckwith et al. 1989), as in "Granny is coming to visit next week", which occurs in the absence of visible coming (or
visible Granny). This means that the learning device must be quite tolerant in evaluating scene-to-world conjectures.
/Come/ must be mapped onto 'come' though it is often said when nothing is coming, and often not said when something is
coming. No learning procedure willing to discount the large percentage of scene/usage mismatches for come could treat
the vanishingly rare mismatches for give/receive or chase/flee as anything but noise.
structure, there is a solution to the mapping problem for these verbs. Consider a listener hearing one of these sentences:
(1) Look, biffing!
(2) The rabbit is biffing the ball to the elephant.
(3) The elephant is biffing the ball from the rabbit.
while watching a rabbit give and an elephant receive a ball. As we will show, if the listener has no access to the syntactic
framework, as in (1), she will probably interpret /biff/ as related in meaning to English give. Hearing sentence (2) bolsters this
choice. But a learner who inspects sentence (3) favors receive.
There are two clues to this choice in sentences (2) and (3). First is the to/from distinction, which indicates which entity is source
and which is goal of the moving ball. Second is the placement of rabbit and elephant within the structure, for whatever entity
showed up as the subject of the sentence has been selected, in the utterance, as the one that the sentence is 'about', the entity of
whom the act is predicated. The notional interpretation of /biff/ must be one that still fits the scene observed but casts it in a
different light: If the subject of the predication was "rabbit", then the act was giving; if it was "elephant", then the act was
receiving. In essence, for a listener sensitive to the full sentence, the interpretation of the observed scene will have been affected
by the linguistic observation that accompanies it.6
The difficulty of the mapping problem is not restricted to the perspective-changing verbs that we have just discussed. Consider a
learner observing a scene in which a rabbit pushes a duck, who falls; and hearing one of these three sentences:
(4) Look, biffing!
(5) The rabbit is biffing the duck.
(6) The duck is biffing.
6 Presaging later discussion, note that a discovery procedure that implicates semantic deductions from surface structure
must confront the fact that the relation between surface syntax and argument structure, even within a single language, is
complex at best and can be misleading in some cases. Consider the case of get, a near relative of the two verbs (give and
receive) just discussed. Get is subject to two interpretations. When we say "Emmanuel got a book from the library", the
subject (Emmanuel) is certainly the causal agent in the book's moving out of the library. But when we say "Emmanuel
got the flu from Aaron", Aaron was the intended causal agent, assuming that Emmanuel wanted no part of the flu. Thus
surface position of the nominals does not uniformly reflect distinctions in their thematic roles. Moreover, get can appear in
two-argument sentences such as "Emmanuel got the flu" in which its transfer-of-possession sense is masked (if intended)
and may not be intended in the first place. We will return to these issues.
Hearing (4) should not support the selection of 'push'-like vs. 'fall'-like interpretations. But (5) must be 'push' and (6) must be
'fall'. This time it is the number of noun phrases (in separate argument positions) in the sentences that bears on the interpretation
of the verb item; the intransitive sentence (6) simply will not support the causal property of push.
For this latter pair (as opposed to give/receive), cross-situational observation is available as an alternate route for acquiring the
distinction between them, requiring no attention to syntax: Eventually there will be falling scenes without a pusher, allowing
their disentanglement (Pinker 1984). All the same, in the real case learners may recruit syntactic cues to facilitate the choice.
(3) the elephant is agent of receiving. But there may be a preference between them all the same. In most give/receive scenarios,
the giver seems more volitional and thus is the plausible candidate for the agent role: the cause, first-mover, or instigator (Dowty
1991).
Our experiment will examine the joint effects of preferences in event representation (the agency bias) and syntactic deduction.
Sometimes these factors work together to reveal the verb's meaning. For example, causal agent is linked to subject position in
the sentence in all known languages (Clark and Begun 1971, Grimshaw 1981, Bates and MacWhinney 1982, Pinker 1984, Givón
1986, Schlesinger 1988, Dowty 1991). Thus in sentence (5) the listener's event bias will mesh with syntactic deduction. In
contrast, sentence (6) pits the two evidentiary sources against each other, for the preferred causal interpretation ('push') is in this
case incompatible with the intransitive syntax.7
4. Experiment
We have proposed that perspective-changing verb pairs like give/receive and chase/flee (which are legion in the verb lexicon)
pose a special problem for an observational word-mapping scheme and hence offer a useful testing ground for determining
whether learners might be sensitive to other kinds of evidence. Specifically, we asked whether young children show sensitivity
to syntactic structure in disentangling the senses of such pairs of verbs, as well as other pairs which might be learned via cross-
situational observation. To find out, we taught 3- and 4-year-olds novel (nonsense) verbs by using them to describe action
scenes. These scenes depicted single events which could be interpreted in two complementary ways.
The manipulations of interest concerned the linguistic context in which the novel verb was presented. A child whose attention is
directed to a
7 For adults, feed can occur intransitively, e.g., The cattle are feeding. But as predicted, it then is synonymous with eat.
The wary reader will have noticed, as well, that eat often occurs transitively, as in The cattle eat the fodder, but does not
mean that the cattle cause the fodder to eat. In short, eat can drop its object while feed can omit its causative subject.
Then the two verbs share their licensed syntactic environments, as both can be both transitive and intransitive. It is the
positioning of nominals in the structures, as mapped against the scene in view, that reveals the difference in their
argument structures (see Levin and Rappaport, this volume, for a discussion of these verb types). The mapping problem
can no more be solved by attention to syntax alone than by attention to observation alone. It is the joint operation of the
two evidentiary sources that does the work.
relevant scene should be more likely to interpret a novel verb then heard as notionally resembling give if it is presented in
sentence (2) than if it is presented in sentence (3). Symmetrically, taking a novel verb to resemble get or receive should be more
likely upon hearing sentence (3) than upon hearing sentence (2).
To test these predictions, we needed some way of assessing how our subjects interpreted novel verbs. The method used was a
straightforward one: We simply asked the children what they thought the words meant. This method has two main advantages.
The first is its very straightforwardness. When children can provide a paraphrase of a novel word, it is unnecessary to attempt to
infer from some less obviously relevant aspect of their behavior what they consider the novel word to mean. The second
advantage is more central to our question: The essence of the problem we address in this work is that there are pairs of verbs
that will be difficult or impossible to differentiate from observational evidence alone. By our own argument there will be no
event we could show subjects that would isolate giving from receiving or chasing from fleeing. Thus we could not assess what
children learned about the novel verbs by, for example, asking them to choose among pictured events (as in Brown 1957). The
paraphrase method allowed us to examine just those cases that we have argued pose the knottiest problem for verb mapping.
This method has one disadvantage as well, in that two-year-olds learning first verbs cannot be induced to provide glosses or
paraphrases for made-up words, so our youngest subjects are three-year-olds. As we will show, these children are capable of
answering the question "What does biffing mean?" in revealing ways. No doubt can arise as to the relevance of three- and four-
year-olds to the question posed, for the bulk of the basic verb vocabulary is acquired by these age groups.
4.1. Method
Video-taped scenes were shown to preschoolers and to adult controls. Each scene was described by the experimenter with a
sentence that contained a nonsense verb. The subjects' task was to paraphrase the verb.
4.1.1. Subjects
The child subjects were twenty-four 3-year-old children (mean age 3;8, range 3;1-4;0), and thirty 4-year-old children (mean age
4;8, range 4;3-5;0). Nine children (five 3-year-olds and four 4-year-olds) were replaced in the
design for failure to respond (see the Procedure section below). Eighteen adults were included to provide a baseline measure of
competent performance in this task. A third of the subjects in each age group were randomly assigned to each of three
introducing context conditions (see Procedure, below).
4.1.2. Stimuli
Six brief motion scenes with puppet actors were video-taped. The scenes were designed to be naturally describable with two
English verbs that differed in their semantic and syntactic properties. One of the sentence contexts that could accompany each
scene was arbitrarily called the 'X' context, and the other was called the 'Y' context. Descriptions of the scenes and these
sentence contexts are shown in table 1.
Table 1
Scenes/sentence pairs

Scene 1: A rabbit is feeding an elephant with a spoon.
    X: The elephant is ---ing. (eat)
    Y: The bunny is ---ing the elephant. (feed)
Scene 2: A rabbit comes up and pushes a monkey off a box.
    X: The bunny is ---ing the monkey. (push)
    Y: The monkey is ---ing. (fall)
Scene 3: A rabbit runs across the screen, followed by a skunk.
    X: The bunny is ---ing the skunk. (flee)
    Y: The skunk is ---ing the bunny. (chase)
Scene 4: A monkey is riding piggy-back on a rabbit.
    X: The monkey is ---ing the bunny. (ride)
    Y: The bunny is ---ing the monkey. (carry)
Scene 5: An elephant hands a ball to a rabbit.
    X: The elephant is ---ing the ball to the bunny. (give)
    Y: The bunny is ---ing the ball from the elephant. (take)
Scene 6: A rabbit puts a blanket over a monkey.
    X: The bunny is ---ing the blanket onto the monkey. (put)
    Y: The bunny is ---ing the monkey with the blanket. (cover)
For the first two scenes in table 1 (feed/eat and push/fall), the syntax of the two sentences differs in the number of noun phrases,
i.e., transitive feed expresses the causal relationship while intransitive eat does not. For the next two scenes (chase/flee and
carry/ride), the number of noun phrases is equal but the order of the nouns encodes two perspectives on the event and,
consequently, who is the agent.8
8 Notice that ride/carry and chase/flee don't differ syntactically, i.e., in their subcategorization frames; both appear in
simple transitive sentences. Moreover, both members of these pairs have the same thematic-role assignment to syntactic
position: the subject is agent, the object is theme (or patient). What we mean by 'attention to the syntax' in solving the
mapping problem for the members of these pairs is that the observer's conjecture about whether the verb means 'chase' or
'flee' is consequent on noticing which observed entity receives the subject-agent slot. As we have emphasized earlier, this
is one reason why a priorly acquired nominal vocabulary is prerequisite to acquiring verb meanings.
In the last two scenes (give/receive and put/cover), the sentences for the two standard choices differ in the order of NP
arguments in the sentence as well as in the preposition used to mark the indirect object (to vs. from and onto vs. with). These
last two cases can be subdivided into one pair relevant to the choice of agent (give/receive), thus subject (elephant vs. rabbit),
and one pair relevant to the choice of goal and located object (put/cover), thus direct object (blanket vs. monkey). Thus the
stimuli overall can provide some indication of the kinds of syntactic-semantic linkages that young children can recruit for verb
mapping.
Each sentence structure was randomly paired with one of six nonsense syllables for each subject (zike, blick, pilk, dack, moke,
nade). All sentences were presented with the verb in the progressive form (blicking) to maximize intelligibility and pragmatic
felicity as descriptions of ongoing actions.
4.1.3. Procedure
Two experimenters tested each child individually; one showed the videotapes and uttered the stimulus sentences, while the other
recorded the subjects' responses. The sessions were also audio-taped, to allow later checking of the accuracy of the recording
experimenter.
A puppet (Mac) was introduced to the child who was then told "Mac doesn't speak English very well, so sometimes he uses
puppet words. Can you help us figure out what the puppet words mean?". Assent received, the child was then given a practice
trial in which Mac said "Look! The elephant is zorping!" as an experimenter made a hand-held elephant puppet laugh. The child
was then asked, "What does zorping mean?" and prompted by asking "What is the elephant doing?" This latter prompt was used
only in the practice trial, to help the child understand the task. For the adult subjects, Mac the puppet was omitted for obvious
reasons. They were simply informed that their task would be to guess the meanings of nonsense words.
On each test trial, the subject first heard the stimulus sentence and then saw the video-taped scene. The scene was repeated for
up to one minute as the subject watched. The experimenter repeated the stimulus sentence at least
once while the videotape played. The subject was asked "What does gorping mean?". Additional prompts were "What's
happening?" or "What's going on?". Subjects were encouraged to guess. Subjects who said nothing during the practice trial and
during the first two experimental trials were dropped from the study. Scenes were presented in two orders, chosen to allow all
subjects to begin with less difficult items (as revealed during pilot testing). The first order was feed/eat, give/receive, chase/flee,
cover/put, carry/ride, push/fall. The second order was its reverse.
The nonsense verbs were presented to the subjects in one of three linguistic contexts:
Neutral syntax ('No sentence') trials. A third of the subjects in each age group heard the nonsense words in the syntactically
uninformative context "Look! Ziking!". These subjects were cued (by the -ing suffix, Brown 1957) that the novel word was a
verb but received no information about its specific syntactic behavior. Even if young children use syntactic evidence in real-life
verb learning, this experimental condition withheld such evidence, allowing us to assess any biases subjects might show in
interpreting the scenes and providing a baseline against which to compare their performance when given syntactic information.
Sentence trials. The remaining two-thirds of the subjects saw the scenes accompanied by one of two introducing sentential
contexts (X or Y in table 1, for each scene). Each subject heard only one of the two sentences for a single scene. One group of
subjects heard the sentences designated 'X' in table 1, and the other group heard those designated 'Y'. We reiterate that the
assignment of sentences to the categories X and Y was entirely arbitrary. Thus, e.g., transitive feed was assigned to the Y
category while transitive push was assigned to the X category. This arbitrary assignment assured that each subject group heard
some sentences of both kinds.
4.1.4. Coding and scoring
Subjects' paraphrases were sorted into three categories:

(1) Response X: Responses that fit sentence X (shown in table 1), both syntactically and in describing the scene, were coded as X
responses. For example, for scene 1, eat, or a phrasal equivalent ("He's drinking soup"), is congruent with the scene and with the
construal implied by the structure of sentence X. Notice that a subject might be exposed to a Y sentence (The bunny is gorping
the elephant) but give the X response all the same. If so, the syntax of the input sentence has failed to influence that response.

(2) Response Y: Responses that fit sentence Y (table 1), both syntactically and in describing the scene, were coded as Y
responses. For example, for scene 1, feed, or a phrasal equivalent ("He's giving him medicine") would be congruent both with
the scene and with the (causative) construal implied by sentence Y of this pair.

(3) Other (O): Failures to respond and responses that fit neither sentence X nor Y were coded as Other. These were almost
always relevant to the scene in some way (e.g., the response "They're playing" for the give/receive scene), but incongruent with
the ditransitive syntax. Also included in this category were responses that fit both X and Y contexts, and therefore could not
demonstrate sensitivity to the sentence structure.
4.2. Results
To assess the reliability of the coding, all of the responses from the 3-year-old subjects were recoded by an independent coder
who was blind to introducing context. The two coders agreed with each other in 94% of coding decisions. Residual
disagreements were resolved through discussion. The subjects' responses are summarized in table 2. This table shows the
proportion of subjects in each age group who produced each type of response for each scene. We now discuss these findings
under several rubrics:
4.2.1. Breadth of the hypothesis space
The scenarios were designed to be quite simple, containing few distracting properties. Even so, as table 2 shows, child subjects
gave a response that had to be relegated to the 'Other' category in a substantial proportion of the trials (29% across all conditions
for 3-year-olds, 23% for 4-year-olds). This finding constitutes yet one more demonstration of the many-many relations between
scene observation and verb interpretation. In several cases coded 'Other', the subjects explicitly mentioned both of the standard
choices (e.g., giving and getting). These responses were quite common in the 'No sentence' condition (36% of adults' and 20% of
children's total responses) but extremely rare in the sentence contexts (3% of adults' and 5% of children's responses). When a
sentence context was provided, both children (t(52) = 3.64, p < 0.001) and adults (t(16) = 5.89, p < 0.001) were much less likely to
propose both readings for the novel verb. This is a first indication that the syntactic contexts rein in the many interpretations
made available by scene inspection, focusing subjects' attention on specific aspects of our scenarios.
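The comparisons just reported (and those described in footnote 9) rest on t-tests computed over arcsine-transformed response
proportions. As a rough illustration only (this is not the authors' analysis code; the function names and the sample proportions
are invented), the transform and a pooled-variance two-sample t statistic can be sketched in Python:

```python
import math
from statistics import mean, stdev

def arcsine_transform(p):
    """Variance-stabilizing transform for a proportion p in [0, 1]."""
    return 2 * math.asin(math.sqrt(p))

def two_sample_t(xs, ys):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(xs), len(ys)
    pooled_var = ((nx - 1) * stdev(xs) ** 2 + (ny - 1) * stdev(ys) ** 2) / (nx + ny - 2)
    t = (mean(xs) - mean(ys)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Invented example: per-group proportions of X responses under two
# introducing contexts, transformed before testing.
context_x = [arcsine_transform(p) for p in (0.50, 0.88, 0.38, 0.63)]
context_y = [arcsine_transform(p) for p in (0.25, 0.25, 0.00, 0.25)]
t, df = two_sample_t(context_x, context_y)
```

A one-tailed p-value would then be read from the t distribution with df degrees of freedom (e.g., via scipy.stats.t.sf).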
Table 2 Proportion of each response for each scene, by age and introducing context
Scene 'No sentence' Sentence X Sentence Y
X Y Other X Y Other X Y Other
Three-year-olds (n = 24):
eat/feed 0.25 0.38 0.38 0.50 0.13 0.38 0.25 0.75 0.00
push/fall 0.38 0.13 0.50 0.88 0.13 0.00 0.25 0.75 0.00
flee/chase 0.00 0.63 0.38 0.38 0.13 0.50 0.00 0.88 0.13
ride/carry 0.25 0.75 0.00 0.63 0.25 0.13 0.25 0.50 0.25
give/take 0.50 0.25 0.25 0.50 0.00 0.50 0.25 0.25 0.50
put/cover 0.25 0.13 0.63 0.38 0.25 0.38 0.38 0.25 0.38
Mean 0.27 0.38 0.35 0.54 0.15 0.31 0.23 0.56 0.21
Four-year-olds (n = 30):
eat/feed 0.10 0.70 0.20 0.60 0.20 0.20 0.10 0.80 0.10
push/fall 0.70 0.00 0.30 1.00 0.00 0.00 0.50 0.40 0.10
flee/chase 0.00 0.70 0.30 0.40 0.10 0.50 0.00 0.90 0.10
ride/carry 0.00 0.80 0.20 0.40 0.20 0.40 0.20 0.70 0.10
give/take 0.40 0.10 0.50 0.50 0.00 0.50 0.30 0.60 0.10
put/cover 0.20 0.20 0.60 0.40 0.20 0.40 0.20 0.70 0.10
Mean 0.23 0.42 0.35 0.55 0.12 0.33 0.22 0.68 0.10
Adults (n = 18):
eat/feed 0.00 0.17 0.83 1.00 0.00 0.00 0.00 1.00 0.00
push/fall 0.50 0.00 0.50 1.00 0.00 0.00 0.00 1.00 0.00
flee/chase 0.00 0.83 0.17 1.00 0.00 0.00 0.00 1.00 0.00
ride/carry 0.17 0.33 0.50 0.83 0.17 0.00 0.00 1.00 0.00
give/take 0.50 0.00 0.50 1.00 0.00 0.00 0.00 1.00 0.00
put/cover 0.00 0.83 0.17 0.50 0.17 0.33 0.00 1.00 0.00
Mean 0.19 0.36 0.44 0.89 0.06 0.06 0.00 1.00 0.00
0.001).9 The effect of syntax on the interpretation of these nonsense words emerges in the same way for each scene with one
exception (the 3-year-old subjects paraphrased a nonsense verb describing put/cover consistent with put syntax more often than
cover syntax in all three introducing contexts). Overall, the result is that syntactic context had a powerful effect on subjects'
construal of the nonsense verb.
The children's behavior is epitomized most poignantly in several contrasting responses to the scenes in the presence of syntactic
context. Consider the carry/ride scene. Hearing the sentence that treats the carrier (rabbit) as subject, a child responds "He's
holding him on his back". But hearing the sentence that treats the rider (monkey) as subject, another child responds "He's sitting
on him's [sic] back". Clearly these children watch the scene and interpret what they see to derive the verb meaning: They induce
the meaning from inspection of its real-world accompaniments. Perhaps unlike the adults, they do not seem to be aware of a
linguistic puzzle. All the same, the sentence heard exerts a strong albeit implicit influence on just what they think the scene
depicts: whether it is 'about' the one who holds/carries or the one who sits/rides. This is the sense in which sentence-to-world
pairing can sharply limit the search-space for verb identification.
4.2.3. Semantic biases in the interpretation of verbs
In addition to constituting a baseline for comparison to the sentence trials in our task, the 'No sentence' condition provides a
chance to look for semantic biases in verb interpretation. After all, in observing the chase/flee scene while hearing "Look!
Ziking!", subjects are not really warranted in preferring one of these interpretations over the other. The scene fits them both. Our
initial hypothesis was that subjects would therefore choose chase or flee ("run away") more or less at random in this condition,
hence our original arbitrary division of the sentences (and their responses) into the 'X' and 'Y' categories. But our subjects were
anything but open-minded in their guesses, as table 2 shows. All age groups had a preferred response in the 'No sentence'
condition for 5 of the 6 scenes. They tended to
9 Preliminary analyses indicated that there was no effect of order (which of the stimuli the subjects saw first) on the
probability of sentence-congruent responses, so the two order groups were combined in this and all further analyses. The
effect of syntax on verb paraphrases was shown in a series of planned 1-tailed t-tests. In each case the dependent variable
was the proportion (arcsine transformed) of each response (X or Y), examined across sentence contexts. Separate analyses
for the 3- and 4-year-olds yielded the same results in each case; to simplify presentation of the results, the two groups of
children are pooled in the analyses presented here.
describe scene 1 as one of feeding rather than eating, scene 2 as pushing rather than falling, scene 3 as chasing, not fleeing or
running away, scene 4 as carrying rather than riding, and scene 5 as giving rather than taking or receiving. The adults showed
these biases even more regularly than the children.
In each of these cases, subjects evidently selected a 'more causal' or agentive participant in the scene, and took the verb to code
the actions of that participant, who thus became sentential subject. Scene 6 (put/cover) leaves the agent choice unaffected and,
interestingly enough, it is in this case only that children show no preference of choice in the 'No sentence' condition. Clearly,
this unanticipated agency factor was strongly affecting our subjects' responses.
To study this semantic bias, we recoded the X and Y sentences according to whether they matched or mismatched this
agency bias. For feed/eat and push/fall, no difficulty in doing so arises: Only the feed and push sentences mention the causal
agent, so these interpretations should be favored if there is a bias to conceive scenes as an agent acting on a thing affected. For
chase/flee and give/receive, this agent-acts-on-patient interpretation applies to both members of the pairs. For these, we relied on
Dowty's (1991) 'protoagent' classification scheme in which several factors are postulated to lead to the choice of plausible agent:
animacy (which does not distinguish for our stimuli), activity, and instigator of the action. Thus necessarily it is chasing that
precedes and causes fleeing, and giving that precedes and causes receiving.
For the case of carry/ride, no such principled distinction in the verb meanings themselves exists for choosing the plausible
agent. However, in our particular depiction of this act, the carrier (the rabbit) was actively running across the screen with an
inert monkey sitting on his back. Apparently this distinction of relative activity vs. passiveness led subjects to choose the rabbit
as instigator, thus plausible agent. (Had our scene instead shown, say, a child riding a mechanical bull, doubtless this choice
would have been reversed for this verb pair.)
Table 3 reorganizes the findings according to this distinction between Agentive (A) versus Non-Agentive and 'less plausible
agent' (NA) responses.10 The table shows the proportion of subjects in each age group and introducing
10 Because the agency effect was unanticipated, the experiment was not balanced such that each subject would hear an
equal number of A and NA sentences. So that this difference will not contaminate the statistical assessments here, all
comparisons between the A and NA contexts are within-subjects. All other comparisons are between subjects, as before.
context who produced A (e.g., feed, carry) and NA (e.g., eat, ride) responses, as defined above. These values are shown only for
the five relevant scenarios (that is, excluding the put/cover scene which cast the same participant as subject in both context
sentences). Across all presentation conditions and all age groups, A responses (187) outnumber NA responses (91) two to one.
There is an agency bias.
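The recoding just described can be sketched programmatically. This is our own illustration, not the authors' procedure; the response list below is hypothetical, and only the A/NA verb classification follows the scheme given in the text.

```python
# Recode X/Y responses into Agentive (A) vs Non-Agentive (NA),
# following the agency-bias classification described in the text.
AGENTIVE = {"feed", "push", "chase", "carry", "give"}
NON_AGENTIVE = {"eat", "fall", "flee", "ride", "take"}

def recode(response):
    """Map a paraphrase verb to its agency category."""
    if response in AGENTIVE:
        return "A"
    if response in NON_AGENTIVE:
        return "NA"
    return "Other"

# Illustrative response list (hypothetical, not the experimental data)
responses = ["feed", "feed", "eat", "push", "chase", "carry", "ride", "give"]
counts = {"A": 0, "NA": 0, "Other": 0}
for r in responses:
    counts[recode(r)] += 1
```

Tallying A against NA counts in this way, across scenes and contexts, yields the proportions reported in table 3.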
Table 3. Proportion(a) of agentive (A) and non-agentive (NA) responses for each biased scene, by age and introducing context

Scene        No sentence    Agentive      Non-agentive
             A      NA      A      NA     A      NA
Three-year-olds (n = 24):
feed/eat     0.38   0.25    0.75   0.25   0.13   0.50
push/fall    0.38   0.13    0.88   0.13   0.25   0.75
chase/flee   0.63   0.00    0.88   0.00   0.13   0.38
carry/ride   0.75   0.25    0.50   0.25   0.25   0.63
give/take    0.50   0.25    0.50   0.00   0.25   0.25
Means:       0.53   0.18    0.70   0.13   0.20   0.50
Four-year-olds (n = 30):
feed/eat     0.70   0.10    0.80   0.10   0.20   0.60
push/fall    0.70   0.00    1.00   0.00   0.50   0.40
chase/flee   0.70   0.00    0.90   0.00   0.10   0.40
carry/ride   0.80   0.00    0.70   0.20   0.20   0.40
give/take    0.40   0.10    0.50   0.00   0.30   0.60
Means:       0.66   0.04    0.78   0.06   0.26   0.48
Adults (n = 18):
feed/eat     0.17   0.00    1.00   0.00   0.00   1.00
push/fall    0.50   0.00    1.00   0.00   0.00   1.00
chase/flee   0.83   0.00    1.00   0.00   0.00   1.00
carry/ride   0.33   0.17    1.00   0.00   0.17   0.83
give/take    0.50   0.00    1.00   0.00   0.00   1.00
Means:       0.47   0.03    1.00   0.00   0.03   0.97

(a) Since 'Other' responses are left out of this table, proportions within each scene/context cell may not sum to 1.
Not surprisingly, this effect of the natural interpretation of the scenes is most powerful where it is not contaminated by
mismatching syntactic information.
Thus the preference for agentive responses appears most strongly in the 'No sentence' condition in table 3. This effect was
assessed in post-hoc (Bonferroni adjusted) matched-pairs t-tests, which revealed that A choices were more frequent than NA
ones in the 'No sentence' context for both children (t(17) = 6.18, p < 0.001) and adults (t(5) = 5.93, p < 0.01).
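The transform-and-test pattern used throughout these analyses (arcsine-transformed proportions compared by t-test; see footnote 9) can be sketched as follows. This is a minimal illustration of the general approach, not the authors' analysis code, and the per-subject proportions below are hypothetical.

```python
import math

def arcsine_transform(p):
    """Angular transform 2*asin(sqrt(p)), commonly used to stabilize
    the variance of proportions before running t-tests."""
    return 2.0 * math.asin(math.sqrt(p))

def paired_t(xs, ys):
    """Matched-pairs t statistic computed on per-subject differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical per-subject proportions of A vs NA responses
# in the 'No sentence' condition (illustrative, not the real data)
a_props = [0.6, 0.8, 0.4, 0.6, 1.0, 0.8]
na_props = [0.2, 0.0, 0.2, 0.2, 0.0, 0.2]
t_stat = paired_t([arcsine_transform(p) for p in a_props],
                  [arcsine_transform(p) for p in na_props])
```

A positive t on the transformed differences corresponds to the reported preference for A over NA responses.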
4.2.4. The interaction of syntax and semantics
So far we have shown (table 2) that structure affects Ss' guesses when it is made available, in the X and Y (sentential)
conditions of presentation; and that the agency bias (for which responses were recoded as A and NA) affects Ss' guesses heavily
in the absence of syntactic information (the 'No sentence' condition of table 3). The question remains how these factors interact
when (as in the real world of maternal speech) the new word is presented in a full sentence context.
To find out, we now look more closely at the finding (table 2) that, while the effects of syntactic context were very strong and
reliable, they were nowhere near categorical for the child subjects (though they were nearly so for the adults). The reason is that
while the syntactic introducing circumstances supported the agency bias in some cases (as when the subject heard "The rabbit is
biffing the elephant" while watching a feed/eat scene), the syntactic and semantic cues pulled in opposing directions in other
cases (as when the subject heard "The elephant is biffing" while watching this same scene). To repeat, subjects tend to view this
scene as a feeding scene (the semantic-interpretive influence), but to encode the intransitive sentence structure as favoring eat
(the syntactic influence).
The joint action of these two variables can be seen in table 4, which repeats the data of table 3, but summarizing across the five
relevant scenarios. Thus table 4 shows the proportion of subjects in each group and introducing context who produced A and
NA responses. Column 1 of this table shows the bias toward agentive responses which we have just documented. Column 2
shows a slight enhancement of this preference (slight, because the preference in this direction was already strong) when it is
supported by the syntactic evidence; that is, in agentive syntactic contexts. Column 3 shows that the agency bias is heavily
mitigated by mismatching syntactic evidence: When the syntax demands the NA interpretation, the A response is actually
dispreferred, no longer the modal response for any age group. That is, we see the same effect of sentence context on verb
interpretation when the normally disfavored non-agentive responses are examined separately: These responses are more frequent
in the non-agentive than the agentive sentence context (children:
t(35) = 7.13, adults: t(11) = 29.63, p < 0.001). Both children and adults made use of syntactic evidence to override what seems to
be a strong semantic or observational preference in verb interpretation.
Table 4. Proportion(a) of agentive (A) and non-agentive (NA) responses, by age and introducing context

             Introducing context
Age group    No sentence    Agentive      Non-agentive
             A      NA      A      NA     A      NA
Threes       0.53   0.18    0.70   0.13   0.20   0.50
Fours        0.66   0.04    0.78   0.06   0.26   0.48
Adults       0.47   0.03    1.00   0.00   0.03   0.97
Overall      0.57   0.08    0.81   0.07   0.18   0.61

(a) Since 'Other' responses are left out of this table, proportions within each scene/context cell may not sum to 1.
4.2.5. The effect of age
Inspection of table 4 also shows that there is an age effect on these response patterns, with adult responses almost categorical
and child responses probabilistic with respect to the variables under investigation. The effect of age on the likelihood of
producing frame-congruent responses in the two sentence contexts was shown in an ANOVA with age group as a between-
subjects effect, and sentence context (A versus NA) as a within-subjects factor. The main effect of age was significant (F(2,45)
= 17.36, p < 0.001): adults were more likely than children to produce a response that fit the frame, but the two groups of children
did not differ significantly from each other. There was also a significant main effect of sentence context (F(1,45) = 5.33, p < 0.05):
Frame-congruent responses were more likely in the A than the NA context for all age groups, the interaction of situation and
syntax that we discussed earlier. However, the effects of age and sentence context did not interact (F(2,45) < 1). Thus the effect of
age is a simple one: adult performance was more stable with regard to both the syntactic and salience variables, but all groups
took syntactic information into account when it conflicted with the agency bias as well as when it did not.
meanings to new words. They do not expect there to be exact synonyms. (For this effect with nouns, see e.g., Clark 1987, and
Markman, this volume; and for the analogous effects with verbs, Golinkoff et al., in prep.; Kako, in prep.)
We now ask whether the same pattern of syntactic effects on verb interpretation occurs for the phrasal responses taken alone. If
so, it is unlikely that the children were merely 'filling in the blank' in the stimulus sentences with a known verb that fit, but were
engaging in something more like the syntax-guided mapping procedure proposed here.
Table 5 shows the obtained response patterns for the phrasal responses, omitting all single-verb glosses. Three 3-year-olds and
one 4-year-old contributed no data to this analysis. Given the reduction in the data entailed by recoding the responses in this
way, the 3- and 4-year-olds' data were combined. Inspection of table 5 shows that the same pattern of structural effects on verb
interpretation remains. This pattern was reliable in independent t-tests on the (arcsine transformed) proportion of X and Y
responses. Response X was more likely in the context of sentence X than sentence Y (t(30) = 2.94, p < 0.01) and the 'No sentence'
context (t(32) = 4.11, p < 0.001). Similarly, response Y was more frequent given sentence Y than either of the other two
introducing contexts (sentence X: t(30) = 6.59, p < 0.001; no sentence: t(32) = 3.61, p < 0.001). In sum, even when the children
evidently did not interpret the nonsense words as puppet synonyms for verbs in their
known lexicon, there are strong and reliable effects of sentence structure on construal.
5. Discussion
We now outline a verb mapping procedure that comports with the findings just described, and with related effects in the
language acquisition literature.
5.1. Extraction of linguistic formatives
A vexed problem in understanding language learning concerns how the child finds such units as word and phrase in the
continuously varying sound stream: how Henny-Penny's listeners interpreted her as saying "The sky is falling" rather than "This
guy is falling". A vast literature now supports the
view that these segmentation decisions are made by infants based on prosodic and distributional cues in caretaker speech (for
recent evidence and discussion, see Gleitman et al. 1988, Gerken et al. 1993, Fisher and Tokura, in press; Kelly, this volume;
Cutler, this volume; Brent, this volume).11 In what follows, we presuppose these approaches to solving the segmentation
problem, concentrating attention on acquisition of the phrase structure (which requires labeling as well as segmentation of
phrases) and the word meanings.
5.2. Word-to-world pairing and the acquisition of first nouns
We take as given that the human learner expects sentences to convey predicate/argument structure as organized by a phrase-
structure grammar that conforms to X-bar principles. However, universal grammar leaves open some parameters of the phrase
structure of the exposure language; these must be set by experience. Before this learning occurs, the novice can recruit only the
exigencies of word use (the pairing of words to their extralinguistic contexts) to solve the mapping problem.
11 For evidence on infant attention to clause bounding cues see Hirsh-Pasek et al. 1987; for phrase-bounding cues, see
Jusczyk et al. 1992; and for word-bounding cues see Grosjean and Gee 1987, Kelly, this volume. There is also a literature
on adult speech production and perception investigating the physical bases of these prosody-syntax mappings, too vast for
us to cite here (but for seminal articles, see Klatt 1975, Cooper and Paccia-Cooper 1980, Lehiste et al. 1976, Cutler, this
volume) and the availability of such cues in infant-directed speech (Fernald and Simon 1984, Fisher and Tokura, in press;
Lederer and Kelly 1991).
By default, the effect should be that the youngest children can acquire only object terms (nouns, in the adult language). This is
because, as we described in introductory remarks, only the nouns occur in maternal speech in a tight time-lock with the
situational contexts, and in ostensive contexts. And indeed one of the most striking findings in the language-learning literature is
that first words are nouns despite the fact that from the beginning the learner is exposed to words from every lexical category.
We assume that these first words are assigned to the formal category noun on a semantic basis, as conjectured by Grimshaw
(1981) and Pinker (1984).
5.3. Setting the phrase structure, and first verbs
Our findings suggest that verb learning implicates a sentence-to-world pairing procedure and cannot in general be accomplished,
as Pinker (1984) and others have advocated (see footnote 3), by pairing the isolated verb to its observational contingencies. But
then how much grammatical knowledge is required as input to verb learning? And how might children acquire this?
We propose that the meanings of the first verbs, as well as relevant components of the phrase structure, are acquired by
bootstrapping from a partial sentential representation (henceforth, PSR) that becomes available once some nouns have been
learned: This consists of the known nouns and the unknown verb, as sequenced in the input sentence, e.g.
By hypothesis, it is this richer-than-the-word, poorer-than-the-phrase-structure representation that learners past the one-word
(noun) stage first attempt to pair with the scene in view.
There is some evidence that the PSR can aid in verb identification in two ways. First, the identity of the nouns can provide
information about the selectional properties of the verb. Lederer et al. (1991) showed that adults can identify about 28% of the
verbs that mothers are uttering if, in addition to the scene information, they are also told which nouns occurred with the verb in
the maternal utterances. This level of performance is not great, but is a significant improvement over the 7% success rate that
subjects achieve if shown only the video-taped scene. It is easy to see why having the nouns is so helpful: If you are told that
baby and cookie
occurred in construction with the mystery verb, eat becomes a plausible conjecture just because verbs that mean 'eat' should
select for edibles.12
The PSR yields a second advantage for verb mapping, provided that the learner also has implicit access to the Projection
Principle (roughly, that every argument position required by the verb will be reflected as a noun phrase in the surface sentence;
Chomsky 1981). So armed, the learner can make a secure conjecture as between an intended unary relation (such as fall) vs. a
binary relation (such as push), simply by counting the number of noun phrases in the sentence.13
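The argument-counting use of the Projection Principle can be illustrated with a toy procedure. This is our own sketch, not the authors' model: it approximates "count the noun phrases" by counting known nouns in a simple sentence, which is adequate only for the short, unembedded input sentences discussed in footnote 13.

```python
# Toy sketch: bound a verb's arity by counting NPs in the partial
# sentential representation (known nouns + unknown verb).
KNOWN_NOUNS = {"rabbit", "elephant", "duck", "monkey"}

def argument_count(sentence):
    """Count known nouns in a simple sentence as a proxy for NP positions."""
    words = sentence.lower().replace(".", "").replace("!", "").split()
    return sum(1 for w in words if w in KNOWN_NOUNS)

def conjecture(sentence):
    """One NP suggests a unary relation (e.g. fall); two suggest a
    binary relation (e.g. push)."""
    return "unary" if argument_count(sentence) == 1 else "binary"
```

So "The elephant is biffing" supports a fall-type construal, while "The rabbit is biffing the elephant" supports a push-type one.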
Early use of this machinery is suggested by findings from Naigles (1990), with babies 23-25 months of age. They were shown a
video-taped scene in which (a) a duck, by pushing on a rabbit's head, forces the latter into a squatting position whilst (b)
both the duck and the rabbit wheel their free arms in a circle. Half the subjects were introduced to the scene with the sentence
"The duck is gorping the rabbit" and the other half heard "The duck and the rabbit are gorping". Thereafter, two new videos
were shown, one to the child's left and one to her right, along with the prompt "Find gorping now!". One of the new videos
showed the duck forcing the rabbit to squat (but no arm-wheeling) and the other showed the two side by side wheeling their
arms (but no forcing-to-squat). The children who had been introduced to gorping within the transitive sentence now gazed
longest at the causal scene while those who had heard the intransitive sentence looked longer at the noncausal scene. Here, as in
the push/fall and feed/eat scenes investigated in the present experiment, the one-argument structure did not sustain a causal
interpretation despite the agency bias in event representation.
A related point is made by Fisher (1993), who showed unfamiliar agent-patient events to children aged three and five years but
with the entities named only by pronouns. For instance, they saw one person causing another to rotate on a swiveling stool by
alternately pulling on the ends of a scarf around the victim's waist. Half the subjects heard "She's blicking her around" and the
other half heard "She's blicking around". The child's task was to point out, in a still photograph of the event, the one whose
action was
12 We must acknowledge, however, that the Gleitman-Landau archive of maternal speech includes many examples like
"Don't eat that paper!", "We don't eat the book, Bonnie".
13 Of course this can only work for simple sentences for, e.g., "The rabbit in the grass hopped away" contains two NPs
within an argument position. But sentences to novices are characteristically short (approximately 5 words long, on
average; Newport 1977) and so rarely embody this problem.
labelled by the novel verb ("Point to the one who's blicking the other one around" or "Point to the one who's blicking around").
The intent here was to put the children into the position of much younger learners who have access only to the PSR: They knew
how many arguments were supplied to the new verb, but not which was which. Those who had heard the transitive frame
confidently chose the causal agent as the blicker, while those shown the intransitive frame were willing to select the patient as
blicker. Thus without being told which event participant has been cast as sentence subject, preschoolers interpreted a one-
argument structure as incompatible with a causal interpretation.
The PSR taken together with the scene observed will allow learners to acquire a crucial aspect of the phrase structure itself.
Suppose a child hears "kick" for the first time in the frame "The bunny kicked the monkey". The agency bias, as constrained by
the minimal structure given by the PSR (namely, a 2-argument structure), will lead the learner to seek an agent-patient
interpretation of the scene. Provided that she knows the nouns bunny and monkey, she can annotate the phrase structure as
shown in figure 1, marking bunny, the first noun in the structure, as the agent (Joshi and Rambow, in prep.). This representation
now matches two of the quasi-universal properties of the category 'subject of transitive sentence': there is one noun which is both
the agent and the leftmost noun in a transitive structure.14 There is a strong tendency for languages to place subjects before
objects (Keenan 1976, Kayne 1992). Further phrase-structure options can then be set based on this initial assignment.
Fig. 1.
14 Notice that this claim differs from the subject-agent link which has been invoked in the literature to support an initially
asyntactic verb-learning procedure (e.g., Grimshaw 1981, Braine and Hardy 1982, Pinker 1984, and others). In fact,
subjects are often patients or experiencers. It is the transitive sentence whose agent just about universally surfaces as
subject. To recognize this distinction requires, at minimum, PSR knowledge that will reveal the number of argument
positions.
Notice finally that if there are languages in which objects precede subjects, the assignment of subject of the sentence (the NP
immediately dominated by S) to the serially second NP is still possible, based on the PSR. In that case, the learner would have
heard "Kicked the monkey the bunny" in the presence of a bunny-kicking-monkey scene. The observed scene identifies bunny
as agent (hence subject of the transitive verb), consistent only with template (b) of figure 2.
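The order-independent subject assignment described here can be sketched as follows. This is our own illustration under simplifying assumptions: the PSR supplies the known nouns in the order heard, the observed scene supplies the agent, and the agent of a transitive verb is taken to surface as subject whichever serial position it occupies.

```python
# Sketch: the scene identifies the agent; the agent of a transitive
# sentence is assumed to be its subject, so the agent's serial position
# among the PSR's nouns reveals the word-order template.
def assign_subject(psr_nouns, scene_agent):
    """psr_nouns: known nouns in the order heard.
    scene_agent: the participant the scene identifies as agent.
    Returns (subject, inferred word-order template)."""
    position = psr_nouns.index(scene_agent)
    template = "subject-first" if position == 0 else "object-first"
    return scene_agent, template
```

Hearing "The bunny kicked the monkey" with a bunny-kicking-monkey scene yields a subject-first template; hearing "Kicked the monkey the bunny" with the same scene yields the object-first template of figure 2, with bunny as subject in both cases.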
language. Once the full phrase structure is acquired, the learner can approach the perspective-changing verbs that we have
studied. Disentanglement of the members of these pairs requires more than counting NP positions (which are the same for both
interpretations) and fitting these to the logic of the observed situation (which suits either choice).
Once the phrase structure has been bootstrapped from PSR, the learner can make this decision by inspecting the geometry of the
tree to determine which noun is sentence subject. If the plausible agent appears as subject with the give/receive scene, then the
situational and syntactic cues converge on give. But if the plausible agent appears in nonsubject position then it is not the agent,
despite appearances. In the present experiment, we showed child responsiveness to these implications of structure for verb
interpretation even in cases where they had to overcome a semantic bias in event interpretation to use it.
5.5. The informativeness of multiple frames
In principle, attention to the licensed range of syntactic environments for a verb can provide converging evidence about its
interpretation, just because these several environments are projections from the range of argument structures associated with that
verb. This feature of syntactic bootstrapping is controversial (as opposed to the 'zoom lens' notion which appears to have gained
wide currency). So after describing the potential usefulness of frame-range information for solving the mapping problem, we
will discuss available experimental evidence in its favor. Specifically, we will discuss the experimental manipulations deemed
critical by some skeptics (particularly, Pinker, this volume) for confirming the hypothesis.
5.5.1. The resolving power of frame ranges for verb mapping
In very many cases, a surface-structure/situation pair is insufficient or even misleading about a verb's interpretation. One such
case is the eat example that we mentioned earlier. The phrase structure is the same when the adult says "Did you eat your
cookie?" as when he says "Do you want the cookie?", and the two verbs are used by caretakers in situations where their
interpretations can easily be mistaken. Subjects always come up with an action term that fits the observed scene and the
structure, and guess eat instead of want. In response to the fact that the next scene observation does not support the eat
conjecture (it may show, say, the mother offering a toy rather than a
cookie to the child), subjects now come up with yet another physical term (e.g., take). They are mulishly resistant to
conjecturing any mental term. While successive observations force them to change their minds about which physical-action term
is the right one, they never seem to get the idea that this bias should be overridden altogether. This effect was shown by Lederer
et al. (1991) with adults, and by Gillette (1992) with children.
Examination of the further syntactic privileges of eat and want can resolve this problem. Eat occurs intransitively and in the
progressive form while want does not. Want also occurs with tenseless sentence complements (Do you want to eat the apple?).
These distinctions are sufficient to disentangle the two verb construals, for only mental activity verbs license these constructions.
(Note that force verbs, which also accept sentence complements, require an additional nominal position, e.g., Make him eat the
apple!, but not *Make eat the apple!). As we will discuss presently, subjects seize upon this disambiguating structural
information to find the right construal.
A second example of residual problems unresolved by single frames, even though these are paired with differing events,
concerns a blind child's learning of the distinction between touch and see. Blind learners receive observational evidence about
the meanings of both terms though it is perforce haptic and not visual. The blind child's first uses of see at age two were in the
sense 'touch', e.g., she commanded "Don't see that!", while pushing her brother away from her record-player. The confusion
arose, doubtless, because every scene in which the blind child can see ('ascertain by perceptual inspection') is a scene in which
she can touch. And both verbs occur most often in maternal speech as simple transitives. Further syntactic experience ("Let's see
if there's cheese in the refrigerator") can account for how the blind child could, as she did, come to distinguish between the two
construals by age three (Landau and Gleitman 1985).
As a more general example of the convergence that frame ranges make available for verb mapping, consider the four verbs give,
explain, go, and think. These verbs are cross-classified both conceptually and syntactically. Give and explain, different as they
are in many regards, both describe the transfer of entities between two parties. Accordingly, they can appear in structures with
three noun-phrase positions:
(7) Ed gave the horse to Sally.
(8) Ed explained the facts to Sally.
In (7), a physical object (the horse) is transferred from Ed's to Sally's hand and in (8) abstract objects (the facts) are transferred
from Ed's to Sally's mind. A noun phrase is required for each of the entities involved: the giver, the receiver, and that which is
transferred between them. It is this similarity in their meanings that accounts for the similarity in the structures that they accept.
Verbs that describe no such transfer are odd in these constructions:
(9) *Philip went the horse to Libby.
(10) * Philip thought the facts to Libby.
But there is another semantic dimension for these four verbs in which the facts line up differently. Explain and think concern
mental events while give and go concern physical events. There is a typical surface reflex of this distinction also, namely,
mental verbs accept sentence complements (express a relation between an actor and a proposition):
(11) Jane thinks/explains that there is a mongoose in the parlor.
(12) *Jane goes/gives that there is a mongoose in the parlor.
The learner who appreciates both of these mapping relations can deduce from the range of syntactic environments that give
expresses physical transfer while explain expresses mental transfer (that is to say, communication; Zwicky 1971, Fisher et al.
1991). Potentially there can be a rapid convergence on the meaning of a verb from examination of the several structures in
which it appears in speech. Though there are hundreds of transfer verbs and scores of cognition-perception verbs, there is a
much smaller number of verbs whose meanings are compatible with both these structures, and which therefore can express
communication (e.g., tell, shout, whisper). Thus across uses, the syntax can significantly narrow the hypothesis space for the
verb meaning.
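The cross-classification in (7)-(12) amounts to a small mapping from licensed frames to semantic features, which can be sketched as follows. The frame labels and feature names here are our own shorthand, not the authors' notation, and the frame sets are simplified to the two diagnostic constructions discussed in the text.

```python
# Hedged sketch: a verb's licensed frame range narrows its semantic class.
# NP_V_NP_to_NP  ~ "Ed gave/explained the horse/facts to Sally"  -> transfer
# NP_V_Scomp     ~ "Jane thinks/explains that ..."               -> mental
FRAMES = {
    "give":    {"NP_V_NP_to_NP"},
    "explain": {"NP_V_NP_to_NP", "NP_V_Scomp"},
    "go":      set(),             # licenses neither diagnostic frame
    "think":   {"NP_V_Scomp"},
}

def semantic_features(frames):
    """Deduce coarse semantic features from the observed frame range."""
    feats = set()
    if "NP_V_NP_to_NP" in frames:
        feats.add("transfer")
    if "NP_V_Scomp" in frames:
        feats.add("mental")
    return feats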
Perhaps the most important reason for postulating this cross-sentence procedure has to do with 'open roles'. Though eat is
logically a two-argument predicate, with an eater and an eatee, still one can say The baby is eating. It is often supposed that
learners would unerringly interpret a co-occurring scene as one of eating, and simply refuse this scene/sentence as a learning
opportunity because the scene doesn't line up with the required argument structure for eat (assuming, of course, that the child
can't 'hear traces'). But unfortunately, scenes are complex and therefore almost always support false construals if a single
scene/sentence pair is to be decisive. For instance, the baby is sitting, smiling, and so forth, while she eats. Why not map one of
these
inalienable acts onto the observed intransitive sentence? The advantage of cross-sentence analysis taken together with cross-
scene analysis is that it can reveal the argument structure associated with the verb overall. Attention to several structural
environments becomes an even more important capacity of the learning device when we consider languages such as Chinese,
which allows rampant omission of arguments in the surface structure.15
We have now hypothesized that the lexical entry for the verb is derived from observing its range of syntactic environments
(which reveal the verb's argument-taking properties) taken together with the observational environments (which reveal
'everything else'). This does not mean that each use of the verb instantiates each argument-taking component of the lexical entry.
For instance, verbs like open and sink express a causal relation in some environments but not others (Carol opens the door vs.
The door opens). The gloss assigned to /open/ in the lexicon must be one that is compatible with both licensed semantic-
syntactic environments. In contrast, die is noncausal (intransitive) only and kill is causal (transitive) only. Following Grimshaw
(1992), we might render the entry for open as
'decomposing' the meaning only to the level required to state the argument structure and thus to predict the surface structures.
The semantic distinction between opening and closing is then derived from examining situational factors, though only God and
little children know just how. There is no opening-vs.-closing syntactic reflex to aid them.
This scheme certainly does not imply that open is interpreted as meaning 'an event which is causal and noncausal at the same
time'. If that were true, then the larger a verb's syntactic range the less it would mean. Rather, the lexical description predicts
that open is causal when transitive, noncausal when intransitive. The interpretive choice among those made available by the
lexical entry is (on any single use of the verb) derived computationally from the truth value of the sentence structure.
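The division of labor just described can be made concrete with a minimal sketch (our own illustration in Python, with invented frame and reading labels, not a formalism from the text): a single entry for open licenses both readings, the frame of each use selecting between them, while die and kill each license only one frame.

```python
# Hypothetical lexical entries (illustrative only): each verb lists the
# frames it licenses and the reading each frame selects.
LEXICON = {
    "open": {"transitive": "causal", "intransitive": "noncausal"},
    "sink": {"transitive": "causal", "intransitive": "noncausal"},
    "die":  {"intransitive": "noncausal"},   # noncausal only
    "kill": {"transitive": "causal"},        # causal only
}

def construe(verb, frame):
    """Return the reading the lexical entry makes available for this frame."""
    readings = LEXICON[verb]
    if frame not in readings:
        raise ValueError(f"*{verb} does not license the {frame} frame")
    return readings[frame]

print(construe("open", "transitive"))    # 'Carol opens the door' -> causal
print(construe("open", "intransitive"))  # 'The door opens' -> noncausal
```

The point of the sketch is that the larger frame range does not dilute the meaning: the entry predicts a determinate construal on each use.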
15 To mention one more example pertinent to the verbs studied in our experiment, note that both eat and get can occur in
transitive environments. But a clue to the transfer-of-possession sense of get is manifest in ditransitive Emily gets
Jacques an ice cream cone vs. *Emily eats Jacques an ice cream cone. Similarly, there is a semantic correlate of object
dropping (eat vs. want); see Resnik (1993).
A difficulty in carrying these results over to the child's learning situation is that these subjects (when correct) were identifying
old verbs that they knew, by definition: Perhaps they just looked up the frame ranges for these known verbs in their mental
lexicons rather than using the frames to make semantic deductions. Because of this possibility, the pertinence of the findings is
much more easily interpreted by inspecting the 48% of cases where the subjects failed to identify the maternal verb, guessing
something else.16 The finding is that false guesses given in response to frame-range information are semantically close to the
actual verb the mother said (as assessed by the Fisher et al. semantic-similarity procedure) while false guesses in response to
scenes were semantically unrelated to the verb the mother actually uttered. As syntactic bootstrapping predicts, the frame range
put the subjects into the 'semantic neighborhood' even when they did not allow convergence to a unique verb construal.
Note that 52% correct identification, while a significant improvement over 7% or 28%, is not good enough if we want to
model the fact that verb learning by three-year-olds is a snap. They do not make 48% errors so far as we know, even errors
close to the semantic mark. But as we have repeatedly stressed, syntactic bootstrapping is not a procedure in which the child is
assumed to forget about the scene, or the co-occurring nominals, and attend to syntax alone (as Lederer et al. forced their
subjects to do in this manipulation by withholding all other evidence). It is a sentence-to-world pairing procedure. Indeed,
adding the real nouns to the frames without video in this experiment led to over 80% correct verb identification; adding back the
scene yielded almost perfect performance. So if the child has available (as she does, in real life) multiple paired scenes and
sentences, we can at last understand why verb learning is easy.
Collateral evidence from children for the use of multiple frames is at present thin, largely because it is difficult to get young
children to cooperate while a lengthy set of structures/scenes is introduced. (For this reason, the experiment presented in this
article settled for studying the effect of single structures on the interpretation of single scenes, though we interpret the findings
as a snapshot of an iterative process.) Supportive evidence comes from Naigles et al. (1993), who found that young children will
alter their interpretation of known verbs in response to hearing them in novel syntactic
16 This is analogous to the findings of our experiment as organized in table 5. There we looked only at instances where
the children did not come up with the known verb, but indicated some new construal through a paraphrase. The findings
were the same as for the single-word glosses.
environments, while older children and adults usually will not (for a replication and extension, see Naigles et al. 1992).
Evidently, expansion of the frame range is taken as evidence for alteration of the construal early in the learning process for that
word, but after extensive experience the word's meaning is set and the syntax loses its potency to change the construal.
5.3. What semantic clues reside in the syntax?
We have suggested that the formal medium of phrase structure constrains the semantic content that the sentence is expressing,
thus providing crucial clues to the meaning of its verb. One such clue resides in the number of arguments: A noun phrase
position is assigned to each verb argument; this will differentiate push from fall. Another concerns the positioning of the
arguments: the subject of transitives is the agent, differentiating chase from flee. The case-marking and type of the argument
also matter, e.g., verbs whose meaning allows expression of paths and locations typically accept prepositional phrases (The
rabbit puts the blanket on the monkey; Jackendoff 1978, Landau and Jackendoff, in press), and verbs that express mental acts
and states accept sentential complements (John thinks that Bill is tall, Vendler 1972).
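These frame-to-meaning clues can be tabulated schematically; in the sketch below (ours alone: the frame labels and feature names are illustrative shorthand, not a proposed inventory of correspondence rules) each observed frame contributes a coarse semantic feature, and the features accumulated across frames narrow the verb's construal.

```python
# Illustrative frame -> coarse-semantic-feature correlations.
FRAME_CLUES = {
    "NP V":            {"arguments": 1},                      # e.g., fall
    "NP V NP":         {"arguments": 2, "subject": "agent"},  # push, chase
    "NP V NP PP-path": {"arguments": 3, "path": True},        # put
    "NP V S-comp":     {"arguments": 2, "mental": True},      # think
}

def semantic_clues(frames_observed):
    """Accumulate the coarse semantic features suggested by each frame."""
    clues = {}
    for frame in frames_observed:
        clues.update(FRAME_CLUES.get(frame, {}))
    return clues

# A verb heard both transitively and with a path PP looks 'put'-like:
print(semantic_clues(["NP V NP", "NP V NP PP-path"]))
```

Note that the output is only a coarse semantic neighborhood, in keeping with the text's claim that syntax narrows rather than determines the construal.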
Of course one cannot converge on a unique construal from syntactic properties alone. Since the subcategorization properties of
verbs are the syntactic expressions of their arguments, it is only those aspects of a verb's meaning that have consequences for its
argument structure that could be represented in the syntax. Most semantic distinctions are not formally expressed with this
machinery. An important example involves the manner in which an act is accomplished, e.g., the distinctions between slide, roll
and bounce, which are not mapped onto differences in their syntactic behavior (Fillmore 1970). All these verbs require as one
argument the moving entity and allow the causal agent and path of motion as other arguments; hence, The ball slid, rolled,
bounced (down the hill); Kimberley slid, rolled, bounced the ball (down the hill). The specific manners of motion are expressed
within the verb rather than surfacing as distinctions in their syntactic ranges.
In sum, it is only the meaning of a verb as an argument-taking predicate that can be represented by the surface phrase structures
(Rappaport et al. 1987, Fisher et al. 1991, Fisher, in press). The structures can therefore reveal only certain global properties of
the construal, such as whether the verb can express inalienable (intransitive), transfer (ditransitive), mental/perceptual (inflected
sentence complement), and symmetrical (sensitivity of the frame to
the number of one of its arguments) contents, and whether it expresses an activity (progressive) or a state (simple present).
Overall, our view is not that there are 'verb classes', each of which has semantic components and (therefore) licenses certain
structures. Rather we suggest that verb frames have semantic implications (truth values), and verbs have meanings. Owing to the
meaning of the verb, it will be uncomfortable and thus rarely or never uttered in some frame, e.g., we don't say "Barbara looked
the ball on the table" because no external agent can cause a ball to move just by looking at it (that would be psychokinesis). If
the circumstances warrant, however, look can and will be used unexceptionally in this frame; for example, the rules of baseball
make it possible to say (and sports announcers do say) "The shortstop looked the runner back to third base". As for learners, we
believe they note the frame environments in which verbs characteristically occur, and thus the argument structures with which
their meanings typically comport. These ranges of 'typical structures' are compatible with only small sets of verb meanings.
Because the formal medium of phrase structure is revealing only of a restricted set of semantic properties, we cannot and have
not argued that the verb mappings are learned 'from' the syntax. Indeed we have just made clear that what most people think of
as the 'meaning' (that open concerns being ajar while close concerns being shut) is nowhere to be found in the syntax of
sentences. Rather, we have shown that the initial narrowing of the search-space for that meaning, by attention to the argument
structure as revealed by the syntax, is the precondition for using the scene information efficiently to derive the meaning. When
babies do not appear to know the phrase structure, they learn few verbs; when adults and young children are required to identify
verbs without phrase structure cues (as when told "Look! Ziking!" or when presented with silent videos of mother-child
conversation) again they do not converge to a unique interpretation. We conclude that the phrase structure is the learner's
version of a zoom lens for verb vocabulary acquisition.17
17 Pinker (1984) hypothesized that, in the relatively advanced child, phrase structural information could be used for
another purpose: to assign abstract words, those that are neither things nor acts (e.g., situation, know) to lexical categories
such as noun and verb: a new item that occurs in a verb position in the structure is, in virtue of that position, a verb; and
so forth. Pinker termed this procedure 'structure dependent distributional learning'. This seems plausible. But this
procedure will give no clue to the verb meaning, other than that the word means 'something verby'. In contrast, the
position we adopt allows semantic distinctions within the verb class to be extracted. Know can be assigned to the class of
'mental' verbs just because it accepts tensed sentence complements. This gross semantic classification accomplished, the
burden on observation is still to distinguish among think, know, realize and so forth.
one of the meanings and others the consequence of the other. Putting them together as a single frame-range should lead to
chaos. The degree to which polysemy reduces the plausibility of the use of multiple frames is unknown in detail (see Grimshaw,
this volume, for a pessimistic view). However, there is some suggestive evidence that the frame-ranges of verbs are well-
correlated with their meanings in the general case, despite this problem.
The manipulations of interest in this regard were carried out in English by Fisher et al. (1991) and in Hebrew by Geyer et al.
(forthcoming). One group of subjects provided the frame ranges for a set of common verbs (they gave judgments of
grammaticality of all the verbs in various syntactic environments). A second group of subjects provided semantic-relatedness
judgments for these verbs presented in isolation (with no syntactic context). The question was whether the overlap in frame-
ranges predicted the semantic relatedness among the verbs. The answer is yes, massively and in materially the same way for
English and for Hebrew. The more any two verbs overlapped in their syntactic privileges, the closer they were judged to be in
their meanings. Evidently, overlap in frame range provides a guide to semantic relatedness that (though probabilistic) is stable
enough to contribute to the verb-learning feat.
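The overlap measure at issue can be illustrated in miniature; the frame inventories below are invented toy data, not the Fisher et al. stimuli, and the Jaccard coefficient is our own stand-in for their analysis.

```python
def frame_overlap(frames_a, frames_b):
    """Jaccard overlap between two verbs' sets of licensed frames."""
    a, b = set(frames_a), set(frames_b)
    return len(a & b) / len(a | b)

# Toy frame ranges (illustrative only).
frames = {
    "give":  {"NP V NP NP", "NP V NP PP-to"},
    "send":  {"NP V NP NP", "NP V NP PP-to", "NP V NP"},
    "think": {"NP V S-comp"},
}

# Transfer verbs overlap heavily with each other and hardly at all with a
# mental verb -- the pattern that predicted semantic-relatedness judgments.
print(frame_overlap(frames["give"], frames["send"]))
print(frame_overlap(frames["give"], frames["think"]))  # 0.0
```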
The three problems just described (variability of the mapping relations, alternate parses for input sentences, and polysemy) limit
or at least complicate the potential effectiveness of the procedure we have called syntactic bootstrapping. Thus any linguist or
psychologist worth his or her salt can find counterexamples to the claim that frame-information, or frame-range information,
always and perfectly predicts the relevant (argument-taking) properties of the verbs just as counterexamples to the usefulness of
situational information are easy to find.
We must suppose, in consequence, that the learner draws on convergent cues from prosody, syntax, and situation, as available,
jiggling them all across instances to achieve the best fit to a lexical entry. That is, the internal structure of the child's learning
procedure is likely to be quite mixed in the information recruited and probabilistic in how such information is exploited, sad as
this seems. In the work presented, we could show only that syntactic evidence is on theoretical grounds crucial for working out
certain mapping problems (those that involve perspective-taking verbs) and indeed is used by youngsters solving for these under
some exquisitely constrained laboratory conditions. We take the outcomes to lend plausibility to the overall approach.
6. Conclusions
We have proposed a learning procedure for verbs which requires that children be armed with at least some innate
semantic/syntactic correspondence rules, and considerable abilities and dispositions to perform formal analyses on the speech
they hear. As such, these ideas have often been rejected as too formidable to be used by babies, sometimes by the same
commentators who invoke highly abstract formal principles to account for the child's acquisition of syntax. In Pinker's (1989:
263-264) words, if the syntactic bootstrapping 'mechanism is used at all, it is used as a sophisticated form of cognitive problem
solving rather than a general interpretive linguistic scheme', 'a kind of riddle solving'. In contrast, there is something so tangible
and appealing to introspection about the idea of parsing of ongoing events that this is widely accepted as a sufficient basis for
lexical learning.
But these intuition-derived theoretical biases cannot so lightly be accepted. They have fooled us before. For example, there is
strong cross-linguistic evidence that two-year-olds are at least as quick (probably quicker) to extract the formal aspects of gender
as to extract its semantic aspects (Levy 1983). Gordon (1985) has shown that young children are more attentive to the formal
distinction between mass and count nouns in English than to the semantic correlates of this distinction.
The present experiment documented only the focusing ('zoom lens') aspect of the syntactic bootstrapping procedure. The
successive narrowing of the semantic conjecture that derives from observation of a verb's several licensed structural
environments was not tested, though prior experimentation we have cited demonstrates both the strength of these relations and
use of this procedure by adults and children in verb identification.
It is rather more surprising that there is little systematic evidence for the word-to-world pairing procedure either. An interesting
exception is Gropen et al. (1991). Children in this experiment heard new motion verbs while shown several example scenes
along with the syntactically uninformative sentence "This is pilking". Most of them learned the verb meanings (though some did
not). Unfortunately, the teaching procedure included negative instances ("This is not pilking") and specific correction whenever
the children erred. So far as we know, such explicit negative evidence is not usually available to learners.18 Still, no one can
doubt that a salient motion (e.g., zigzagging in
18 For Gropen et al.'s purposes this unusual teaching environment did not matter. Their aim was to show that if the
children did learn the verb, they could project its argument structure and hence its probable surface structure privileges.
The answer was positive: Knowledge of meaning predicts surface structure (though errorfully, see Bowerman 1982) just
as the latter (again, errorfully) predicts the former.
this experiment) can sometimes be mapped against a spoken verb from the bare evidence of observation.
It is the overwhelming fallibility of such a word-to-scene procedure that we have emphasized in this article. Therefore we have
challenged the logic of observation alone as the input to verb learning. One such challenge is that too many verbs come in pairs
that are just about always mapped onto the same situations, so cross-situational observation will never distinguish between
them. Another is that some verbs encode concepts that are not observable at all, e.g., know or want. Another is that
considerations of salience (the agency bias) are fatal to the possibility of verb learning in all the cases where the caretaker
happens to utter some word which is not 'the most salient of all' for the scene then in view.
To help redress these logical and practical problems we have shown that children can make significant use of structural
information. At the same time their construals of new verb meanings are affected by biases as to how to represent an event. But
this influence from plausibility considerations acknowledged, the influence of structure is materially stronger and wins out in the
majority of cases. Toddlers know that it is better to receive than to give when Santa is the indirect object of the sentence.
References
Armstrong, S., L.R. Gleitman and H. Gleitman, 1983. What some concepts might not be. Cognition 13, 263-308.
Bates, E. and B. MacWhinney, 1982. Functionalist approaches to grammar. In: E. Wanner, L. R. Gleitman (eds.), Language
acquisition: The state of the art, 173-218. New York: Cambridge University Press.
Beckwith, R., E. Tinker and L. Bloom, 1989. The acquisition of non-basic sentences. Paper presented at the Boston University
Conference on Language Development, Boston.
Bowerman, M., 1976. Semantic factors in the acquisition of rules for word use and sentence construction. In: D.M. Morehead,
A.E. Morehead (eds.), Normal and deficient child language. Baltimore, MD: University Park Press.
Bowerman, M., 1982. Reorganizational processes in lexical and syntactic development. In: E. Wanner, L. R. Gleitman (eds.),
Language acquisition: The state of the art, 319-346. New York: Cambridge University Press.
Braine, M. and J. Hardy, 1982. On what case categories there are, why they are, and how they develop: An amalgam of a priori
considerations, speculation, and evidence from children. In: E. Wanner, L.R. Gleitman (eds.), Language acquisition: The state of
the art, 219-239. New York: Cambridge University Press.
Brown, R., 1957. Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology 55, 1-5.
Bruner, J.S., 1975. From communication to language: A psychological perspective. Cognition 3, 255-287.
Bruner, J.S., 1983. Child's talk. New York: Norton.
Chomsky, N., 1957. Syntactic structures. New York: Mouton Publishers.
Chomsky, N., 1981. Lectures on government and binding. Dordrecht: Foris.
Clark, E.V., 1987. The principle of contrast: A constraint on language acquisition. In: B. MacWhinney (ed.), Mechanisms of
language acquisition, 264-293. Hillsdale, NJ: Erlbaum.
Clark, H.H. and J.S. Begun, 1971. The semantics of sentence subjects. Language and Speech 14, 34-46.
Cooper, W. and J. Paccia-Cooper, 1980. Syntax and speech. Cambridge, MA: Harvard University Press.
Dowty, D., 1991. Thematic proto-roles and argument selection. Language 67(3), 547-619.
Dromi, E., 1987. Early lexical development. New York: Cambridge University Press.
Fernald, A. and T. Simon, 1984. Expanded intonation contours in mothers' speech to newborns. Developmental Psychology
20(1), 104-113.
Fillmore, C.J., 1968. Lexical entries for verbs. Foundations of Language 4, 373-393.
Fillmore, C.J., 1970. The grammar of hitting and breaking. In: R. Jacobs, P. Rosenbaum (eds.), Readings in English
transformational grammar, 120-133. Waltham, MA: Ginn.
Fisher, C., in press. Structure and meaning in the verb lexicon: Input for a syntax-aided verb learning procedure. Language and
Cognitive Processes.
Fisher, C., 1993. Preschoolers' use of structural cues to verb meaning. Paper presented at the 60th Annual Meeting of the Society
for Research in Child Development, New Orleans, LA.
Fisher, C. and H. Tokura, in press. Prosody in speech to infants: Direct and indirect cues to syntactic structure. In: J. Morgan, C.
Demuth (eds.), Signal to syntax. Hillsdale, NJ: Erlbaum.
Fisher, C., H. Gleitman and L.R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology
23, 331-392.
Fodor, J.A., 1981. The present status of the innateness controversy. In: J.A. Fodor (ed.), Representations. Cambridge, MA: MIT
Press.
Fritz, J.J. and G. Suci, 1981. Facilitation of semantic comprehension at the one-word stage of language development. Journal of
Child Language 9, 31-39.
Gentner, D., 1978. On relational meaning: The acquisition of verb meaning. Child Development 49, 988-998.
Gentner, D., 1982. Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In: S.A. Kuczaj (ed.),
Language development, Volume 2: Language, thought and culture, 301-334. Hillsdale, NJ: Erlbaum.
Gerken, L., P.W. Jusczyk and D.R. Mandel, 1993. When prosody fails to cue syntactic structure: Nine-month-olds' sensitivity to
phonological vs. syntactic phrases. Ms., State University of New York at Buffalo.
Geyer, J., L.R. Gleitman and H. Gleitman, forthcoming. Subcategorization as a predictor of verb meaning: Evidence from
modern Hebrew. Ms., University of Pennsylvania.
Gillette, J.A., 1992. Discovering that /think/ means 'think': The acquisition of mental verbs. Ms., University of Pennsylvania.
Gillette, J.A. and L.R. Gleitman, forthcoming. Effects of situational cues on the identification of nouns and verbs.
Givón, T., 1986. The pragmatics of word order: Predictability, importance, and attention. Amsterdam: Benjamins.
Gleitman, L.R., 1990. The structural sources of verb meanings. Language Acquisition 1, 1-55.
Gleitman, L.R., H. Gleitman, B. Landau and E. Wanner, 1988. Where learning begins: Initial representations for language
learning. In: E. Newmeyer (ed.), The Cambridge Linguistic Survey, Vol. III: 150-193. New York: Cambridge University Press.
Goldin-Meadow, S., M. Seligman and R. Gelman, 1976. Language in the two-year-old. Cognition 4, 189-202.
Golinkoff, R.M., R.C. Jacquet and K. Hirsh-Pasek, in progress. Lexical principles underlie the learning of verbs. Ms., University
of Delaware.
Gordon, P., 1985. Evaluating the semantic categories hypothesis: The case of the count/mass distinction. Cognition 20, 209-242.
Grimshaw, J., 1981. Form, function and the language acquisition device. In: C.L. Baker, J.J. McCarthy (eds.), The logical
problem of language acquisition, 165-182. Cambridge, MA: MIT Press.
Grimshaw, J., 1990. Argument structure. Linguistic Inquiry Monograph 18. Cambridge, MA: MIT Press.
Gropen, J., S. Pinker, M. Hollander and R. Goldberg, 1991. Affectedness and direct objects: The role of lexical semantics in the
acquisition of verb argument structure. Cognition 41, 153-195.
Grosjean, F. and J.P. Gee, 1987. Prosodic structure and spoken word recognition. Cognition 25, 135-155.
Gruber, J.S., 1965. Studies in lexical relations. Ph.D. dissertation, MIT.
Hall, D.G., 1993. Basic-level individuals. Cognition 48, 199-221.
Hall, D.G. and S. Waxman, 1993. Assumptions about word meaning: Individuation and basic level kinds. Child Development
64, 1550-1570.
Hirsh-Pasek, K. and R.M. Golinkoff, 1991. Language comprehension: A new look at some old themes. In: N. Krasnegor, D.
Rumbaugh, M. Studdert-Kennedy, R. Schiefelbusch (eds.), Biological and behavioral aspects of language acquisition, 301-320.
Hillsdale, NJ: Erlbaum.
Hirsh-Pasek, K., H. Gleitman, L. Gleitman, R. Golinkoff and L. Naigles, 1988. Syntactic bootstrapping: Evidence from
comprehension. Paper presented at the Boston University Conference on Language Development, Boston.
Hirsh-Pasek, K., D. Kemler-Nelson, P. Jusczyk, K. Cassidy, B. Druss and L. Kennedy, 1987. Clauses are perceptual units for
young infants. Cognition 26, 269-286.
Jackendoff, R., 1972. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Jackendoff, R., 1978. Grammar as evidence for conceptual structure. In: M. Halle, J. Bresnan, G. Miller (eds.), Linguistic theory
and psychological reality, 201-228. Cambridge, MA: MIT Press.
Jackendoff, R., 1990. Semantic structures. Cambridge, MA: MIT Press.
Joshi, A. and O. Rambow, in progress. Dependency parsing for phrase structure grammars. Ms., University of Pennsylvania.
Jusczyk, P.W., K. Hirsh-Pasek, D.G. Kemler-Nelson, L.J. Kennedy, A. Woodward and J. Piwoz, 1992. Perception of acoustic
correlates of major phrasal units by young infants. Cognitive Psychology 24, 252-293.
Jusczyk, P. and D. Kemler-Nelson, in press. Syntactic units, prosody, and psychological reality during infancy. In: J. Morgan, C.
Demuth (eds.), Signal to syntax. Hillsdale, NJ: Erlbaum.
Kako, E., in progress. Preschoolers assign novel verbs to novel actions. Ms., University of Pennsylvania.
Kayne, R., 1992. A restrictive theory of word order and phrase structure. Paper presented at the Annual Meeting of GLOW,
Lisbon, Portugal.
Keenan, E.L., 1976. Toward a universal definition of 'subject'. In: C.N. Li (ed.), Subject and topic, 305333. New York:
Academic Press.
Klatt, D.H., 1975. Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3, 129-140.
Landau, B. and L.R. Gleitman, 1985. Language and experience: Evidence from the blind child. Cambridge, MA: Harvard
University Press.
Landau, B. and R. Jackendoff, in press. 'What' and 'where' in spatial language and spatial cognition. Brain and Behavioral
Sciences.
Lederer, A., and M. Kelly, 1991. Prosodic correlates to the adjunct/complement distinction in motherese. Papers and Reports in
Child Language Development 30, Stanford, CA.
Lederer, A., H. Gleitman and L.R. Gleitman, 1991. The informativeness of cross-situational and cross-sentential evidence for
learning the meaning of verbs. Paper presented at the Boston University Conference on Language Development, Boston.
Lederer, A., H. Gleitman and L.R. Gleitman, in press. The syntactic contexts of maternal verb use. In: M. Tomasello, W.
Merriman (eds.), Beyond names for things: Young children's acquisition of verbs. Hillsdale, NJ: Erlbaum.
Lehiste, I., J.R. Olive and L. Streeter, 1976. The role of duration in disambiguating syntactically ambiguous sentences. Journal
of the Acoustical Society of America 60(5), 1199-1202.
Lenneberg, E.H., 1967. Biological foundations of language. New York: Wiley.
Leslie, A.M., 1982. The perception of causality in infants. Perception 11, 173-186.
Leslie, A.M. and S. Keeble, 1987. Do six-month-old infants perceive causality? Cognition 25, 265-288.
Levin, B., 1985. Lexical semantics in review: An introduction. In: B. Levin (ed.), Lexical semantics in review. Lexicon Project
Working Papers, 1: 1-62. Cambridge, MA: The MIT Center for Cognitive Science.
Levy, Y., 1983. It's frogs all the way down. Cognition 15, 75-93.
Mandler, J., 1991. Prelinguistic primitives. In: L.A. Sutton, C. Johnson (eds.), Proceedings of the 17th Annual Meeting of the
Berkeley Linguistics Society, 414-425. Berkeley, CA: Berkeley Linguistics Society.
Markman, E., 1989. Categorization and naming in children: Problems of induction. Cambridge, MA: MIT Press.
Mazuka, R. 1993. How can a grammatical parameter be set before the first word? Paper presented at the Signal to Syntax
Conference, Brown University, Providence, RI.
Michotte, A., 1963. The perception of causality. London: Methuen.
Naigles, L., 1990. Children use syntax to learn verb meanings. Journal of Child Language 17, 357-374.
Naigles, L., A. Fowler and A. Helm, 1992. Developmental shifts in the construction of verb meanings. Cognitive Development
7, 403-427.
Naigles, L., H. Gleitman and L.R. Gleitman, 1993. Children acquire word meaning components from syntactic evidence. In: E.
Dromi (ed.), Language and development, 104-140. Norwood, NJ: Ablex.
Nelson, K., 1973. Structure and strategy in learning to talk. Monographs of the Society for Research in Child Development 38.
Nelson, K., 1974. Concept, word, and sentence: Interrelations in acquisition and development. Psychological Review 81,
267-285.
Newport, E.L., 1977. Motherese: The speech of mothers to young children. In: N. Castellan, D. Pisoni, G. Potts (eds.),
Cognitive Theory, Volume 2. Hillsdale, NJ: Erlbaum.
Ninio, A., 1980. Ostensive definition in vocabulary acquisition. Journal of Child Language 7, 565-573.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S., 1989. Learnability and cognition. Cambridge, MA: MIT Press.
Quine, W.V., 1960. Word and object. Cambridge, MA: MIT Press.
Rappaport, M., B. Levin and M. Laughren, 1987. Levels of lexical representation. Lexicon Project Working Papers #20.
Cambridge, MA: MIT Center for Cognitive Science.
Resnik, P., 1993. Selectional preference and implicit objects. Paper presented at the CUNY Sentence Processing Conference,
Amherst, MA.
Schlesinger, I., 1988. The origin of relational categories. In: Y. Levy, I. Schlesinger, M. Braine (eds.), Categories and processes
in language acquisition, 121-178. Hillsdale, NJ: Erlbaum.
Slobin, D.I., 1975. On the nature of talk to children. In: E.H. Lenneberg, E. Lenneberg (eds.), Foundations of language
development. New York: Holt, Rinehart and Winston.
Talmy, L., 1985. Lexicalization patterns: Semantic structure in lexical forms. In: T. Shopen (ed.), Language typology and
syntactic description, Volume 3: Grammatical categories and the lexicon. New York: Cambridge University Press.
Tomasello, M. and A.C. Kruger, 1992. Joint attention on actions: Acquiring verbs in ostensive and non-ostensive contexts.
Journal of Child Language 19, 311-333.
Vendler, Z., 1972. Res cogitans: An essay in rational psychology. Ithaca, NY: Cornell University Press.
Zwicky, A., 1971. In a manner of speaking. Linguistic Inquiry 2(2), 223-233.
from its situations of use that could only be resolved by using the verb's set of subcategorization frames.
(see, e.g., Moravcsik 1981, Markman 1989, 1990; Jackendoff 1990). For example, this representational system would allow
'object with shape X' and 'object with function X' as possible word meanings, but not 'all the undetached parts of an object with
shape X', 'object with shape X or a Buick', and 'object and the surfaces it contacts'. The second constraint comes from the way in
which a child's entire lexicon may be built up, that is, on how one word's meaning may be related to another word's meaning (see Miller
1991, Miller and Fellbaum 1992). For example, the lexicons of the world's languages freely allow meronyms (words whose
meanings stand in a part-whole relationship, like body-arm) and hyponyms (words that stand in a subset-superset relationship, like
animal-mammal), but do not easily admit true synonyms (Bolinger 1977, Clark 1987, Miller and Fellbaum 1991).
therefore not posit a particular meaning for a new word if it was identical to some existing word's meaning. Finally, the child
would have to be equipped with a procedure for testing the possible hypotheses about word meaning against the situations in
which adults use the words. For example, if a child thought that pet meant 'dog', he or she would be disabused of the error the first
time the word is used to refer to a fish.
Although the problem of learning word meanings is usually discussed with regard to learning nouns, identical problems arise
with verbs (Landau and Gleitman 1985, Pinker 1988, 1989; Gleitman 1990). When a parent comments on a dog chasing a cat by
using the word chase, how is the child to know that it means 'chase' as opposed to 'flee', 'move', 'go', 'run', 'be a dog chasing',
'chase on a warm day', and so on?
As in the case of learning noun meanings (indeed, learning in general), there must be constraints on the child's possible
hypotheses. For example, manner-of-motion should be considered a possible component of a verb's mental dictionary entry, but
temperature-during-motion should not be. (See Talmy 1985, 1988; Pinker 1989, Jackendoff 1990, and Dowty 1991, for
inventories of the semantic elements and their configurations that may constitute a verb's semantic representation.) Moreover,
there appear to be constraints on lexical organization (Miller 1991, Miller and Fellbaum 1991). For example, verb lexicons often
admit of co-troponyms (words that describe different manners of performing a similar act or motion, such as walk-skip-jog) but,
like noun lexicons, rarely admit of exact synonyms (Bolinger 1977, Clark 1987, Pinker 1989, Miller and Fellbaum 1991).
Finally, the child must be equipped with a learning mechanism that constructs, tests, and modifies semantic representations by
comparing information about the uses of verbs by other speakers across speech events (Pinker 1989).
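Such a cross-situational testing procedure can be sketched in miniature (this is our illustration of the bare logic, using the pet example from above; the candidate meanings and situations are invented): each observed use of a word prunes the hypotheses inconsistent with the current situation.

```python
def eliminate(candidates, situations):
    """Keep only the meaning hypotheses consistent with every observed use."""
    surviving = set(candidates)
    for situation in situations:
        surviving = {h for h in surviving if h in situation}
    return surviving

# 'pet' heard first of a dog, then of a fish: the hypothesis 'dog' dies.
candidates = {"dog", "pet", "animal"}
uses = [{"dog", "pet", "animal"},   # pointing at the family dog
        {"fish", "pet", "animal"}]  # pointing at the goldfish
print(eliminate(candidates, uses))  # {'pet', 'animal'} in some order
```

As the sketch makes plain, observation alone can only winnow hypotheses; it cannot decide among those (here pet vs. animal) that survive every situation, which is the gap the constraints discussed above are meant to fill.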
associated with a verb is highly informative about the meaning it conveys. In fact, since the surface forms are the carriers
of critical semantic information, the construal of verbs is partly indeterminant without the subcategorization information.
Hence, in the end, a successful learning procedure for verb meaning must recruit information from inspection of the many
grammatical formats in which each verb participates.' (1985: 138-139)
For example, here's how a child hearing the verb glip in a variety of syntactic frames could infer various components of its
meaning from the characteristic semantic correlates of those frames. Hearing I glipped the book (transitive frame, with a direct
object), a child could guess that glipping is something that can be done to a physical object. Hearing I glipped that the book is
on the table (frame with a sentential complement), the child could infer that glipping involves some relation to a full
proposition. Hearing I glipped the book from across the room (frame with an object and a directional complement) tells him or
her that glipping can involve a direction. Moreover, the absence of Glip that the book is on the table! (imperative construction)
suggests that glipping is involuntary, and the absence of What John did was glip the book (pseudo-cleft construction) suggests
that it is not an action. With this information, the child could figure out that glip means 'see', because seeing is an involuntary
non-action that can be done to an object or a proposition from a direction. Note that the child could make this inference without
seeing a thing, and without seeing anyone seeing anything. In her 1990 paper laying out this hypothesis in detail and discussing
the motivation for it, Gleitman calls this learning procedure 'syntactic bootstrapping', and offers it as a major mechanism
responsible for the child's success at learning verb meanings.
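Under the assumptions of this scenario, the glip inferences amount to lookups from attested (and conspicuously absent) frames to semantic conclusions. The frame-to-inference mappings below are toy stand-ins for the semantic correlates Gleitman posits, not a real grammar:

```python
# Toy sketch of inferring semantic properties from the syntactic frames a
# verb does and does not appear in, following the 'glip' example.

FRAME_IMPLIES = {
    "NP _ NP": "can be done to a physical object",
    "NP _ S'": "involves a relation to a full proposition",
    "NP _ NP from-PP": "can involve a direction",
}
ABSENCE_IMPLIES = {
    "imperative": "is involuntary",
    "pseudo-cleft": "is not an action",
}

def infer(attested_frames, unattested_constructions):
    """Collect inferences licensed by attested frames and absent constructions."""
    facts = [FRAME_IMPLIES[f] for f in attested_frames if f in FRAME_IMPLIES]
    facts += [ABSENCE_IMPLIES[c] for c in unattested_constructions
              if c in ABSENCE_IMPLIES]
    return facts

# 'glip' heard in three frames, never in the imperative or pseudo-cleft:
for fact in infer(["NP _ NP", "NP _ S'", "NP _ NP from-PP"],
                  ["imperative", "pseudo-cleft"]):
    print("glipping", fact)
```

Note that the absence-based inferences require negative evidence of a sort (registering that a construction never occurs), which a real learner could only approximate statistically.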
The goal of the present paper is to examine the general question of how a child could use the syntactic properties of a verb to
figure out its semantic properties. I will discuss several kinds of mechanisms that infer semantics from syntax, attempting to
distinguish what kinds of inputs they take, how they work, what they can learn, and what kind of evidence would tell us that
children use them. I will focus on Gleitman's (1990) thorough and forceful arguments for the importance of syntax-guided verb
learning. After she puts these arguments in particularly strong form in order to make the best case for them and to find the limits
as to what they can accomplish, Gleitman settles on an eclectic view in which a set of learning mechanisms, some driven by
syntax and some not, complement each other. I agree with this eclectic view and will try to lay out the underlying division of
labor among learning mechanisms more precisely. In doing so, I will, however, be disagreeing with some of the particular
strong claims that Gleitman makes about syntax-guided learning of meaning in the main part of her paper.
Moreover, some of the information about how a verb is used in a sentence is based on universal features of semantics. For
example, the sentence I am glipping apples could inform a learner that glip can't mean 'like', because the progressive aspect
marked on the verb is semantically incompatible with the stativity of liking. Here, too, one can learn something about a verb's
meaning from the sentence in which the verb is used, as opposed to the situation in which the verb is used, but the learning is
driven by semantic information (in this example, that liking does not inherently involve changes over time), not syntactic
information.
Gleitman (1990) does not contest this distinction; in footnote 8 on p. 27 and in footnote 26 (p. 379) of Fisher et al. (1991), she
states that her arguments are not about the use of linguistically-conveyed information in general, but about the use of the
syntactic properties of verbs per se. Nonetheless, the distinction has implications that bear on her arguments in ways she does
not make explicit.
First, the distinction blunts the intuitive impact of two of Gleitman's recurring arguments for the importance of syntactic
information: that blind children learn verbs' meanings without seeing their referent events, and that parents do not invariably use
verbs in unique situations (e.g., they do not say open simultaneously with opening something). These phenomena suggest that
children must attend to what parents say, not just what they do. The phenomena do not, however, lead by some process of
elimination to the hypothesis that children are using the syntactic subcategorization properties of individual verbs. The children
may just be figuring out the content of the sentences, and inferring a verb's semantics from its role in the events conveyed.
Second, many of the supposedly syntactically-cued inferences that Gleitman appeals to may actually be semantically cued in the
same sense that hearing a verb used with sandwich suggests that it involves eating. The 'subcategorization frames' that Landau
and Gleitman (1985), Gleitman (1990), and Fisher et al. (1991) appeal to are distinguished more by the semantic content of
particular words in them than by their purely syntactic (i.e., categorical) properties. Indeed, most of the entries are not
syntactically distinct subcategorization frames in the linguist's sense at all. Of the 33 entries listed in Appendix A of Fisher et al.
(1991), two thirds are actually not syntactically distinct subcategorization frames. Seventeen frames are syntactically identical
V-PP frames differing only in the choice of preposition (e.g., in NP versus on NP). (Fisher et al. did, to be sure, collapse these
prepositions into a single frame type in the data analysis of their study.) Three are V-S'
frames differing only in the choice of complementizers (e.g., that S versus if S). There are V-NP-PP frames differing only in the
choice of preposition (e.g., NP to NP versus NP from NP; these were, however, collapsed in the analysis). And three are not
subcategorization frames at all but the morphosyntactic constructions imperative, progressive, and pseudo-cleft, which are
syntactically well-formed with any verb (though some are awkward because of semantic clashes, such as involuntary verbs in
the imperative). The problem is that even if learners can use verbs' patterning across these linguistic contexts, it is misleading to
say that they would be relying on syntactic information. In most modern theories of verbs' compatibility with prepositions and
complementizers (see Jackendoff 1987, 1990; Pinker 1989; Grimshaw 1979, 1981, 1990), the selection is made on semantic
grounds: for example, verbs involving motion in a direction can select any preposition that involves a direction. There are verb-
specific idiosyncrasies, to be sure (such as rely on and put up with), but even these may be treated as involving idiosyncratic
semantic properties of the verb. Thus if a child notices that a verb takes across and over but not with or about, and infers that
the verb involves motion, the child is not using syntactic information, but figuring out that an event involving the traversal of
paths (inherent to the meaning of across and over) is likely to involve motion, just as an event that involves sandwiches and
hunger is likely to involve eating.1
2.2. The term 'syntactic bootstrapping' and the opposition of 'syntactic' and
'semantic' bootstrapping are misleading
It is unfortunate that Gleitman chose the term 'syntactic bootstrapping' to refer to the process of inferring a verb's meaning from
its set of subcategorization
1 Note that some of the other linguistic contexts that Landau and Gleitman call 'subcategorization frames' are not
subcategorization frames either, but frozen expressions and collocations that are probably idiosyncratic to English and
hence no basis for learning. These include Look!, See?, Look! The doggie is running!, See? The doggie is running!, Come
see the doggie, and look like in the sense of 'resemble'. Since look and see are the only two verbs that Landau, Gleitman,
and their collaborators discuss in detail, if their learning scenarios for these two verbs adventitiously exploit particular
properties of English, one has to be suspicious about the feasibility of the scenario in the general case. More generally,
Fisher, Gleitman, and Gleitman's claim that there are something like 100 distinct syntactic subcategorization frames,
hence, in principle, 2^100 syntactically distinguishable verbs, appears to be a severe overestimate. I think most linguists
would estimate the number of syntactically distinct frames as an order of magnitude lower, which would make the
estimated number of syntactically distinguishable verbs a tiny fraction of what Fisher et al. estimate.
frames. She intended the term to suggest an opposition to my 'semantic bootstrapping' (Pinker 1982, 1984, 1987, 1989), and one
of the sections in her 1990 paper is even entitled 'Deciding between the bootstrapping hypotheses'. Though the opposition
'semantic versus syntactic bootstrapping' is catchy, I suggest it be dropped. The opposition is a false one, because the theories
are theories about different things. Moreover, there is no relationship between what Gleitman calls 'syntactic bootstrapping' and
the metaphor of bootstraps, so the term makes little sense.
Gleitman uses the term 'semantic bootstrapping' to refer to the hypothesis that children learn verbs' meanings by observing the
situations in which the verbs are used. But this is not accurate. 'Semantic bootstrapping' is not even a theory about how the child
learns word meanings. It is a theory about how the child begins learning syntax. 'The bootstrapping problem' in grammar
acquisition (see Pinker 1987) arises because a grammar is a formal system consisting of a set of abstract elements, each of
which is defined with respect to other elements. For example, the 'subject' of a sentence is defined by a set of formal properties,
such as its geometric position in the tree with respect to the S and VP nodes, its ability to force agreement with the verb, its
intersubstitutability with pronouns of nominative case, and so on. It cannot be identified with any semantic role, sound pattern,
or serial position. The bootstrapping problem is: How do children break into the system at the very outset, when they know
nothing about the particular language? If you know that verbs agree with their subjects, you can learn where the subjects go by
seeing what agrees with the verb; but how could you have learned that verbs agree with their subjects to begin with, if you don't
yet know where the subjects go? How can children 'lift themselves up by their bootstraps' at the very outset of language
acquisition, and make the first basic discoveries about the grammar of their language that are prerequisite to any further
learning?
Pinker (1982), following earlier suggestions of Grimshaw (1981), suggested that certain contingencies between perceptual
categories and syntactic categories, mediated by semantic categories, could help the child get syntax acquisition started. For
example, if the child was built with the universal linking rule that agents of actions were subjects of active sentences, and they
could infer from a sentence's perceptual context and the meanings of some of its content words that a particular word referred to
the agent of an action, the child could infer that that word was in subject position. Once the position of the subject is established
as a rule or parameter of the child's nascent grammar, further kinds of learning can proceed. For example, the child could now
infer that any new word in this newly-identified position must be a subject, regardless
of whether it is an agent; he or she could also infer that verbs must agree in person and number with the element in that position.
See Pinker (1984) and (1987) for a more precise presentation of the hypothesis.
The semantic bootstrapping hypothesis does require, as a background assumption, the idea that the semantics of at least some
verbs have been acquired without relying on syntax. That is because the theory is about how syntax gets 'bootstrapped' at the
very beginning of learning; if all word meanings were acquired via knowledge of syntax, and if syntax were acquired via
knowledge of words' meanings, we would be faced with a vicious circle. The semantic bootstrapping hypothesis is agnostic
about how children have attained knowledge of these word meanings. Logically speaking, they could have used telepathy,
surgery, phonetic symbolism, or innate knowledge of the English lexicon, but the most plausible suggestion is that the children
had attended to the contexts in which the words are used. Gleitman takes this latter assumption (that the child's first word
meanings are acquired by attending to their situational contexts), generalizes it to a claim that all verb meanings are acquired by
attending to their situational contexts (i.e., even verbs acquired after syntax acquisition is underway), and refers to the
generalized claim as 'semantic bootstrapping'. But this is a large departure from its intended meaning.
And what Gleitman calls, in contrast, 'syntactic bootstrapping', is not a different theory of how the child begins to learn syntax.
Thus it is not an alternative to the semantic bootstrapping hypothesis. (The only reason they could be construed as competitors is
that semantic bootstrapping assumes that at least some verb meanings can be acquired before syntax, so a very extreme form of
Gleitman's negative argument, that no verb meaning can be learned without syntax, is incompatible with it.) Moreover, since
'syntactic bootstrapping' is a theory of how the child learns the meanings of specific verbs, and since it can only apply at the
point at which the child has already acquired the syntax of verb phrases, it is not clear what it has to do with the 'bootstrapping
problem' or the metaphor of lifting oneself up by one's bootstraps. For these reasons, I suggest that the term be avoided.
Here is a somewhat cumbersome but transparent and accurate set of replacements. 'Semantic cueing of syntax' refers to the
semantic bootstrapping hypothesis. 'Semantic cueing of word meaning' refers to the commonplace assumption that meanings are
learned via their semantic contexts (perceptual or linguistic). 'Syntactic cueing of word meaning' is the hypothesis defended by
Gleitman and her collaborators.
Now, in some contexts Gleitman does present a genuine alternative to the semantic bootstrapping hypothesis. She suggests that
the child can use the prosody of a sentence to parse it into a syntactic tree. Though she never specifies exactly how this could be
done, presumably the child would assume that pauses or falling intonation contours signal phrase boundaries. Having thus
inferred a syntactic tree, the child could infer a verb's meaning from the trees it appears in. Note, though, that the information
that the child uses to get syntax acquisition started is not itself syntactic, but prosodic; the hypothesis can thus sensibly be called
'prosodic bootstrapping'. If both prosodic bootstrapping and syntactic cueing of word meaning were possible, semantic
bootstrapping would be otiose.
But while it is plausible that the infant uses prosodic information to help in sentence analysis at the outset of language
acquisition (e.g., to identify utterance boundaries), it is completely implausible that this information is sufficient to build a full
syntactic tree for an input sentence (see Pinker 1987). The prosodic bootstrapping hypothesis, taken literally, is quite
extraordinary. It is tantamount to the suggestion that there is a computational procedure that can parse sentences from any of the
world's 5,000 languages when the sentences are spoken from behind a closed door (i.e., the sentences are filtered so that only
prosodic information remains). Among the surprising corollaries to this claim is that it should be fairly easy for a person or
machine to give a full parse to an English sentence heard from behind a closed door, because the listener can use both the
universal and the English-specific mappings between prosody and syntax, whereas the child supposedly is capable of doing it
using only the universal mappings. If, on the contrary, we, knowing English, cannot parse a sentence from behind a closed door,
it suggests that the young child, not knowing English, is unlikely to be able to do so either. Thus the claim that infants can
bootstrap syntax from prosody must be viewed with considerable skepticism.2
Overview of Gleitman's arguments for the syntactic cueing of verb semantics

With these independent issues out of the way, we
can now turn to Gleitman's arguments for the importance of the syntactic cueing of verb meaning. These arguments fall into
three categories. There are negative arguments: verb meanings cannot be learned from observation of situational contexts alone;
therefore some other source of information is required. There is a positive
2 Moreover, many of the 'syntactic' frames that Gleitman assumes the child is discriminating in order to infer verbs'
meanings are prosodically identical, such as frames differing only in the specific prepositions or complementizers they
contain, like in versus on or that versus if (see, e.g., Gleitman 1990: table 2).
hypothetical argument: verb meanings could be learned from verb syntax; therefore verb syntax probably is that other source.
And there are empirical arguments: Children in fact learn verb meaning from verb syntax. I will examine these arguments
separately.
someone who holds his ground), and move will be used for instances of moving without pushing (e.g., sliding or walking). To
take another one of Gleitman's examples (1990: 14), even though a single event may be describable as pushing, as rolling, and
as speeding, most events are not. The child need merely wait for an instance of rolling without pushing or speeding, speeding
without pushing or rolling, and pushing without rolling or speeding. See Gropen et al. (1991a) for experimental demonstrations
that children use this kind of information.
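This wait-for-a-disambiguating-instance strategy can be sketched as a simple check over observed situations; the property labels coding each situation are invented for illustration:

```python
# Sketch: a learner rejects the hypothesis that two verbs are synonymous as
# soon as one verb's property occurs in a situation lacking the other's.
# Each situation is coded as a set of (invented) event-property labels.

def disambiguated(situations, prop_a, prop_b):
    """True once some situation shows prop_a without prop_b."""
    return any(prop_a in s and prop_b not in s for s in situations)

observed = [
    {"push", "roll", "speed"},  # one event describable all three ways
    {"roll"},                   # rolling without pushing or speeding
]
print(disambiguated(observed, "roll", "push"))  # True: roll != push
print(disambiguated(observed, "push", "roll"))  # False: not yet teased apart
```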
3.1.2. Paired verbs that describe single events
Gleitman (1990: 16; see also Fisher et al. 1991: 380) suggests that there are pairs of verbs that overlap 100% in the situations
they refer to. For example, there can be no giving without receiving, no winning without beating, no buying without selling, and
no chasing without fleeing.
In fact, I doubt that pairs of verbs that refer to exactly the same set of situations exist (or if they do, they must be extremely
rare). Such pairs would be exact synonyms, and there is good reason to believe that there are few if any exact synonyms (Clark
1987, Bolinger 1977, Miller and Fellbaum 1991). To take just these examples, I can receive a package even if no one gave it to
me; perhaps I wasn't home. John, running unopposed, can win the election, though he didn't beat anyone, and the second-place
Celtics beat the last-place Nets in the standings last year, though neither won anything. Several of my gullible college friends
sold encyclopedias door to door for an entire summer, but in many cases, no one bought any; I just bought a Coke from the
machine across the hall, but no one sold it to me. If John fled the city, no one had to be chasing him; Bill can chase Fred even if
Fred isn't fleeing but hiding in the garbage can.
I would certainly not claim that the learning of all these distinctions awaits the child's experience of the crucially disambiguating
situation. But a lot of it could, and more important, the in-principle arguments for an alternative that are based on putative total
overlap among verb meanings are not valid if meanings rarely overlap totally.
3.1.3. The subset problem
In some cases, Gleitman suggests, verb learning is impossible even if verbs do not totally overlap in the situations to which they
refer. If the situations referred to by Verb A are a superset of the situations referred to by Verb B, a
child who mistakenly thought that Verb B had the same meaning as Verb A could never reject that hypothesis by observing how
Verb B is used; all instances would fit the A meaning, too. The only disconfirming experience would be overt correction by
parents, and there is good reason to believe that children cannot rely on such corrections. This argument is parallel to one
commonly made in the acquisition of syntax (see, e.g., Pinker 1984, 1989; Wexler and Culicover 1980; Berwick 1985; Marcus
1993). For example, move, walk, and saunter are in a superset relation; any child that thought that saunter meant walk would do
so forever, because all examples of sauntering are also examples of walking.
But this is only a problem if the child is allowed to maintain synonyms in his or her vocabulary. If children do not like to keep
synonyms around (see Carey 1982, Clark 1987, Markman 1989, for evidence that they do not), then if they have a verb A (e.g.,
walk), and also a verb B (saunter) that seems to mean the same thing, they know something is wrong. They can look for
additional meaning elements from a circumscribed set to make the meaning of B more specific (like the manner of motion).
Pinker (1989: ch. 6) outlines a mechanism for how this procedure could work.
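A toy version of that mechanism, assuming a small closed set of candidate components; the component labels are invented, and the sketch is an illustration of the idea, not Pinker's actual procedure:

```python
# Sketch of an escape from the subset problem: when a new verb appears
# synonymous with a known one, assume they differ, and search a small
# closed set of candidate components for one true of every use of the
# new verb. Component labels are invented for illustration.

CANDIDATE_COMPONENTS = ["manner:leisurely", "manner:fast", "path:downward"]

def specialize(old_meaning, uses_of_new_verb):
    """Add the first candidate component that holds in every observed use."""
    for component in CANDIDATE_COMPONENTS:
        if component not in old_meaning and \
                all(component in use for use in uses_of_new_verb):
            return old_meaning | {component}
    return old_meaning  # no distinguishing component found yet

walk = {"motion", "on-feet"}
saunter_uses = [{"motion", "on-feet", "manner:leisurely"},
                {"motion", "on-feet", "manner:leisurely"}]
print(sorted(specialize(walk, saunter_uses)))
# ['manner:leisurely', 'motion', 'on-feet']
```

The closed candidate set is doing the real work here: it is what keeps the search for a distinguishing element from exploding.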
3.1.4. The poor fit of word to world
Gleitman suggests that even when a verb corresponds in principle to a unique set of situations, it is not, in practice, reliably used
in that set of situations, so the child has no way of figuring out a verb's meaning based on the situations it actually is used in.
For example, Landau and Gleitman showed that the blind child they studied learned haptic equivalents of the verbs look
(roughly, 'palpate' or 'explore haptically') and see (roughly, 'sense haptically'). But, they found, her mother didn't use look and
see more often when an object was near than when it was far.
The point of this argument is unclear. Of course, the mother didn't necessarily use look when an object was near. Look doesn't
mean 'an object is near'; it means 'look'. The lack of correlation between some easily sensed property like nearness and use of a
verb is only relevant if the child is confined to considering lists of sensory properties as possible verb meanings. If children can
entertain the concept of looking, in something like the adults' sense (and Gleitman 1990: 4, assumes they can), it doesn't matter
how many sensory properties a verb fails to correlate with if those properties define only a crude approximation of the verb's
actual meaning. (This is a problem, for
example, with the conclusions drawn by Lederer et al. 1989.) All that matters is whether a child can recognize situations in
which that correct concept applies.
Gleitman (1990) then turns to a stronger argument. Even when one examines genuine instances of the concept corresponding to
a verb's meaning, one finds a poor correlation with instances of the parent uttering the verb. For example, in one study put was
found to be used 10% of the time when there was no putting going on. Similarly, open was used when there was no opening
37% of the time. As Gleitman notes, this is not a surprise when one realistically considers how parents interact with their
children. When a mother, arriving home from work, opens the door, she is likely to say, What did you do today?, not I'm
opening the door. Similarly, she is likely to say Eat your peas when her child is, say, looking at the dog, and certainly not when
the child is already eating peas. Indeed, Gleitman (1990: 15) claims that 'positive imperatives pose one of the most devastating
challenges to any scheme that works by constructing word-to-world pairings'.
The problem with this argument is that it, too, only refutes the nonviable theory of learning by associative pairing, in which verb
meanings are acquired via temporal contiguity of sensory features and utterances of the verb. It doesn't refute any reasonable
account, in which the child keeps an updated mental model of the current situation (created by multi-sensory object- and event-
perception faculties), including the likely communicative intentions of other humans. The child could use this knowledge, plus
the lexical content of the sentence, to infer what the parent probably meant. That is, children need not assume that the meaning of
a verb consists of those sensory features that are activated simultaneously with a parental utterance of the verb; they can assume
that the meaning of a verb consists of what the parent probably meant when he or she uttered the word. Thus imperatives, where
the child is not performing the act that the parent is naming, are not 'devastating'. Certainly when a parent directs an imperative
at a child and takes steps to enforce it, the child cannot be in much doubt that the content of the imperative pertains to the
parents' wishes, not the child's current activities.
3.1.5. Semantic properties closed to observation
Gleitman considers this the 'most serious challenge' to the idea that children learn verb meanings by attending to their
nonsyntactic contexts. Mental verbs like think, know, guess, wonder, hope, suppose, and understand involve private
events and states that have no external perceptual
correlates. Therefore children could not possibly infer their meanings observationally.
One problem I see with this argument is that although children may not be able to observe other people thinking and the
contents of others' beliefs, they can observe themselves thinking and the contents of their own beliefs. Similarly, children may
not know what their mothers are feeling, but they certainly know what they are feeling. And crucially, in many circumstances so
do their mothers. When a parent comments on what a child is thinking or feeling, that constitutes information about the
meanings of the mental state verbs they use.
Moreover, there surely are ways to infer a person's mental state from his or her behavior. Indeed, the standard way that humans
explain each other's behavior is to assume that it is caused by beliefs and intentions, which can only be inferred. This must be
how adults, during ordinary speech production, know when to use mental verbs based on their own mental state or guesses
about others', even though there is no obvious referent event. There is no principled reason that children could not infer
meanings of new mental verbs using exactly the same information that adults employ to use existing mental verbs accurately.
3.1.6. Does a richer system of mental representation hurt or help the child?
Gleitman suggests that if children are not temporal-contiguity associators, that is, if they can entertain hypotheses about causes,
mental states, goals, speakers' intentions, and so on, their learning task is even harder. For the very richness of such representational
abilities yields a combinatorial explosion of logically possible hypotheses for the child to test.
This argument, however, seems to conflate two ideas: 'a rich set of hypotheses', and 'a set of rich hypotheses'. Gleitman correctly
points out that a rich (i.e., numerous) set of hypotheses is a bad thing if you're a learner. But replacing her associative-pairing
mechanism with a cognitively more sophisticated one results in a set of rich (i.e., structured) hypotheses, not a rich set of
hypotheses. And a set of rich hypotheses may in fact be fewer in number than a set of impoverished ones (e.g., combinations of
sensory features) in any given situation: creatures with complicated human brains see the world in only a few of the logically
possible ways. Presumably there are many more hypotheses for a learner who considers all subsets of patches of color and bits
of fur and whisker than there are for a learner with a sophisticated object-recognition system who obligatorily perceives these
patches as a single
'rabbit'. The whole point of a rich computational apparatus is to reduce the interpretations of a scene to the small number of
correct ones. This is exactly what is needed to help solve the learning problem.
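The contrast can be put in numbers. A learner entertaining arbitrary subsets of even a modest inventory of sensory primitives faces an astronomical hypothesis space, while an object-level perceiver faces only a handful of candidates per scene. The counts below are illustrative, not estimates from any study:

```python
# The difference between 'a rich set of hypotheses' and 'a set of rich
# hypotheses', as a count. A sensory-feature learner considering every
# subset of n primitive features faces 2**n candidate meanings; a learner
# whose perceptual system pre-packages those features into whole objects
# faces only as many candidates as there are perceived objects.

n_sensory_features = 30        # patches of color, bits of fur and whisker...
subset_hypotheses = 2 ** n_sensory_features
object_hypotheses = 3          # e.g. 'rabbit', 'grass', 'fence'

print(f"{subset_hypotheses:,} feature-subset hypotheses")  # 1,073,741,824
print(f"{object_hypotheses} object-level hypotheses")
```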
3.2. Problems in understanding observational learning do not constitute evidence
for syntactic cueing
In much of her discussion, Gleitman attempts to place the burden of proof on anyone who believes that verb learning depends on
observation, by identifying many areas of ignorance and difficult puzzles regarding how it could work. Indeed, anyone who
thinks that a child can infer what a parent means from the situation and the nonverb content of the sentence must propose that a
heterogeneous collection of not-very-well-specified routes to knowing (indeed, the entirety of cognition) is available for use in the
learning of verb meanings. Moreover, any such proposal must deal with the fact that even the most perceptive child and
predictable parent cannot be expected to be in perfect synchrony all the time.
Gleitman's discussion contains penetrating and valuable analyses that clearly define central research problems in how children
learn the meanings of words. But to support the alternative claim that verb subcategorization information is crucial, it is
necessary to show that no theory of inferring communicative intent could ever be adequate, not that we currently don't have one
that is fully worked out.
Moreover, Gleitman's attempt to shift the burden of proof ultimately fails, because she herself, at the end of the 1990 article and
in Fisher et al. (1991 and this volume), concedes (in response to some of the points I elaborate on in the next section) that some
form of observational learning is indispensable. She notes that information about manner of motion, type of mental state, nature
of physical change undergone, and so on, are simply not available in the syntax of subcategorization: 'the syntax is not going to
give the learner information delicate and specific enough, for example, to distinguish among such semantically close items as
break, tear, shatter, and crumble ... Luckily, these distinctions are almost surely of the kinds that can be culled from transactions
with the world of objects and events' (Gleitman 1990: 35).
This concession, however, completely redirects the force of Gleitman's criticisms of observational learning. For the meaning
components that Gleitman agrees are learned by observation are the very components that she, earlier in the article, claimed that
observation cannot acquire! For example, the fact that open is often used when opening is not taking place (e.g.,
imperatives), and that open is not used when opening is taking place (e.g., when someone enters the house), if it is relevant at
all, pertains in full force to the 'delicate and specific' aspects of the meaning of open (i.e., those aspects that differentiate it from
syntactically identical close). Similarly, parents surely cannot be counted on to use break or tear when and only when breaking
or tearing are taking place, respectively. Nonetheless, Gleitman concedes that the meanings specific to open, break, and tear are
somehow learned by observation. Thus it is not true, as she suggests (1990: 48), that 'semantically relevant information in the
syntactic structures can rescue observational learning from the sundry experiential pitfalls that threaten it'. There are pitfalls, to
be sure, but for most of the ones Gleitman originally discussed, syntax offers no rescue. What we need is a better, non-
associationist theory of observational learning.
3.3. Conclusions about Gleitman's arguments against observational learning
Gleitman convincingly refutes a classical associationist theory of semantic learning, in which word meanings are acquired via
temporal contiguity of sensory features of the scene and utterances of the word. She also convincingly shows that to explain
verb learning, we need a constrained representational system for verbs' meanings, principles constraining how one verb is related
to another in the lexicon, a learning mechanism that can construct and modify semantic representations over a set of uses of the
verb, and a greater understanding of how children interpret events, actions, mental states, and other speakers' communicative
intentions. But the arguments do not show that the full set of semantic cues to semantics is so impoverished in principle that the
child must use sets of syntactic subcategorization frames as cues instead, nor that syntactic cues provide just the information that
semantic cues fail to provide. Rather, Gleitman herself assumes that there exists some form of observational learning powerful
enough to acquire aspects of meaning that her own arguments show to be hard to acquire.3
3 Paul Bloom has pointed out to me that arguments similar to Gleitman's were originally made by Chomsky (1959) in his
review of Skinner's Verbal Behavior. For example, Chomsky showed that noun meanings could not in general be learned
by hearing the nouns in the presence of their referents. But Chomsky used examples like Eisenhower, a proper name,
whose meaning could not possibly be distinguished using syntactic cues from the thousands of other proper names that
must be learned (e.g., Nixon). This suggests that observation and syntactic cues are not the only possible means of
learning. See Bloom (this volume) for discussion of similar issues in the learning of noun meanings.
Frames:        NP_   NP_NP  NP_S  NP_PP  NP_NP-PP  NP_NP-S
Roots:
  eat           x      x
  move          x      x            x       x
  boil          x      x
  open          x      x
  kill                 x
  die           x
  think         x             x     x
  tell                 x                    x         x
  know                 x      x     x
  see           x      x      x     x
  look          x                   x
Fig. 1
think of as the content of a verb. The frame meaning (the fact that there must be an agent causing the physical change when the
verb is used in the transitive frame, and that the main event being referred to is the causation, not the physical change) is just as
important in understanding the sentence, but it is not inherently linked to the verb root boil. It is linked to the transitive syntactic
construction, and would apply equally well to melt, freeze, open, and the thousands of other verb roots that could appear in that
frame. This is a crucial distinction.
4.2. Learning about a verb in a single frame
The first question that follows is, What can be learned from hearing a verb in one frame? Something, clearly, for frame
semantics and frame syntax are highly related. For example, it is a good bet that in A glips B to C, glip is a verb of transfer. The
regularities that license this inference are what linguists call linking rules (Carter 1988, Jackendoff 1987, 1990; Pinker 1989,
Gropen et al. 1991a). For example, if A is a causal agent, A is the subject of a transitive verb. Linking rules are an important
inferential mechanism in semantic bootstrapping (semantic cueing of syntax at the outset of language acquisition), in predicting
how one can use a verb once one knows what it means, and in governing how verbs alternate between frames (see Gropen et al.
1991a for discussion).
One might now think: If syntax correlates with semantics, why not go both ways? If one can infer a verb's syntax from its
semantics (e.g., in semantic bootstrapping), couldn't one just as easily infer its semantics from its syntax? As Gleitman puts it
(1990: 30):
'The syntactic bootstrapping proposal in essence turns semantic bootstrapping on its head. According to this hypothesis,
the child who understands the mapping rules for semantics on to syntax can use the observed syntactic structures as
evidence for deducing the meanings. The learner observes the real-world situation but also observes the structures in
which various words appear in the speech of the caretakers. Such an approach can succeed because, if the syntactic
structures are truly correlated with the meanings, the range of structures will be informative for deducing which word goes
with which concept.'
I believe this argument is problematic. The problem is that a correlation is not the same thing as an implication. 'Correlation'
means 'many X's are Y's or many Y's are X's or both'. 'Implication' means 'if X, then Y, though not necessarily vice-versa'. The
asymmetry inherent in an implication is crucial to understanding how it can be used predictively. For example, if I feed two
numbers (e.g., 3 and 5) into the addition function, the value must be 8. But if I am told only that the value is 8, I cannot
know that the inputs were 3 and 5.
Linking rules are implications. They cannot straightforwardly be used in the reverse direction. If a verb means 'X causes Y to
shatter', then X is the subject of the verb. But if X is the subject of a verb, the verb does not necessarily mean 'X causes Y to
shatter'. This asymmetry is inherent to the design of language. A grammar is a mechanism that maps a huge set of semantic
distinctions onto a small set of syntactic distinctions (for example, thousands of kinds of physical objects are all assigned to the
same syntactic category 'noun'). And because this function is many-to-one, it is not invertible.
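The non-invertibility at issue can be pictured in a toy sketch of my own (not from the article); the feature names and the deliberately coarse linking function are illustrative assumptions:

```python
# Forward linking maps rich verb meanings onto a small set of frames.
# Because many meanings collapse onto one frame, the mapping has no inverse.

def link_to_frame(verb_meaning):
    """Forward linking rule: a meaning with a causal agent gets a transitive frame."""
    if verb_meaning["causal_agent"]:
        return "NP _ NP"   # transitive: the agent is the subject
    return "NP _"          # intransitive

meanings = [
    {"root": "shatter", "causal_agent": True},
    {"root": "boil",    "causal_agent": True},
    {"root": "open",    "causal_agent": True},
]

# Distinct meanings collapse onto a single frame...
frames = {link_to_frame(m) for m in meanings}
assert frames == {"NP _ NP"}

# ...so running the rule backwards is underdetermined: the frame alone
# cannot pick out which root was meant.
candidates = [m["root"] for m in meanings if link_to_frame(m) == "NP _ NP"]
assert len(candidates) > 1
```

The sketch implements only the single implication discussed in the text ('if X is a causal agent, X is the subject'); the real system involves many such rules, but each shares this many-to-one character.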
Now, if one casts away most of the meaning of a verb (e.g., the part about shattering), there may remain some abstract feature of
meaning that could map in one-to-one fashion to syntactic form. To the extent that that can be done, one could learn some
things about a verb form's meaning from the frame that the verb appears in. First, one can learn how many arguments the verb
relates in that form, as in the difference between The water boiled (one argument) and She boiled the water (two arguments), or
the difference between die (one argument) and kill (two arguments). Second, one can infer something about the logical type of
some of the arguments, like 'proposition' (if the verb appears with a clause) versus 'thing' (if the verb appears with an NP) versus
'place/path' (if the verb appears with a PP). That is, the syntax can help one distinguish between the meaning of find in find the
book and find
that the book is interesting; between shoot the man and shoot at the man; perhaps even between think, eat, and go. Third, the
syntax of a sentence can help identify which argument can be construed as the agent (viz., the subject) in cases where the
inherent properties of the arguments (such as animacy) leave it ambiguous, for example, in kill versus is killed by, and chase
versus flee. Similarly, syntactic information can distinguish the experiencer from the stimulus in 'psych-verbs' with ambiguous
roles, such as Bill feared Mary and Mary frightened Bill. Fourth, syntactic information can help identify which argument is
construed as 'affected' (viz., the syntactic object) in events where several entities are being affected in different ways. For
example, in load the hay and load the wagon, on cognitive grounds either the hay or the wagon could be interpreted as
'affected': the hay, because it changes location, or the wagon, because it changes state from not full to full (similar
considerations apply to the pair of verbs fill and pour). The listener has to notice which of the two arguments (content or
container) appears as the direct object of the verb to know which one to construe as the 'affected' argument for the purpose of
understanding the verb in that frame. Gleitman and her colleagues give many examples of these forms of learning, which I have
called 'reverse linking' (see Pinker 1989 and Gropen et al. 1991a, b for relevant discussion and experimental data).
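What reverse linking makes available from a single frame (argument count and coarse logical types, but nothing about content) can be pictured as a lookup. This is a sketch of my own under stated assumptions; the frame labels follow Fig. 1 and the type glosses are illustrative, not the article's analysis:

```python
# Coarse semantic skeleton licensed by each frame, independent of the root.
FRAME_INFO = {
    "NP _":       {"args": 1, "types": ["thing"]},
    "NP _ NP":    {"args": 2, "types": ["thing", "thing"]},
    "NP _ S":     {"args": 2, "types": ["thing", "proposition"]},
    "NP _ PP":    {"args": 2, "types": ["thing", "place/path"]},
    "NP _ NP PP": {"args": 3, "types": ["thing", "thing", "place/path"]},
}

def reverse_link(frame):
    """Return what a single frame reveals: arity and argument types only."""
    return FRAME_INFO[frame]

# 'A glips B to C' suggests three arguments, one a place/path -- consistent
# with a verb of transfer, but silent about the content of glip.
assert reverse_link("NP _ NP PP")["args"] == 3
```

Note that nothing in the lookup distinguishes glip from the thousands of other roots compatible with the same frame; that is the limitation the next paragraph takes up.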
Unfortunately, while one can learn something about a verb form's meaning from the syntax of the frame it appears with,
especially when there are a small number of alternatives to select among, one cannot learn much, relative to the full set of
English verbs, because of the many-to-one mapping between the meanings of specific verbs and the frames they appear in. For
example, one cannot learn the differences among slide, roll, bounce, skip, slip, skid, tumble, spin, wiggle, shake, and so on, or
the differences among hope, think, pray, decide, say, and claim; among build, make, knit, bake, sew, and crochet; among shout,
whisper, mumble, murmur, yell, whimper, whine, and bluster; among fill, cover, tile, block, stop up, chain, interleave, adorn,
decorate and face, and so on. Indeed, Gleitman herself (1990: 35) concedes this point in the quote reproduced above.
In sum, learning from one frame could help a learner distinguish frame meanings, that is, what the water boiled has in common
with the ball bounced and does not have in common with I boiled the water. But it does not distinguish root meanings, that is,
the difference between the water boiled and the ball bounced. And the root meanings are the ones that correspond to the 'content' of a
verb, what we think of as 'the verb's meaning', especially when a given verb root appears in multiple frames.
The frame meanings (partly derivable from the frame) are closer to the 'perspective' that one adopts relative to an event: whether
to focus on one actor or another, one affected entity or another, the cause or the effect. Indeed in some restricted cases,
differences in perspective are most of what distinguishes pairs of verb roots, such as kill and die, pour and fill, or Gleitman's
example of chase and flee. Gleitman (1990) and Fisher et al. (this volume) adopt a metaphor in which the syntax of a verb frame
serves as a 'zoom lens' for the aspects of the event referred to by the verb. This metaphor is useful, because it highlights both
what verb syntax can and cannot do. The operation of the lens when aimed at a given scene gives the photographer three degrees
of freedom (pan, tilt, and zoom), which have clear effects on the perspective in the resulting picture. But no amount of lens
fiddling can fix the vastly greater number of degrees of freedom defined by the potential contents of the picture, whether the lens
is aimed at a still life, a nude, a '57 Chevy, or one's family standing in front of the Grand Canyon.
So I have no disagreement with Gleitman's arguments that a syntactic frame can serve as a zoom lens, helping a learner decide
which of several perspectives on a given type of event (discerned by other means) a verb forces on a speaker. But because this
mechanism contributes no information about a verb's content, it cannot offer significant help in explaining how children learn a
verb's content despite blindness, nor in explaining how children learn a verb's content despite the complexity of the relationship
between referent event and parental usage.
4.3. Learning about a verb from its multiple frames
Gleitman recognizes the limitations of learning about a verb's meaning from a single frame:
'To be sure, the number of such clause structures is quite small compared to the number of possible verb meanings: It is
reasonable to assume that only a limited number of highly general semantic categories and functions are exhibited in the
organization that yields the subcategorization frame distinctions. But each verb is associated with several of these
structures. Each such structure narrows down the choice of interpretations for the verb. Thus these limited parameters of
structural variation, operating jointly, can predict possible meaning of an individual verb quite closely.' (Gleitman 1990:
30-32)
The claim that inspection of multiple frames can predict a verb's meaning 'quite closely' appears to contradict the earlier quote in
which Gleitman notes that syntactic information in general is not 'delicate and specific enough to
distinguish among semantically close items'. To see exactly how close the syntax can get the learner to a correct meaning, we
must ask, 'What can be learned from hearing a verb in multiple frames?' In particular, can a root meaning (the verb's content) be
inferred from its set of frames, and if so, how?
Unfortunately, though Gleitman and her collaborators give examples of how children might converge on a meaning from several
frames, almost always using the problematic example of see (see fn. 1), they never outline the inferential procedure by which
children do so in the general case. In Fisher et al. (this volume) they suggest that the procedure is simply the zoom lens (single-
frame) procedure applied 'iteratively'. They give the procedure as follows: 'In assigning a gloss to the verb, satisfy all semantic
properties implied by the truth conditions of all its observed syntactic frames'. But this cannot be right, for reasons they mention
in the next paragraph. The truth conditions (what I have been calling 'frame meaning') that belong to a verb form in one frame
do not belong to it in its other frames. So satisfying all of them will not give the root meaning or verb's content. If we interpret
'satisfying all semantic properties' as referring to the conjunction of the frame meanings, we get the meaning of its most
restrictive frame, which will be incompatible with its less restrictive frames. For example, the truth conditions for transitive boil
include the presence of a causal agent. But presence of a causal agent cannot be among the semantic properties of boil across the
board, for its intransitive version (The water boiled) is perfectly compatible with spontaneous boiling in the absence of any
agent. But if we interpret 'satisfying all semantic properties' to be the disjunction of frame meanings, the aggregation leads to
virtually no inference at all. Consider again the frame involved in The water boiled. This intransitive frame tells the learner that
the meaning of boil in the frame consists of a one-place predicate. Now consider a second frame, the one involved in I boiled
the water. This transitive frame tells you (at most) that the meaning of boil in the frame consists of causation of some one-place
predicate. What do they have in common? 'One-place predicate', which is not very useful. It says nothing whatsoever about the
root meaning of boil, that is, that it pertains to liquid, bubbles, heat, and so on.
This is a problem even for verbs that appear in many frames, for which the syntax would seem to provide a great deal of
converging information (see Levin 1985, Pinker 1989). For example sew implies an activity. Sew the shirt implies some activity
performed on an object. Sew me a shirt implies an activity creating an object to be transferred to a beneficiary. Sew a shirt out
of the rags implies an activity transforming material into some object. What do these frame meanings have in common? Only
'activity'. Not 'sewing'.
The conclusion is clear: you can't derive a verb's root meaning or content by iterating the zoom lens procedure over multiple
frames and taking the resulting union or intersection of perspectives.
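The failure of both aggregation schemes can be made concrete with a toy sketch of my own; the feature sets assigned to each frame of sew are illustrative assumptions, not an analysis from the article:

```python
# Frame meanings for 'sew' across its frames, as coarse feature sets.
frame_meanings = {
    "She sewed":                     {"activity"},
    "She sewed the shirt":           {"activity", "affected-object"},
    "She sewed me a shirt":          {"activity", "creation", "beneficiary"},
    "She sewed a shirt out of rags": {"activity", "creation", "material"},
}

# Conjunction (intersection) keeps only what every frame shares.
intersection = set.intersection(*frame_meanings.values())

# Disjunction (union) keeps everything any frame contributes.
union = set.union(*frame_meanings.values())

assert intersection == {"activity"}  # too weak: says nothing about sewing
assert "beneficiary" in union        # too strong: false of plain 'She sewed'
# Neither combination contains, or converges on, the root meaning 'to sew'.
```

The intersection strips away everything specific; the union attributes to the root properties that hold only in particular frames. Either way, the root's content never enters the computation.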
4.3.1. Can anything be learned from multiple frames?
I do not wish to deny that there is some semantic information implicit in the set of frames a verb appears with, nor that an astute
learner could, in principle, use this information. The example Gleitman uses most often, see, has clear intuitive appeal. But
which general procedure is driving the inference about see and other such cases? I can think of two.
According to Gleitman, a set of argument frames implicitly poses the question, 'What notion is compatible with involving a
physical object, involving a proposition, and involving a direction?' The child deduces the response 'seeing'.4 In other words,
this is a kind of cognitive riddle-solving (Pinker 1989); it involves all of a learner's knowledge, beliefs, and cognitive inferential
power.
I am not arguing either that children can or cannot solve such riddles. I am simply pointing out what would be going on if they
could do so. In particular, note what they would not be doing. They would not be relying on any grammatical principle, and
hence would not be enjoying the putative advantages of universal constrained linguistic principles to drive reliable inferences.
That is, if guessing a verb's meaning from its set of frames succeeds at all, it does so by virtue of the child's overall cognitive
cleverness, and hence could suffer from the same unreliability of overall cleverness as inferring a speaker's likely meaning from
the knowledge of the situation. It is not a straightforward mechanical procedure that succeeds because the frames 'are abstract
surface reflexes of the meanings' (Landau and Gleitman 1985: 138)
4 Actually, the question and answer should be stated in terms of 'a family of notions', not 'notion', because verbs like see
that can take either objects or clausal complements do not exhibit a single content meaning across these frames: 'see NP'
does not mean the same thing as 'see S'. The latter is not even a perception verb: I see that the meal is ready does not
entail vision. (Clearly not, because you can't visually perceive a proposition.) Similarly, I feel that the fabric is too
smooth does not entail palpation; it's not even compatible with it. And Listen! I hear that the orchestra is playing is quite
odd. (These observations are due to Jane Grimshaw.) Clearly there is a commonality running through each of these sets,
but it is a metaphorical one; 'knowing' can be construed metaphorically as a kind of 'perceiving'.
or because 'much of the [semantic] information can be read off from the subcategorization frames themselves by a general
scheme for interpreting these semantically' (Landau and Gleitman 1985: 142). Moreover, the premises that would drive this
riddle-solving are far more impoverished than the premises derived from inferring a speaker's meaning from the context. The
latter can include any concept the child is capable of entertaining (sewing, boiling, and so on); the former are restricted to a
smaller set of abstract concepts like causability and taking a propositional argument.
There is a second way that sets of syntactic frames could assist semantic learning. That is via narrow argument-structure
alternations. Often the verbs that can appear with a particular set of syntactic frames have surprisingly specific meanings (see
Levin 1985, in press; Pinker 1989, for reviews). For example, in English, the verbs that can appear in the double-object form
but not the to-object form are verbs of implied deprivation like envy, bet, and begrudge (e.g., I envied him his good
looks/*envied his good looks of him). Similarly, verbs of manner of motion can alternate between causative-transitive and
inchoative-intransitive forms (e.g., I slid the puck/The puck slid), but verbs of direction of motion cannot (e.g., I brought the
book/*The book brought). An astute learner, in principle, could infer, from hearing I glipped him those things and from failing
to hear I glipped those things of him, that glip involves some intention or wish to deprive someone of something. But note that
these regularities are highly specific to languages and to dialects within a language. (For example, *I suggested her something is
grammatical in Dutch, and *I pushed him the box is grammatical in some dialects of English; see Pinker 1989.) Exploiting them
requires first having acquired these subtle subclasses and their syntactic behavior in the dialect, presumably by abstracting the
subclasses from the semantics and syntax of individual verbs, acquired by other means. This kind of inference depends on a
good deal of prior learning of verbs' meanings in a particular language, and thus is most definitely not a case of 'bootstrapping'
performed by a child to acquire the meanings of the verbs to begin with.
In general, learning a verb's content or root meaning from its set of syntactic frames ('syntactic bootstrapping') is fundamentally
different from learning its perspective or frame meaning from a single frame ('zoom lens'). Thus I disagree with Gleitman's
(1990) suggestion that they are versions of a single procedure, or Fisher et al.'s suggestion (this volume) that one is simply the
iteration of the other. There is a clear reason why they are different. While there may be a universal mapping between the
meaning of a frame and the syntax of that frame (allowing the lens to zoom), there is no universal
mapping between the meaning of a root and the set of frames it occurs in (see Talmy 1985 and Pinker 1989 for reviews). For
example, universal linking rules imply, roughly, that an inchoative verb can appear in an intransitive frame, and a causative verb
can appear in a transitive frame. And it's clearly possible for some roots to be able to have both causative and inchoative
meanings (and hence to appear in both frames). But it's an accident of English that slide appears in both frames, but come and
bring appear in one each. Thus the kinds of learning that are licensed by universal, reliable, grammatical linking regularities are
restricted to differences in perspective. A verb's content is not cued by any one of its syntactic frames, and at best might be
related to its entire set of frames in a tenuous, language-specific way.
content (flexing). The content (flexing) was acquired through observation, not syntax; it was depicted on the video screen, and
the child was watching it. Thus at best the children were demonstrating use of the zoom lens procedure; there was no
opportunity for multiple frames to cue the verb's content. (At worst, the children were not acquiring any information about the
verb at all, but were ignoring the verb and merely responding to the transitive and intransitive sentence frames themselves when
directing their attention.)
5.2. Naigles (1990)
A second study appears to show children learning a verb's content from a single frame. In Naigles (1990), 24-month-olds first
saw a video of a rabbit pushing a duck up and down, while both made large circles with one arm. One group of children heard a
voice saying 'The rabbit is gorping the duck'; another heard 'The rabbit and the duck are gorping'. Then both groups saw a pair
of screens, one showing the rabbit pushing the duck up and down, neither making arm circles, the other showing the two
characters making arm circles, neither pushing down the other. In response to the command 'Where's gorping now? Find
gorping!', the children who heard the transitive sentence looked at the screen showing the up-and-down action, and the children
who heard the intransitive sentence looked at the screen showing the making-circles action.5
What have the children learned, and from what source? Clearly, they learned most of the verb root's content (that gorp means
pushing and/or making circles, but not sliding or boiling or killing or dancing) not by attending to the syntax, but by observing
the scene; that's what the video depicted. Without the video, the children would have learned little if anything. What the
children learned from the sentence syntax was, once again,
5 Unfortunately, the Naigles experiment had a confound, commented on by Gleitman, 1990, in footnote 13 on p. 43. The
difference between the gorps could have been cued by the conjoined versus singular subjects, disregarding the verb syntax
entirely. That is, the difference between The rabbit is gorping and The rabbit and the duck are gorping, with identical
verb syntax, could have been sufficient for children to pick out a screen to look at. In the first case, the children would
look at what the rabbit alone was doing; in the second case, they could look at what the rabbit and duck were doing
simultaneously. This would be sufficient to direct their attention in testing to the push-down screen in the first case, the
make-circles screen in the second. This confound, a good example of the difference between using linguistically-
conveyed content as opposed to verbal syntax, could be eliminated by using the sentences of Hirsh-Pasek et al. The voice-
over would say either 'the rabbit is gorping the duck' or 'the rabbit is gorping near the duck'. Children in the first
condition, if they could use verb syntax as a cue, would still find gorping in the up-and-down screen, children in the
second would find it in the make-circles screen.
the meaning of the verb in its particular frame. The sentence played to the first group of children told them that gorp is a two-
place predicate, presumably a causative. It means either 'cause to pop up and down' or 'cause to make arm circles'. The sentence
played to the second group told them that gorp is a one-place predicate, presumably an activity. It means either 'make arm
circles' or 'pop up and down'. Once again, the syntactic frame cued only the coarse information of how many arguments were
immediately related by the verb; the rest came from observation.
One might object at this point that Naigles's experiment has demonstrated what I have been arguing is impossible: children
appear to have learned about a verb's content (in this case, up-and-down versus make-circles) from the sentence in which it is
used, and could not have learned that content from observation alone. But this is misleading. Success depended completely on
the fact that Naigles engineered an imaginary world in which perspective and content were confounded, so that when children
were using syntax to choose the right perspective, they got the right content, too, by happy accident. Note there is no
grammatical constraint forcing or preventing either 'popping up and down' or 'making arm circles' from being exclusively
transitive, exclusively intransitive, or alternating. Nor is there any real-world constraint that could cause creatures to make arm
circles and to pop up and down in tandem. But Naigles's teaching example exemplified both such constraints: only popping up-
and-down was causable, and such causation took place in the presence of arm circles. It was only these artificial contingencies
that made the forms learnable by syntax rather than observation. Consider what would have happened if the children had been
shown a scene depicting circles without pushing up and down or vice-versa. In that case, observation would have been sufficient
to distinguish the two actions, with no syntax required. Now consider what would happen if the children had been shown an
arm-circling rabbit causing the duck simultaneously to pop up and down and to make arm circles. This is no more or less
bizarre than the conjunction that Naigles did show children, where causation of popping up and down was simultaneous with
uncaused arm-circling. In that case, neither the sentence The duck is gorping the rabbit nor the sentence The duck and rabbit
are gorping would have distinguished the two kinds of motion. This shows that syntax is neither necessary nor sufficient to
distinguish the alternative content meanings of gorp, across all the different scenes in which it can be used; observation, in
contrast, is sufficient. In sum, Naigles simply selected a contrived set of exposure conditions that penalized observation while
letting syntax lead to the right answer by coincidence.
there is no reason to think that the retrieval cue that the experimenter provides now, for existing knowledge, was ever used as a
learning cue, in order to acquire that knowledge originally. Substitute 'transitive syntax' and 'intransitive syntax' for 'p.' and 'f.'
and one has the Fisher et al. experiment (this volume): a test of whether children can use transitivity correctly as a retrieval cue
for previously learned words when the content of the words is available observationally.
5.4. What experiment would show syntactic cueing of verb semantics?
There is an extremely simple experiment that could test whether children can learn a verb root's semantic content from multiple
frames. There could be no TV screen, or content words, just syntactic frames. For example, children would hear only She
pilked; She pilked me something; She pilked the thing from the other things; She pilked the other things into the thing; She
pilked one thing to another, and so on. If children can acquire a verb's content from multiple frames, they should be able to infer
that the verb basically means 'create by attaching' (Levin 1985). (Of course, one would have to ensure that the child was
learning a new meaning and not simply using the frames to retrieve an existing word, for reasons mentioned in the preceding
subsection.) Lest one think that this set of inputs is way too impoverished and boring for a child to attend to, let alone for the
child to draw semantic conclusions from, in the absence of perceiving some accompanying real-world event, recall that this is
exactly the situation that Landau and Gleitman assume the blind child is in. It would be an interesting finding if children (or
adults) could learn significant aspects of a verb's content from syntactic cues, as this experiment would demonstrate. If Gleitman
and her collaborators are correct, they should be able to do so.
6. Conclusions
I have gone over Gleitman's arguments against the sufficiency of learning verb semantics by observation of semantic cues in the
situations in which a verb is used, and her arguments for the utility and use of syntactic subcategorization information. I
suggest that a careful appraisal of these arguments leads to the following conclusions.
As Gleitman shows, temporal contiguity between sensory features and verb usages cannot explain the acquisition of verb
meaning. What this suggests is
that the explanation of verb learning requires a constrained universal apparatus for representing verb meanings, principles
governing the organization of the lexicon, a perceptual and conceptual system acute enough to infer which elements of verb
meanings an adult in a situation is intending to refer to, and a learning procedure that can compare hypothesized semantic
representations across situations.
Gleitman has also convincingly demonstrated that single syntactic frames provide information about aspects of the meaning of
the verb in that frame (the 'zoom lens' hypothesis). This information is largely about the perspective that a verb forces a speaker
to take with regard to an event. It includes the number of arguments, the type of argument, a focus on the cause or effect, and
the choice of agent and affected entity when more than one is cognitively possible. As Gleitman points out, these are exactly the
kinds of information that are difficult or impossible to infer from observing the situations in which a verb is used.
I disagree, however, that multiple syntactic frames provide crucial information about the semantic content of a verb root across
its different frames (what Gleitman calls 'syntactic bootstrapping'). There is no syntactically-driven general inferential scheme by
which such learning could work; there is no empirical evidence that children use it; and it does not make up for any of the
problems Gleitman notes in understanding how children learn about a verb's meaning from observing the situations in which it is
used. Indeed, the suggestion is incompatible with one of the basic design features of human language: a vast set of concepts is
mapped onto a much smaller set of grammatical categories.
References
Berwick, R. C., 1985. The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Bloom, P., 1994. Possible names: The role of syntax-semantics mappings in the acquisition of nominals. Lingua 92, 297-329
(this volume).
Bolinger, D., 1977. Meaning and form. London: Longman.
Brown, R., 1957. Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology 55, 1-5.
Carey, S., 1982. Semantic development: The state of the art. In: E. Wanner, L.R. Gleitman (eds.), Language acquisition: The
state of the art, 347-389. New York: Cambridge University Press.
Carter, R.J., 1988. On linking: Papers by Richard Carter. (Lexicon Project Working Paper 25). B. Levin, C. Tenny (eds.).
Cambridge, MA: MIT Center for Cognitive Science.
Chomsky, N., 1959. A review of B.F. Skinner's 'Verbal behavior'. Language 35, 26-58.
Chomsky, N., 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Clark, E. V., 1987. The principle of contrast: A constraint on language acquisition. In: B. MacWhinney (ed.), Mechanisms of
language acquisition. Hillsdale, NJ: Erlbaum.
Dowty, D., 1991. Thematic proto-roles and argument selection. Language 67, 547-619.
Fisher, C., H. Gleitman and L. R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology
23, 331-392.
Fisher, C., G. Hall, S. Rakowitz and L. R. Gleitman, 1994. When it is better to receive than to give: Syntactic and conceptual
constraints on vocabulary growth. Lingua 92, 333-375 (this volume).
Gleitman, L.R., 1990. The structural sources of verb meaning. Language Acquisition 1, 3-55.
Grimshaw, J., 1979. Complement selection and the lexicon. Linguistic Inquiry 10, 279-326.
Grimshaw, J., 1981. Form, function, and the language acquisition device. In: C.L. Baker, J.J. McCarthy (eds.), The logical
problem of language acquisition, 165-182. Cambridge, MA: MIT Press.
Grimshaw, J., 1990. Argument structure. Cambridge, MA: MIT Press.
Gropen, J., S. Pinker, M. Hollander and R. Goldberg, 1991a. Affectedness and direct objects: The role of lexical semantics in
the acquisition of verb argument structure. Cognition 41, 153-195.
Gropen, J., S. Pinker, M. Hollander and R. Goldberg, 1991b. Syntax and semantics in the acquisition of locative verbs. Journal
of Child Language 18, 115-151.
Hirsh-Pasek, K., H. Gleitman, L.R. Gleitman, R. Golinkoff and L. Naigles, 1988. Syntactic bootstrapping: Evidence from
comprehension. Paper presented at the 13th Annual Boston University Conference on Language Development.
Jackendoff, R.S., 1987. The status of thematic relations in linguistic theory. Linguistic Inquiry 18, 369-411.
Jackendoff, R.S., 1990. Semantic structures. Cambridge, MA: MIT Press.
Katz, B., G. Baker and J. Macnamara, 1974. What's in a name? On the child's acquisition of proper and common nouns. Child
Development 45, 269-273.
Landau, B. and L.R. Gleitman, 1985. Language and experience. Cambridge, MA: Harvard University Press.
Lederer, A., H. Gleitman and L.R. Gleitman, 1989. Input to a deductive verb acquisition procedure. Paper presented to the 14th
Annual Boston University Conference on Language Development.
Levin, B., 1985. Lexical semantics in review: An introduction. In: B. Levin (ed.), Lexical semantics in review. Lexicon Project
Working Papers # 1. Cambridge, MA: MIT Center for Cognitive Science.
Levin, B., in press. English verb classes and alternations: A preliminary investigation. Chicago, IL: University of Chicago Press.
Marcus, G.F., 1993. Negative evidence in language acquisition. Cognition 46, 53-85.
Markman, E.M., 1989. Categorization in children: Problems of induction. Cambridge, MA: MIT Press/Bradford Books.
Markman, E.M., 1990. Constraints children place on word meanings. Cognitive Science 14, 57-77.
Miller, G.A., 1991. The science of words. New York: Scientific American Library.
Miller, G.A. and C. Fellbaum, 1991. Semantic networks of English. Cognition 41, 197-229.
Moravcsik, J.M.E., 1981. How do words get their meanings? The Journal of Philosophy 78, 5-24.
Naigles, L., 1990. Children use syntax to learn verb meanings. Journal of Child Language 17, 357-374.
Pinker, S., 1982. A theory of the acquisition of lexical interpretive grammars. In: J. Bresnan (ed.), The mental representation of
grammatical relations. Cambridge, MA: MIT Press.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S., 1987. The bootstrapping problem in language acquisition. In: B. MacWhinney (ed.), Mechanisms of language
acquisition. Hillsdale, NJ: Erlbaum.
Pinker, S., 1988. Resolving a learnability paradox in the acquisition of the verb lexicon. In: M. Rice, R. Schiefelbusch (eds.),
The teachability of language. Baltimore, MD: Brookes.
Pinker, S., 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press/Bradford
Books.
Quine, W.V.O., 1960. Word and object. Cambridge, MA: MIT Press.
Talmy, L., 1985. Lexicalization patterns: Semantic structure in lexical forms. In: T. Shopen (ed.), Language typology and
syntactic description, Vol. III: Grammatical categories and the lexicon. New York: Cambridge University Press.
Talmy, L., 1988. Force dynamics in language and cognition. Cognitive Science 12, 49-100.
Wexler, K. and P. Culicover, 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Lexical reconciliation*
Jane Grimshaw
Department of Linguistics and Center for Cognitive Science, Rutgers University, 18 Seminary Place,
New Brunswick, NJ 08903, USA
In the context of current research in lexical representation, there are two fundamental ideas about lexical learning. One holds that the semantics of a word is critically involved in the acquisition of its syntax; the other holds that the syntax of a word is critically involved in the acquisition of its semantics. This paper examines the positive results and limitations of the two views,
and proposes a reconciliation of the two in which a hypothesized meaning based on observation is the input to the linguistic
mapping principles. These derive a predicted s-structure, checked against observed syntax. Learning occurs when the two
match.
1. Introduction
The now standard lines of reasoning concerning universal properties of language and language learnability lead inexorably to the
conclusion that the lexicon that is eventually mastered by a language learner is every bit as much a function of the system
mastering it as it is of the linguistic data the learner is exposed to. This represents something of a change: in earlier days of generative grammar, the lexicon was often defined as a collection of idiosyncratic information. The assumption was that lexical
information was simply what was unpredictable and unprincipled, and what therefore must be learned about the language. Since
the lexicon was in no way determined by UG, its properties depended only on what was observed.
In the past ten or fifteen years, we have achieved greater understanding of the structure of lexical systems, as a result of work on
the representation of
* This paper grew out of joint work with Steve Pinker, which was presented at the Boston University Language
Development Conference in 1990, and owes much to his input and influence. Many discussions with Lila Gleitman
helped to clarify the arguments immensely, as did conversations with Barbara Landau. Thanks are also due to Alan
Prince, the Rutgers-Penn lexicon group, participants at the University of Pennsylvania Lexical Learning Conference in
1992, and the Workshop on Thematic Roles in Linguistic Theory and Language Acquisition held at the University of
Kansas in 1993, and an anonymous reviewer.
lexical items, and the grammatical processes which involve them (Grimshaw 1990, Jackendoff 1990, Levin and Rappaport 1992,
Pinker 1989, Williams, in press, Zubizarreta 1987). With this has come the recognition that lexical systems, just like other
linguistic domains, are subject to universal constraints and a high degree of language-internal consistency. (For a particularly
interesting perspective on the relationship between the theory of words and the notion of lexical listing, see Williams 1992.) A
change in perspective has resulted. If the endpoint of the acquisition process is highly principled, then the acquisition process
itself must be interesting.
The acquisition issues that have been addressed in recent research seem to fall into two subgroups. First, especially since Baker (1979), there has been an important series of studies which attempt to uncover the principles behind the acquisition of patterns of alternation shown by verbs, including Bowerman (1974, 1990), Choi and Bowerman (1991), Grimshaw (1989), Gropen et al. (1991), Pinker (1989), and Randall (1990). These questions have been the concern in particular of theories which attempt to make
use of mapping between semantics and syntax. Second, some recent work has investigated the question of how the relatively
gross meaning of a word can be acquired (Landau and Gleitman 1985, Gleitman 1990, Naigles 1990, Fisher et al. 1991),
looking into theories which make use of mapping of some kind between syntax and meaning. Research of the first kind asks
how learners determine that give participates in the 'dative alternation' (We gave a book to someone, We gave someone a book)
whereas donate does not (We donated a book to someone, *We donated someone a book). Research of the second kind asks how
learners determine that a particular morpheme means 'give', rather than 'hold'. Whether these are ultimately different questions,
with different answers, remains to be seen.
Universal principles operate in lexical semantics as elsewhere. The theory of lexical semantics plays the same role in constraining possible word meanings and
the range of solutions that the child will entertain, as the theory of Universal Grammar plays in syntax. For example, it seems to
be the case that while a causative meaning can be expressed either by a single morpheme, as in (1a), or by a phrase, as in (1b), a
causative-of-a-causative meaning can never be expressed by a single morpheme, even though it can be expressed by a phrase, as
in (1c).
(1a) The chemical killed the grass. cause-to-die
(1b) The chemical caused the grass to die. cause-to-die
(1c) Overapplication caused the chemical to kill the grass. cause-to-cause-to-die
This restriction is a consequence of the principles which organize lexical semantic structure and argument structure, some of
which are discussed in the work cited above. Thus linguistic theory provides a constrained representational system which limits
the solutions available to the learner of the lexicon, along lines familiar from other domains.
Starting from the simplest point of view, for each morpheme in the lexicon, there are two sorts of information to be determined
by a learner. These are the syntax of the lexical item, which includes its syntactic category and its subcategorization properties,
however they are expressed, and its semantics. Minimally, for example, a learner must determine that hope is a verb which
occurs with a sentential complement, and that it expresses a relation of a certain kind between an individual, the hoper, and a
proposition or situation (We hope that it won't rain tomorrow, We hope for rain tomorrow).
Much current work on lexical theory attempts to uncover the principles of UG which constrain the relationships among various
aspects of the lexical representation of predicates and between the lexical representation and the syntactic realization. (On the
latter see especially Baker 1988, Pesetsky 1992.)
The picture that emerges from this research program is one in which the range of syntactic configurations associated with a verb
is highly predictable from its semantics, once parametric syntactic variation is taken into account. The lexical semantic
representation of a predicate determines its argument structures, its argument structures plus parametric properties of phrase
structure entail the d-structure configurations of the predicate, and the
d-structure plus further parametric effects entail the s-structure configurations the predicate appears in. The existence of
principles of these kinds guarantees that certain relationships will obtain between the s-structure of a verb, noun, or adjective,
and its meaning.
This basic point is fundamental to almost all current ideas about lexical learning. However, the principled character of the
relationship between meaning and syntax is sometimes questioned (for a recent example see Bowerman 1990), so it is important
to clarify the nature of the claim that the relationship is indeed principled. The key point is that the notion of mapping is a
highly abstract one, in two important respects.
First, mapping from lexical semantics is onto d-structure, not s-structure, although all that the language learner and the linguist
can observe is the s-structure. S-structure of course is affected by other considerations also, some cross-linguistically variable,
and thus reflects linking only quite indirectly. In fact, it is not possible to draw conclusions about linking directly from d-
structure either, since d-structure itself is subject to parametric variation. All predictions of a mapping theory are made modulo
parametric properties, both lexical and grammatical, of the language.
Second, mapping does not take as its input an event, but a semantic representation. That is to say, the theory of UG does not say
anything at all about how events are described syntactically. What it does say something about is how particular semantic
representations are expressed syntactically (modulo the parametric properties of the language, as just discussed). To say, then,
that a verb which 'means x' will map into the syntax in a particular way, is to say that a verb with a particular lexico-semantic
representation will have a particular syntax. Another verb which superficially appears to have the same meaning, in that it
describes (at least) approximately the same events, may have a very different syntax. A particularly neat example of this type is
the contrast discussed by Levin and Rappaport Hovav (1992) between the English blush and Italian arrossire. Both are used to
describe the same event, and in this sense they mean the same, yet the Italian verb has unaccusative syntax while the English
verb has unergative syntax. Levin and Rappaport Hovav show that the semantic representations of the verbs are different:
arrossire is a change of state predicate, while blush is not. Thus despite their similarity in terms of the events they describe, the
verbs have different lexical semantic representations, and this is what determines their syntax. The principles which relate one
level of representation to another relate the lexical semantics of a predicate to its (lexical) syntax: they do not relate events in the
world to (lexical) syntax.
The input to the learning system is a pair consisting of an s-structure and a representation of a world situation. This representation will be
an interpretation of a perceived event, for example, based on observation, previous observations, surrounding discourse, etc.
How then does a learner arrive at the lexical representation for the verb in the sentence, given this kind of learning situation?
Two ideas form the focus of a large amount of current research on the question.
The first idea is that analysis of the situation can make it possible to determine the meaning of a word, and that the meaning of a
word in turn makes it possible to determine its lexical syntax. This type of proposal has been very fully developed by Pinker
(1989), in an important extension of earlier work, which concentrated on the question of how meaning might play a role in
allowing the child to perform an initial syntactic analysis upon which the syntactic system of the language could be erected, and
on ways in which the meaning of a verb could make aspects of its syntax predictable to a learner (Grimshaw 1981, Pinker 1984, 1989). The second idea which has emerged on how linguistic relations play their part in acquisition is that analysis of the
sentence makes it possible to determine (parts of) its semantics (see Landau and Gleitman 1985, Fisher et al. 1991, Fisher et al.,
this volume, Gleitman 1990). These two ideas about mapping are sometimes contrasted under the rubrics 'semantic
bootstrapping' and 'syntactic bootstrapping' (but see Pinker, this volume, for discussion).
'Semantic bootstrapping' has one extremely important property: it makes direct use of the principles mapping from lexical
meaning to lexical syntax discussed earlier. I will not illustrate this in detail; the works referred to above contain many examples.
Nonetheless, it would probably be incorrect to maintain that an analysis of the situation alone is the input to the learner. Because
of the indirect relationship between events and semantic representations of verbs, discussed in the previous section, it is not easy
for an observer to determine a verb's meaning from an event. Gleitman and colleagues (see Gleitman 1990 and Fisher et al., this
volume, for examples) have looked into this point, showing that events typically have multiple construals, hence many
verb meanings will be compatible with most events. As an example, consider a verb which has a caused change of state
interpretation, with a semantic representation of the form 'x causes y to become [some state]'. Such verbs often have a change-of-state counterpart, which can utilize the same morpheme, as with melt, or a different morpheme, as in kill/die. The change-of-state version has a semantic representation of the form 'y becomes [some state]'.
(3) We melted the ice/the ice melted
We killed the dragon/the dragon died
The problem for word learning from world situations is that circumstances that can be described by one member of the pair of
verbs can often be described equally well by the other: see the discussion of give and receive in Fisher et al. (this volume) for
example. The causative entails the change of state, hence whenever the causative description is true of some state of affairs in
the world, the change-of-state description is true also (although not vice versa). So if a learner guesses the causative (kill) when
in fact the verb has a change-of-state meaning (die), and if word learning is based on world observation alone, recovery is only
possible if some situation tells the learner that even when there is no possibility of construing the event as involving an agent,
die is still used. If a learner chooses the change-of-state meaning ('die') for a morpheme which is in fact the causative ('kill'),
there is no way to correct the mistake, because there is no world situation where the inchoative is inappropriate and the causative
appropriate. The correct meanings must therefore be assigned to the members of the pairs by some other means, since learning
by world observation seems at worst to be impossible, and at best to require that juvenile speakers have access to a
disambiguating situation for every pair of meanings like this.
But if observation of the world is not enough, other means must be available for lexical learning. This is where key information
contained in the sentence in (2) offers the most promising avenue for successful word learning, very roughly along the lines
sketched by Gleitman (1990). The fundamental idea is that the linguistics of words itself makes learning words possible. It is the
language, and not the world, that supports the process of word learning. Returning to the kill/die problem, although the
situations in which the two morphemes are used are not good sources of information about the meanings of the verbs, the two
are linguistically different in a crucial respect, and this linguistic difference makes word learning possible. What I will for now
term, loosely, the 'transitivity' of kill and causative melt contrasts with the
'intransitivity' of die and inchoative melt, and this property is correlated with the meanings in a way that can be exploited by a
language learner.
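As a rough illustration outside the paper's own formalism, the logic of exploiting transitivity can be sketched in a few lines of Python. The linking table, frame labels, and meaning labels below are hypothetical placeholders of my own, not claims about the actual content of UG or the author's proposal:

```python
# Hypothetical linking table: surface frame -> meaning types UG permits.
# Labels are placeholders, not serious semantic representations.
LINKING = {
    "V NP": {"cause to become STATE"},  # causative: kill, causative melt
    "V":    {"become STATE"},           # inchoative: die, inchoative melt
}

def candidate_meanings(observed_frame):
    """Meaning types compatible with a verb observed in this frame."""
    return LINKING.get(observed_frame, set())

# Hearing "The dragon died" (intransitive) rules out the causative reading,
# and "We killed the dragon" (transitive) rules out the inchoative one:
assert candidate_meanings("V") == {"become STATE"}
assert candidate_meanings("V NP") == {"cause to become STATE"}
```

On this sketch, the world situation alone leaves both readings open, while a single observed frame settles the choice between them.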
What aspects of verbal meaning can in principle be deduced from the syntactic context a verb appears in? The proposal
advanced by Landau and Gleitman (1985) is that learners use information about the surface syntax of a clause to determine
(aspects of) the meaning of the verb in the clause. Specifically, they suggest that the subcategorization frame of a verb contains
the critical information. It is easy to see that this idea will be useful in solving the kill/die problem: since kill is subcategorized
for an NP complement and die is not, a learner who knew the subcategorizations in advance could use them to choose the right
morpheme for the right meaning. (Presumably variability resulting from parametric variation can be factored out of the
situation, and a sufficiently abstract view of the subcategorization sets will make it possible to treat the subcategorization of a
verb in one language as being the same as the subcategorization of a verb with the same meaning in another language, despite
superficial differences in the syntactic systems in the two cases.) When the issue is considered in a more precise fashion,
however, a number of considerations suggest that subcategorization frames are not the optimal source of information. (See
Pinker, this volume, and Fisher et al., this volume, for additional remarks on the limitations of frames for learning semantics.)
First, observable context alone cannot determine what the subcategorization frames of a verb are. Arguments figure in subcategorization frames; adjuncts do not. However, both occur on the same side of the verb in a language like English, hence
there is no positional evidence to distinguish one from the other. This problem arises wherever adjuncts and arguments have the
same form, e.g. with PPs. Consider the examples in (4): while write takes an optional PP adjunct, put has an obligatory PP
argument. Similarly, last has a temporal argument, while wriggle occurs with a temporal adjunct.
(4a) He wrote a book in his room.
(4b) He put a book in his room.
(4c) The performance lasted for an hour.
(4d) The performer wriggled for an hour.
Without knowing which expressions are arguments it is not possible to know what frames the verb appears in, but in order to
know which expressions are arguments it is necessary to know what the verb means. Thus it is not clear what role the
subcategorization frames could have in the acquisition of verbal
meanings in cases where there is no clear independent indication of the argument or adjunct status of a phrase associated with
the verb.
A second kind of limitation arises because of the existence of large numbers of many-to-one semantics-to-syntax mappings. For
example, the set of verbs which subcategorize for NP is both enormous and extremely disparate semantically:
(5a) He weighed the tomatoes.
(5b) He weighed 300 pounds.
(6a) He became a doctor.
(6b) He shot a doctor.
(7a) He asked someone the time.
(7b) He asked someone a question.
So the fact that a verb takes an NP complement is not very informative as far as its meaning is concerned. (Of course it is not
completely uninformative, since it does make it possible to eliminate a number of candidates: 'put' and 'die', for example, are not
possible meanings for any of the verbs in (5)-(7).) The reason that the syntax here is comparatively uninformative is that many
different meanings are mapped onto a syntactic expression like NP.
Syntactic mapping based on a single subcategorization frame will be maximally effective when the mapping is one-to-one or
just few-to-one. This is probably the case with, for example, sentential complements. A verb with a sentential complement must
draw its meaning from a relatively small set of possibilities, which include verbs of communication (say, announce, state), verbs
of logical relation (entail, presuppose), and verbs of propositional attitude (hope, believe).
Aware of the limitations and problems of mapping from subcategorization frames to meaning in cases such as the ones just discussed, researchers working on syntactic mapping have investigated the idea that sets of subcategorization frames,
and not just single frames, play a role in the acquisition of verbal meaning (see especially Landau and Gleitman 1985, Fisher et
al. 1991, and the discussion of 'frame range' in Fisher et al., this volume). The learner will examine the set of subcategorizations
that a verb appears in, and discover properties of its meaning in this way.
Note that one strong disadvantage of this position is that it is not possible to learn a meaning for a morpheme on exposure, even
repeated exposure, to a single sentence type. Analysis across sentence types will be required, since the
entire set of frames that the verb appears in, or some approximation thereof, must be observed before its meaning can be
determined.
There is an important pre-condition for the success of the 'sets of frames' idea. In order for learners to be able to use a verb's
subcategorization set to determine its meaning, the learner must know in advance which set goes with which meaning. This is
possible only to the extent that the subcategorization set/meaning mapping is cross-linguistically stable, or shows only
parametric variation. So the question is whether UG determines the subcategorization set associated with a verb, or not. This
issue turns out to be highly problematic; the reason is that the total subcategorization set of a verb is a function of the set of
subcategorizations in which each sense of the verb participates. And the way senses are distributed across morphemes is not
uniform across languages. (This point is also made in Pinker, this volume.)
Let us consider why this should be so. The relevant principles of UG govern the mapping between the meaning of a verb and its
syntax in that meaning. In addition, UG regulates the set of possible semantic alternations that a morpheme may participate in,
such as the causative/change-of-state relation. If each morpheme had exactly one sense, or if the alternative senses of a
morpheme were always regulated by UG, this would exhaust the situation. But in fact, a morpheme can have several senses,
each with its own UG determined syntax and each participating in alternations in the ways prescribed by UG. UG says little or
nothing about the complete set of senses the verb has, and therefore little or nothing about the total set of subcategorizations of
the morpheme. UG only determines the properties of the individual senses and those that are related grammatically.
As an example, consider the verb shoot. It has at least two senses: one exemplified in She shot the burglar, and one in The
burglar shot out of the room. In the first sense, shoot takes an NP complement, in the second a PP. In both cases the
subcategorizations are highly predictable. shoot-1 is like, say, stab, and takes an individual/NP as its complement, while shoot-2 is like, say, race, and takes a directional PP as its complement. The subcategorization of each sense is completely in accordance
with the theory of grammar, but nothing about the theory of grammar determines that shoot will have these two senses. Hence
the theory of grammar cannot possibly predict that shoot will have these two subcategorization frames, and a learner could not
know this in advance.
In cases like these, the senses of the verb do not seem to be related by UG at all, even though they are all realized by a single
morpheme. Presumably they are related by association, which depends on semantic field and other
cognitively real but grammatically irrelevant factors. Thus it is probably not an accident that shoot has a verb-of-motion
meaning, and stab does not, since shooting involves a rapidly moving bullet. The point is that the relationship is associative in
character, and not UG determined. Similarly, consider the case of know as in I know her and in I know that she is here. The two
uses of the verb are cognitively close: both are state predicates and both are psychological. Hence it is presumably much more likely that these two senses will be clustered together into a single morpheme than that, say, eat and one of the two senses of
know will be clustered together. The probabilities involved are not a matter of UG, however, and of course the clustering of
senses together under a single morpheme is notoriously variable cross-linguistically. The two senses of know correspond to
different morphemes in many other languages: French, for one. What is involved in these effects is the spreading of morphemes
across meanings. The results range in character from pure homonyms where no significant semantic relation can be seen at all,
through relations that seem cognitively natural, and that probably reflect various clustering effects, to relations that are
grammatical in character.
The point is, though, that the entire subcategorization set associated with a morpheme is simply the sum of the
subcategorization sets for each meaning of the morpheme. While it is true that the subcategorization set for each meaning is
highly principled and strongly related to the meaning, the same is not true for the entire set. Each individual subcategorization
frame reflects the systematic properties of the form-meaning correspondences, but the entire set of subcategorizations also
reflects accidental combinations of subcategorizations, which result from a single morpheme's appearing in several different
senses. It follows that there is no stable mapping between the full set of subcategorizations that a morpheme appears in and the
meaning of the morpheme. The full set of subcategorizations will depend, not just on UG, but also on the range of meanings
that the morpheme assumes in a given language.
The conclusion is, then, that a predictable relationship between subcategorizations and morphemes does not hold: the predictable
relationship is between subcategorizations and a particular sense of the morpheme. It follows that it is not possible to use the
entire subcategorization to learn meaning, because the entire subcategorization set is not associated with a single meaning in
the first place.
As a result, a learning mechanism based on subcategorization sets will give poor results on morphemes with many senses,
typically the most common ones. It will predict many errors, or failure of learning. Suppose that the
learning device tracks the syntactic context of the morpheme and attempts to determine its meaning based on this information. It
will collect all observed subcategorizations for a morpheme, disregarding the fact, which it cannot by definition be sensitive to, that the verb's meaning differs across these subcategorizations. It will then assign to the verb the meaning that would be assigned
to a verb which, under a single meaning, occurred in this set of frames. But this will be incorrect, except in the case of a verb
with exactly one sense. For verbs with multiple senses, if the procedure succeeds in assigning a meaning at all, it will give one
meaning to the verb instead of several, and the meaning it gives might well be completely unrelated to any actual meaning of
the verb being learned. Without knowing about meaning it is not possible to know which subcategorizations should be grouped
together, and which should be kept separate.
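This failure mode can be made concrete with a toy sketch (mine, not the author's). A learner that looks up the pooled frame set of a morpheme in a hypothetical UG table finds no single sense matching the union of frames contributed by shoot-1 and shoot-2; the frame labels and meaning labels are invented placeholders:

```python
# Hypothetical UG table: the frame set of a single sense -> its meaning type.
FRAME_SET_TO_MEANING = {
    frozenset({"V NP"}): "act-on-individual",      # shoot-1, stab
    frozenset({"V PP-dir"}): "move-in-direction",  # shoot-2, race
}

def learn_meaning(observed_frames):
    """Look up the pooled frame set; fail if no single sense matches."""
    return FRAME_SET_TO_MEANING.get(frozenset(observed_frames))

# Each sense, taken alone, is learnable:
assert learn_meaning({"V NP"}) == "act-on-individual"
# But pooling the frames of both senses of 'shoot' matches no single sense:
assert learn_meaning({"V NP", "V PP-dir"}) is None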
What of lexical entries in which the relationship between the alternatives is regulated by UG, such as the causative/change of
state examples discussed above (e.g. melt/melt)? Suppose that there are two subcategorization frames associated with each of
these verbs, one transitive and one intransitive. Can this set of frames be exploited to learn the meaning of the verb? (Note that
the existence of the alternation cannot be crucial for learning the meanings of the members, because many verbs, such as kill and die, do not alternate but are still learned.) Here again the answer is negative, because the two occurrences of the morpheme
have different meanings, under standard assumptions about lexical representation.1 So there cannot be a single sense specified
by UG as associated with this subcategorization set. Note that it also cannot be the case that the shared meaning, i.e. what the
two cases of the verb have in common, is learned from the subcategorization set either, because all causative/change of state
pairs have the same subcategorization set, even though each pair has its own meaning.
Clearly what we must aim for is a learning procedure that uses the alternation as a clue to the semantic analysis of the verbs, one that might tell the learner that these verbs have change and caused-change meanings. However, just observing subcategorization alternations will not achieve this result, because there are other transitive/intransitive alternations, like eat and leave.
1 However, Grimshaw, in work in progress (1993), denies these assumptions for all cases of alternations involving no overt morphology, proposing instead that there is only one meaning for e.g. melt in its causative and inchoative uses. The 'alternation' is just the result of the meaning interacting with clausal structure. Further work is required to see how this ultimately bears on the learning questions addressed here, but it seems that in this sense of 'meaning', observation of the two clause structures associated with melt is essential for arriving at the correct analysis. This conclusion will hold for the UG-governed alternations only.
(8a) We melted the ice.
(8b) The ice melted.
(9a) We left the room.
(9b) We left.
(9c) *The room left.
(10a) We ate the ice.
(10b) We ate.
(10c) *The ice ate.
What will guarantee success is to take into account the properties of the arguments, and not subcategorization alone; it is the fact that the subject of one case of the verb corresponds to the object of the other that reliably distinguishes change/caused-change pairs from eat and leave.
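For illustration only, this correspondence test can be rendered as a check over observed frames. The frame representation below is invented, and the cross-sentential comparison is merely a way of displaying the distributional fact, not a claim about the learning procedure itself:

```python
# Toy check: given observed (subject, object) pairs for transitive uses
# and (subject,) tuples for intransitive uses of a verb, does the
# intransitive subject correspond to the transitive object?
# True for melt-type verbs, false for eat and leave.

def causative_alternation(transitive_uses, intransitive_uses):
    trans_objects = {obj for (_, obj) in transitive_uses}
    intrans_subjects = {subj for (subj,) in intransitive_uses}
    # Require that every intransitive subject also shows up as a
    # transitive object, and that there is at least one such subject.
    return intrans_subjects <= trans_objects and bool(intrans_subjects)

# "We melted the ice" / "The ice melted": subject of one corresponds
# to object of the other.
melt = causative_alternation({("we", "the ice")}, {("the ice",)})  # True
# "We ate the ice" / "We ate": no such correspondence.
eat = causative_alternation({("we", "the ice")}, {("we",)})        # False
```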
One final point concerning the causative/change-of-state pairs. In fact both are 'transitive' in d-structure, according to the
unaccusative hypothesis (Perlmutter 1978, Burzio 1986), which means that technically they both have transitive
subcategorizations. Again, this suggests that subcategorization frames are not exactly the right place for the child to look for
help in figuring out verb meaning.
We can summarize the conclusions so far as follows. Mapping from meaning onto syntax can successfully exploit a set of
principles of Universal Grammar which relate lexical meaning ultimately to surface syntax. However, determining the meaning
just from observation of the world seems to require multiple exposures across situations in many cases, and may be impossible
under certain circumstances, when the world is particularly uninformative. On the other hand, mapping from syntax onto
meaning promises to successfully exploit linguistically encoded information about a verb's meaning. However as formulated so
far, it seems to require multiple exposures across sentences, and also may be impossible in many cases, when syntax is
uninformative.
4. Reconciliation
Clearly then we seek a model which preserves the advantages of both kinds of ideas: which makes it possible to use UG
mapping principles to regulate
syntax, and to use surface syntax to regulate the semantic analysis. One way in which it is possible to combine the essential
good effects of both types of mapping gives them different roles in the learning process: the semantics-to-syntax mapping
principles provide a predictive mechanism, and the observed s-structure provides a checking mechanism. This is the basis of
Reconciliation. (Wilkins 1993 explores a rather similar model for lexical learning, with particular emphasis on the acquisition of
morphology.)
Reconciliation
(1) The learner interprets a scene or situation, hears a sentence and detects the verb.
(2) The learner finds a relationship R among participants in the situation (entities, propositions, etc.) that is sensible given the interpretation of the observed situation.
(3) The learner checks that R involves participants consistent with the content of the (candidate argument) expressions in the sentence, and rejects an R that does not meet this requirement.
(4) The learner constructs a lexical conceptual structure which is consistent with R, and assigns candidate argument expressions in the sentence to argument positions in the lexical conceptual structure.
(5) This lexical conceptual structure is fed through the semantics-to-syntax mapping principles of UG in their language-particular instantiation.
(6) The s-structure predicted by step 5 is compared to the observed s-structure.
(7) If they do not match, then no learning takes place.
(8) If they do match, then the morpheme is entered into the lexicon with the hypothesized lexical conceptual structure.
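As a rough illustration, the matching logic of steps 4 through 8 can be sketched in code. The representations here (an lcs as an ordered list of role labels, and a caricature of the English mapping principles as "first role to subject, second to object, third to a to-PP") are invented for exposition, not part of the proposal:

```python
# Illustrative sketch of Reconciliation steps 4-8, with invented
# representations. An lcs is an ordered list of semantic-role labels;
# the English semantics-to-syntax mapping is caricatured as
# "first role -> subject, second -> object, third -> to-PP".

def predict_sstructure(lcs_roles, assignment):
    """Step 5: apply the toy mapping principles, returning a predicted
    (subject, object, pp_object) triple from a role -> NP assignment."""
    slots = [assignment.get(role) for role in lcs_roles]
    # Pad to three syntactic positions (subject, object, to-PP).
    return tuple(slots + [None] * (3 - len(slots)))

def reconcile(candidate_lcs, assignment, observed):
    """Steps 6-8: enter the entry only if prediction matches observation."""
    if predict_sstructure(candidate_lcs, assignment) == observed:
        return candidate_lcs  # step 8: enter into the lexicon
    return None               # step 7: no learning

# The 'give' example: observed "Mary verbs the package to the boy".
observed = ("Mary", "the package", "the boy")
transfer = ["x (agent)", "y (theme)", "z (goal)"]
learned = reconcile(transfer,
                    {"x (agent)": "Mary", "y (theme)": "the package",
                     "z (goal)": "the boy"},
                    observed)
# A two-place construal like 'hold' (x have y) predicts only two
# argument positions, so it cannot match the observed s-structure.
hold = ["x (holder)", "y (held)"]
rejected = reconcile(hold,
                     {"x (holder)": "the boy",
                      "y (held)": "the package"},
                     observed)
```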
A few comments are required about the steps involved in reconciliation. Step 2 excludes situations where the interpretation of
the event is one of throwing, say, and R is a relationship between propositions. Step 3 constrains the device to considering Rs that express relationships between the right kind of entities: if the sentence contains two NPs, and one is the ball, then a verb meaning 'say' is not a candidate, since it is not a possible relationship between a ball and some other entity. Similarly, with the
NPs John and the ball, R cannot be a verb meaning 'dress', although it could mean 'throw'. Whether this should depend on
probabilities given real world knowledge, or only on strictly linguistic selectional restrictions, I leave open. Note that step 3 is
one way in which the sentence constrains the process of word learning,
not through the syntactic form of the arguments of the predicate but through gross consistency between the (candidate)
arguments and the relationship expressed by a candidate R. This step is pre-linguistic, in the sense that it does not rely on the
linguistic representation of the arguments, just on their gross meaning. It is quite different, then, from the effects of the linguistic
mapping and checking involved in later stages. An interesting question is whether some of the effects described in the literature on 'syntactic bootstrapping' are really due to this (non-linguistic) process; see Pinker (this volume). At step 4, however, the
procedure is now linguistic in character. If R is a causal relationship between two entities, a causative lcs will be constructed.
The system is now working with a linguistic representation, rather than just with construals of events, and conceptual properties
of relationships and entities. With respect to the final steps, note that an incorrect representation will be entered into the lexicon if the match discovered in step 6 is accidental. We must assume that general principles of decay will eventually result in eradication
of mistaken entries (cf. Braine 1971), since they will result in matches less often than correct entries. Moreover, the notion of a
'match' requires further explication. The definition could require identity of form between the observed sentence and the
predicted sentence, or it could allow for limited inconsistencies, in particular with respect to various ways of reducing the
number of arguments that are actually expressed in a clause. Imperatives lack an overt subject, for example; there are elliptical forms; and there are verbs like wash and eat, which can be syntactically intransitive while apparently maintaining a two-place
semantic structure. 'Cognate objects' (e.g. to die a peaceful death) and 'fake reflexives' (e.g. to behave oneself) pose the opposite
problem, and a more detailed treatment is required than I will give here.
The reconciliation model incorporates aspects of both semantic and syntactic 'bootstrapping'. It crucially involves mapping from
a posited meaning to a syntactic form. It also exploits the surface syntax to constrain solutions.
A simple result of the model is that the number, the position, and the form of the syntactic arguments of a predicate will
constrain the semantic representation it is given. This is because of the grammatical principles regulating the lcs-syntax relation.
Suppose for example, that a learner hears a sentence containing the verb give: Mary is giving the package to the boy, and
observes an event in which one individual hands a package to another individual. Suppose the learner interprets the event as
involving a three-place logical relationship of transfer of possession. This 'R' will be consistent with the content of the candidate
argument expressions in the sentence: Mary, the package, and the boy, since
these are the right kind of entities to participate in such a relationship. Now the learner constructs an lcs for R (step 4), say x
acts to transfer y to z, and assigns the candidate argument expressions in the sentence to positions in the lcs of the predicate.
Suppose the learner assigns Mary to x, the package to y, and the boy to z. Then given the semantics-to-syntax mapping
principles as they work out in English, the predicted s-structure is Mary verbs the package to the boy. Since this is the observed
s-structure, this experience yields an lcs for give. (Note that the wrong assignment of candidate argument expressions to lcs
positions would not have yielded a match, hence no learning would have occurred.)
Suppose, on the other hand, that the learner construes the event as one of holding or getting, interpretations that are equally
consistent with gross observation. Now the learner will posit a two-place 'R'. At step 3 the procedure might already break down, if the learner can decide that all of the candidate arguments in the clause must be actual arguments, since there are three candidate arguments but only two actual arguments. Perhaps this is not so easy, however, since one or the other of the phrases might be involved in some adjunct role. Assuming, then, that the procedure will not halt at step 3, what happens? The learner now
constructs one of the two lcs's, for hold or get, and examines the predicted s-structures that correspond to them. Let us assume
that the lcs's are something like: x have y, x come to have y. There is no way for these lcs representations to yield the observed
s-structure. They have the wrong number of arguments, and since there is no way to treat the PP to the boy as an adjunct, there is no way to reconcile the observed s-structure with the predicted s-structure. In addition, the arguments will be in the wrong
syntactic position. Semantics-to-syntax mapping will place the 'getter' or 'holder' (the boy) in subject position, but this will
contradict the observed s-structure which has the giver (Mary) in subject position.
In general, the number of arguments in the observed sentence will have to match the number of arguments of R: if a predicate expresses an n-place relationship it will have n syntactic arguments, and if it has n syntactic arguments it will be a logical n-place predicate. (I set aside here the disruptions to this generalization mentioned above.) This is, more or less, what the Theta
Criterion and the Projection Principle guarantee (Chomsky 1981): the number of phrasal arguments in the syntax is the same as
the number of (open) logical positions in the lexical representation of the predicate, and the syntactic derivation cannot change
the number of arguments.
By the same reasoning Reconciliation resolves the kill/die problem discussed above. A learner can conclude that die can be a
change-of-state predicate but
not a causative, that kill can be causative but not change-of-state, and that melt can be both. This can be determined just on the
basis of a single observation each for kill and die, and two observations of melt, one in each complement structure. A verb
which means to cause something to change state must have two arguments, a verb which means to change state must have one.
Hence, since kill has two syntactic arguments, it must have two semantic arguments and cannot have a change-of-state meaning.
Since die has one syntactic argument it must have one semantic argument and cannot be causative. The syntactic form provides
the information that there are two semantic arguments, which provides the necessary information about possible verb meanings.
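The arity reasoning here can be stated as a one-line filter over candidate meanings; the representations below are invented for illustration:

```python
# Toy Theta Criterion / Projection Principle filter: a candidate lcs
# with n open positions is compatible with an observed clause only if
# the clause has n phrasal arguments.

def arity_compatible(lcs_arity, observed_args):
    return lcs_arity == len(observed_args)

CAUSATIVE_ARITY = 2        # "x causes y to change state"
CHANGE_OF_STATE_ARITY = 1  # "y changes state"

# A single observation of each verb suffices for this filter:
kill_obs = ["John", "the bug"]  # two syntactic arguments
die_obs = ["the bug"]           # one syntactic argument
```

With two syntactic arguments observed for kill, only the causative arity survives the filter; with one for die, only the change-of-state arity does, which is the single-observation result described in the text.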
As we saw in section 3, in order to properly identify instances of the alternation between causative and change-of-state
meanings, it is necessary to take into account the fact that the subject of one case of the verb corresponds to the object of the
other: We melted the ice, The ice melted. The properties of the arguments are essential to distinguishing these verbs from verbs
like leave and eat. If intransitive eat is mistakenly analyzed as having a change-of-state meaning, its predicted s-structure will have what is eaten in subject position (The ice ate), while the observed s-structure will have the eater in subject position (We ate). Hence
the wrong analysis will be rejected. Similarly, if a learner fails to assign a change-of-state analysis to melt, treating it instead as
having an lcs like intransitive leave or eat, the predicted s-structure will have the agent in subject position (We melted), while
the observed s-structure will have the entity undergoing the change of state in subject position (The ice melted). Once again, the
error will be avoided.
The syntactic form of an argument will similarly constrain meaning. The syntactic form depends on the semantic properties of
the argument, hence the syntax of an argument can provide information about its semantics, and hence about the semantics of a
predicate. Sentential complements, for instance, will occur with verbs of propositional attitude (e.g. believe), verbs of logical
relation (e.g. entail) and verbs of saying (e.g. announce). Reconciliation has the desired effect that this fact will prevent certain
types of errors by learners. Suppose the event is one in which a child is playing roughly with a dog and an adult says either I
think that you are being mean to the dog or That dog will kill you. If the verb means 'think' and the learner thinks it means 'kill',
the predicted s-structure will contain an NP complement while the observed s-structure will contain a clausal complement.
Similarly if the verb means 'kill' and the learner thinks it means 'think', the predicted s-structure will contain a sentential
complement while the observed s-structure will contain an NP.
In sum, under Reconciliation, the number, position, and form of the arguments of a predicate will all constrain the interpretations
that can be assigned to that predicate. Of course, the same general point can be made for many other kinds of linguistic
information: hearing a verb in the passive form, in the progressive, with a particular aspectual modifier and so forth, will
similarly provide constraining information about the verb's semantics, eliminating many posited, but incorrect, lexical semantic
representations. This captures the essence of the issue addressed by syntactic bootstrapping: it provides a more precise
characterization of the idea that the language can be used to map from observation to verb meanings. In so doing, however, it
makes crucial use of the notions behind semantic bootstrapping, concerning the mapping between semantics and syntax.
5. Conclusion
The important properties of Reconciliation are these:
It does not depend on exposures to multiple sentence types, in the sense that neither
cross-situational analysis, nor cross-sentential analysis is involved in setting up the
lcs representations. Therefore the problems posed by the variable senses of a
morpheme discussed in section 3 do not arise.
It uses semantics to predict syntax where Universal Grammar makes this possible.
It uses syntax to eliminate wrong semantic candidates where possible.
I emphasize again that I have addressed here the question of learning basic lexical semantic representations, and not the
learnability problem discussed in much of the literature, which concerns the problem of determining which verbs participate in
which 'alternations', see especially Pinker (1989) and references therein. Within the terms of the present discussion we could see
this as the question of how the system should proceed when a single morpheme would be assigned multiple representations,
whether multiple lcs's, or multiple syntactic configurations, but I will not explore the issue here. Also unexplored is the issue of
how morphologically complex items are analyzed.
Even with a procedure like Reconciliation, which exploits a full set of grammatical principles, there is no way to save a learner
from having to learn some word meanings simply from observation. There are many sets of words which have absolutely identical linguistic properties. Such sets include the sets of causative verbs (kill, melt, burn) and change-of-state verbs (die, melt, burn). Semantics-to-syntax mapping guarantees that each member of the set will be syntactically
indistinguishable (just as the names of animals, cat versus dog for example, are not syntactically distinct). So examination of the
surface syntax will not inform a learner as to whether a verb means 'to become liquid' or 'to become solid'. The general situation
is that it is possible to use the surface syntax to constrain analyses of the semantic structure of a verb, but not its semantic
content: the fact that a verb is a change-of-state verb, but not the fact that it expresses a particular change of state. This is a
matter of semantic content only and is not reflected in the syntax of a verb at all.
Presumably, then, the semantic differences among members of these sets must be learned from observation about the world in
some sense. However, this does not necessarily mean that the differences are perceptual in character. A vast quantity of
information about the world can be encoded linguistically but is not linguistic itself. Thus a child can observe that melt is used
of, for example, ice, while burn is used of, for example, paper. This is sufficient for the child to conclude that the meanings are
as they are rather than the other way around. In this way, it is possible for a child to know what a word means without ever
having observed an event which would count as an occurrence of what the verb describes. Indeed if this were not the case it
would be impossible to understand how meaning differences among unobservable verbs are acquired: think, hope, imagine.
For this reason, language is a source of essential information for lexical learning in two respects. As just discussed, language can
convey information about word meaning which is orders of magnitude more informative than observation of the world can be.
Second, by virtue of the grammatical principles that govern it, language constrains the possible representations of words in ways
that learners can exploit in word learning. Reconciliation is one way in which this might happen.
References
Baker, C.L., 1979. Syntactic theory and the projection problem. Linguistic Inquiry 10, 533-581.
Baker, M., 1988. Incorporation: A theory of grammatical function changing. Chicago, IL: University of Chicago Press.
Bowerman, M., 1974. Learning the structure of causative verbs: A study in the relationship of cognitive, semantic and syntactic development. Papers and Reports on Child Language Development 8. Stanford University Department of Linguistics.
Bowerman, M., 1990. Mapping thematic roles onto syntactic functions: Are children helped by innate linking rules? Linguistics 28, 1253-1289.
Braine, M.D.S., 1971. On two types of models of the internalization of grammars. In: D. Slobin (ed.), The ontogenesis of grammar: A theoretical symposium, 153-186. New York: Academic Press.
Burzio, L., 1986. Italian syntax: A government-binding approach. Dordrecht: Reidel.
Choi, S. and M. Bowerman, 1991. Learning to express motion events in English and Korean. Cognition 41, 83-121.
Chomsky, N., 1981. Lectures on government and binding. Dordrecht: Foris.
Clark, E.V. and K.L. Carpenter, 1989. The notion of source in language acquisition. Language 65, 1-30.
Fisher, C., H. Gleitman and L.R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology 23, 331-392.
Fisher, C., D.G. Hall, S. Rakowitz and L. Gleitman, this volume. When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua 92, 333-375 (this volume).
Gleitman, L., 1990. The structural sources of verb meanings. Language Acquisition 1, 3-55.
Grimshaw, J., 1981. Form, function, and the language acquisition device. In: C.L. Baker and J.J. McCarthy (eds.), The logical problem of language acquisition, 165-182. Cambridge, MA: MIT Press.
Grimshaw, J., 1989. Getting the dative alternation. In: I. Laka and A. Mahajan (eds.), Functional heads and clause structure, 113-122. MIT Working Papers in Linguistics, Volume 10.
Grimshaw, J., 1990. Argument structure. Linguistic Inquiry Monograph 18. Cambridge, MA: MIT Press.
Grimshaw, J., 1993. The least lexicon. Colloquium of the Institute for Research in Cognitive Science, University of Pennsylvania.
Gropen, J., S. Pinker, M. Hollander and R. Goldberg, 1991. Affectedness and direct objects: The role of lexical semantics in the acquisition of verb argument structure. Cognition 41, 153-195.
Jackendoff, R., 1990. Semantic structures. Cambridge, MA: MIT Press.
Landau, B. and L.R. Gleitman, 1985. Language and experience. Cambridge, MA: Harvard University Press.
Levin, B. and M. Rappaport Hovav, 1992. Unaccusativity: At the syntax-semantics interface. Ms.
Naigles, L., 1990. Children use syntax to learn verb meanings. Journal of Child Language 17, 357-374.
Perlmutter, D., 1978. Impersonal passives and the unaccusative hypothesis. Berkeley Linguistic Society 4, 157-189.
Pesetsky, D., 1992. Zero syntax, Vol. 1: Experiencers and cascades. Ms., MIT.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S., 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Pinker, S., 1994. How could a child use verb syntax to learn verb semantics? Lingua 92, 377-410 (this volume).
Randall, J., 1990. Catapults and pendulums: The mechanics of language acquisition. Linguistics 28, 1381-1406.
Wilkins, W., 1993. Lexical learning by error detection. Ms., Arizona State University.
Williams, E., 1992. Remarks on lexical knowledge. Ms.
Williams, E., in press. Thematic structure in syntax. Cambridge, MA: MIT Press.
Zubizarreta, M.-L., 1987. Levels of representation in the lexicon and syntax. Dordrecht: Foris.
Section 6
Procedures for verb learning
1. Introduction
Individual words differ in the syntactic types of the phrases that can represent their semantic arguments. For example, watch
takes a noun-phrase argument while look does not.
(1a) Jane watched Bob
(1b) *Jane looked Bob
* This work benefited enormously from conversations with Bob Berwick, Lou Ann Gerken, Lila Gleitman, Jane
Grimshaw, and Elissa Newport. Any remaining faults they probably tried to talk me out of.
Similarly, pretend takes a tensed clause or an infinitive argument while play does not.
(2a) Jane is pretending to be grown up
(2b) *Jane is playing to be grown up
(2c) Jane is pretending she is grown up
(2d) *Jane is playing she is grown up
Chomsky (1965) referred to these properties of words as their subcategorization frames. In general, a word may have several
subcategorization frames, just as it may have several syntactic categories.
Within a given language, the subcategorization frames and the meanings of words are strongly correlated (Fisher et al. 1991,
Levin 1993, Zwicky 1970). For example, all English words that take three semantic arguments of which one is realized as a
direct object and one as a tensed clause, as in "I told him I'm happy", are verbs of communication (Zwicky 1970). Examples
include tell, write, fax, inform, warn, advise, and so on. It has been hypothesized that the correlation between subcategorization
frames and meanings plays an important role in lexical acquisition. Pinker and colleagues have proposed that children typically
learn the meaning of a word first, then exploit the regular correspondence between meaning and subcategorization to infer its
subcategorization frames (Pinker 1984, 1989). Conversely, Gleitman and colleagues have proposed that children often learn the
subcategorization frame first, then exploit the correspondence to restrict their hypotheses about its possible meanings (Landau
and Gleitman 1985, Gleitman 1990). These two proposals are known as the semantic bootstrapping and syntactic bootstrapping
hypotheses, respectively.1
1.1. The problem
Both semantic and syntactic bootstrapping depend on two-year-olds' ability to learn the subcategorization frames of some words
without relying on their meanings. This is obvious in the case of syntactic bootstrapping: if children are to rely on the subcategorizations of some particular word to infer its meaning, they cannot also rely on its meaning to infer its
subcategorizations. In the case of semantic bootstrapping, the problem is that languages vary
1 Semantic bootstrapping also refers to the more general hypothesis that lexical semantics provides the basis for children's
acquisition of lexical syntax (Grimshaw 1981, Pinker 1984).
substantially in which particular subcategorization frames correspond to which particular meanings. This variation implies that
children must learn the language-particular correspondence before using it to infer the subcategorizations from meanings. To
learn the correspondence they must learn the meanings and subcategorizations of some words independently (Pinker 1989,
Gropen et al. 1991).
Although the semantic and syntactic bootstrapping hypotheses have generated a substantial literature, no non-semantic procedure
by which children might learn subcategorizations has been developed. There have been insightful proposals about the general
type of cues that might be exploited in such a procedure, notably prosodic cues to syntactic structure such as voice pitch,
pausing, and vowel duration (Fisher and Tokura 1993, Lederer and Kelly 1992, Morgan 1986, Fernald and Kuhl 1987, Hirsh-
Pasek et al. 1987, Jusczyk et al. 1993, Kemler-Nelson et al. 1989, Mehler et al. 1988). This proposal, known as the prosodic
bootstrapping hypothesis, holds that children can recover partial syntactic bracketing (but not category labels) from prosodic
cues. However, no explicit procedure has been proposed for recovering bracketings nor for extracting linguistic regularities from
them on the basis of prosodic cues.
It seems that young children must infer some of the lexical syntax of their languages (the syntactic facts about individual words) from larger syntactic structures. But it is difficult to see how children could identify syntactic structure in an utterance without already knowing the syntactic functions of some of the words in the utterance. This poses an apparent paradox: to learn lexical syntax, children must recover the syntactic structure of the input; to recover syntactic structure, they must already know lexical syntax. The bootstraps need bootstraps.
1.2. Hypotheses
This paper investigates the possibility that children first learn the syntactic functions of a few words that are extremely common
and highly informative about syntactic structure, then exploit these words as probabilistic cues to key syntactic structures in the
input utterances. The function morphemes (prepositions, determiners, inflection, pronouns, auxiliary verbs, and complementizers) are typically the shortest, most common, and most syntactically informative words in a language, making them ideal starting points for learning syntax (Morgan et al. 1987, Valian and Coulson 1988). The fact that children do not use function morphemes
consistently until quite late
has suggested to some that young children do not know them. Logically, however, children must know some function morphemes by the time they are learning subcategorization frames: they cannot observe that look subcategorizes for a prepositional phrase without the ability to recognize even one preposition. Further, the next section reviews a substantial body of
empirical evidence that English-speaking children know the syntactic privileges of a number of function morphemes before they
are two-and-a-half years old.
Although function morphemes provide a great deal of information about syntactic structure, they do not provide enough for
complete, unambiguous syntactic parsing. This paper investigates the hypothesis that young children can learn lexical syntax on
the basis of partial and uncertain syntactic analyses of input utterances, and that they can cope with the resulting misconstruals
using statistical inference.
1.3. Context and methodology
From one perspective, the current approach resembles parameter-setting models (Chomsky 1981, 1986; Lightfoot 1991): it
assumes a fixed, finite menu of subcategorization frames from which a lexical entry is selected for each verb. For each verb V
and each subcategorization S, the presence or absence of S in V's lexical entry can be seen as a binary-valued lexical parameter.
The function-morpheme cues proposed here can be seen as triggers for the subcategorization parameters. For present purposes it does not matter whether the menu of possible subcategorization frames is innate or acquired, only that knowledge of the menu is independent of the mechanisms children use to select from it. The statistical inference component of the current proposal can be
seen as redeeming a promissory note left (often unsigned) by work on parameter-setting models. As Lightfoot (1991: 19) puts it:
'We shall need to characterize the robustness of the data which may act as a trigger. Robustness is presumably a function
of saliency and frequency. One can be sure that parameters are not always set by single events; that would make the child
too "trigger happy" and inclined to draw long-term conclusions (a metaphor) from insufficient data. However, some
parameters may require more triggering experience than others.'
A precise algorithm for determining the amount of data required to set each 'parameter' is presented in section 3 (and in the Appendix). This appears to be the first explicit proposal in this domain. This work suggests that weighing linguistic evidence is
by no means a trivial problem; indeed, as
a major component of any realistic mechanism for language acquisition, robust inference deserves much more attention than it
has received to date. Further, robust inference procedures make available new approaches to language learning problems and
new methods for evaluating them. Ultimately, this gives the work presented here a character quite different from that of both
parameter-setting models and learnability models (e.g. Wexler and Culicover 1980, Morgan 1986).
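The lexical-parameter view described above, with one binary parameter per verb-frame pair, can be sketched with a toy lexicon. The frame menu here is a made-up fragment for illustration, not the menu assumed by the model:

```python
# Toy lexical-parameter view: a fixed, finite menu of subcategorization
# frames, with a binary parameter per frame in each verb's entry.
# (The four-frame menu is invented for illustration.)

FRAME_MENU = ("NP", "PP", "infinitive", "tensed-clause")

def empty_entry():
    """All parameters initially unset."""
    return {frame: False for frame in FRAME_MENU}

def set_parameter(entry, frame):
    """Set one binary parameter, e.g. when a cue triggers it."""
    entry[frame] = True

watch = empty_entry()
set_parameter(watch, "NP")   # "Jane watched Bob"
look = empty_entry()
set_parameter(look, "PP")    # "Jane looked at Bob"
```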
The ability to cope with occasional input errors implies the ability to use probabilistic or approximate cues. In English, for
example, the word to followed by a word that can be interpreted as an uninflected verb usually marks an infinitive. However,
the sentence John drove to fish markets all over town is an exception to this generalization. If such exceptions are rare enough
(and distributed appropriately) the generalization can still be used as a cue; an inference procedure can discount evidence
generated exclusively by the exceptions. The infinitive cue is local, in the sense that it can be used without regard to the rest of
the sentence, if there is a mechanism for handling the resulting errors. It can also be thought of as a surface cue (Kimball 1973).
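As a rough sketch, this local cue might be implemented as follows. The tiny word list standing in for the child's knowledge of possible uninflected verbs is invented, and the cue deliberately misfires on the "drove to fish markets" example, which is exactly why the model pairs such cues with statistical discounting of rare errors:

```python
# Toy version of the local infinitive cue: "to" followed by a word
# that could be an uninflected verb. Purely local: each position is
# judged without regard to the rest of the sentence.

POSSIBLE_UNINFLECTED_VERBS = {"be", "go", "fish", "eat", "sleep"}

def infinitive_cue_positions(words):
    """Return the indices at which the surface cue fires."""
    return [i for i in range(len(words) - 1)
            if words[i] == "to"
            and words[i + 1] in POSSIBLE_UNINFLECTED_VERBS]

hit = infinitive_cue_positions(
    "Jane wants to go home".split())                      # fires at 2
false_alarm = infinitive_cue_positions(
    "John drove to fish markets all over town".split())   # also fires at 2
```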
By proposing that children use surface cues, the present model comes closer to contacting the actual input than parameter-
setting and learnability models, where it is typically assumed that the input to syntax learning includes bracketed deep-structure
or surface-structure representations. Subcategorization frames can be read directly off the deep structure, so assuming such
inputs is tantamount to assuming away the problem under investigation here.
Closer contact with the input makes it natural to test models of this kind by computer simulation on naturally occurring input
(Bever 1991). This paper reports on a simulation using transcripts of the child-directed English from the CHILDES database
(MacWhinney 1991). The results of the simulation support the notion that it is possible to learn subcategorization frames
without parsing, without knowing much lexical syntax aside from a few function words, and without relying on semantics.
As a method of investigation, simulation is quite different from the theoretical arguments presented by Wexler and Culicover,
Lightfoot, Morgan, and others. One important difference is that it takes account of the quantitative structure of naturally
occurring, child-directed language. It has been suggested that the presence of a few, high-frequency syntactic markers like the
common function words may be a critical property that makes languages learnable (Morgan et al. 1987, Valian and Coulson
1988). For example, if English had fifty equally frequent determiners, fifty verbal inflections, fifty complementizers, and so on,
it might not be learnable.
Simulation experiments are responsive to such quantitative properties of language, while learnability theory is not.
Similarly, child-directed speech tends to contain shorter sentences and more questions than adult-directed speech, and to contain
repeats of the same phrase in different contexts (Morgan 1986, Newport et al. 1977). It has been suggested that this may either
aid or disrupt language acquisition. In either case, simulation experiments using transcripts of child-directed speech reflect the
distributional properties of child-directed sentences, whereas theoretical arguments do not.
Although simulation has a number of advantages, no one mode of investigation resolves all questions. While some of the
strategies embodied in the cues for English will carry over to some other languages, the cues themselves are necessarily specific
to English, and this limits the conclusions one can draw from simulation experiments that use them. Further, the present
simulation takes as its input orthographic transcripts of child-directed speech. Such transcripts are closer to the input children
actually receive than deep-structure trees, but they are much more abstract than an acoustic signal.
1.4. Organization
The remainder of this paper is organized as follows. Section 2 reviews the evidence that children younger than two-and-a-half
years old are able to recognize specific function morphemes and to exploit them in understanding, despite their failure to use
them reliably in speech. Section 3 develops one possible implementation of the current proposals. Section 4 presents
experiments using a computer model and transcripts of child-directed speech from the CHILDES corpus. Finally, section 5
summarizes the results of this investigation and draws conclusions.
2. Children's resources
One of the major hypotheses advanced here is that children rely on the distribution of function morphemes in the early
acquisition of subcategorization frames. This hypothesis is only plausible if young children recognize the syntactic privileges of
function morphemes. In addition, the model presented in section 3 assumes that young children know some proper names by the
time they need to learn syntactic frames. Finally, the model assumes that children can detect the ends of utterances using
prosodic cues such as pause
length, pitch, vowel duration, and volume. This section reviews the empirical bases for these assumptions.
2.1. Function morphemes and names
The fact that young children do not use function words reliably has sometimes been explained by claiming their grammars lack
function morphemes (Radford 1988, 1991). It has been further proposed that children's putative ignorance of function
morphemes results from a perceptual failure: they do not hear or cannot distinguish function morphemes in most languages
because they are unstressed (Gleitman et al. 1988). However, a substantial body of convergent evidence has accumulated against
each of these hypotheses. Briefly, it has been shown that:
(1) Even infants can distinguish unstressed syllables as well as stressed syllables
(Jusczyk and Thompson 1978, Williams and Bush 1978). This refutes the
perceptual 'explanation' of children's putative ignorance of function morphemes.
(2) Constraints on the metrical and prosodic phonology of children's speech provide
better explanations of their patterns of omission than simple ignorance (Demuth
1993, Gerken and McIntosh 1993, Gerken 1992, Wijnen et al. 1992). This removes
the motivation for the ignorance hypothesis.
(3) Children are sensitive to the presence and correct usage of function morphemes in
adult speech, as assessed by a variety of techniques (see below). This directly
refutes the ignorance hypothesis.
The direct evidence that children are sensitive to the presence and correct usage of function morphemes is summarized below:
(1) On imitation tasks, young children (age 1;11–2;6, MLU 1.30–2.00, M = 1.73) omit
English function morphemes more often than unstressed nonsense syllables
(Gerken et al. 1990). This effect persists when the sentences are synthesized
automatically to control for possible prosodic differences between grammatical and
ungrammatical sentences spoken by humans (age 2;0–2;6, MLU 1.57–2.60, M = 2.07),
and when the nonsense words are composed of phonemic segments similar to those
of English function morphemes (age 2;0–2;6, mean MLU = 2.20). This indicates that
children younger than two-and-a-half distinguish between English function
morphemes and similar nonsense syllables; that is, they know the
specific segmental content of English function morphemes, if not their syntactic and
semantic functions.
(2) Young children tend to interpret a novel noun as referring to a class of objects when
it is taught with an article ("This is a dax"). When it is taught without an article
("This is dax") they tend to interpret it as referring to a unique individual (Katz et
al. 1974, Gelman and Taylor 1984). (GT subjects age 2;2–3;0, mean 2;6. KBM:
girls, mean age 1;5.) This indicates that young children recognize some English
determiners and that they distinguish between common and proper nouns both
syntactically and semantically.
(3) Young children understand grammatical English sentences significantly more often
than sentences in which the function morphemes are deleted (Shipley et al. 1969,
Petretic and Tweeney 1976), or in which function words are replaced by nonsense
syllables (Petretic and Tweeney 1976) (SSG: age 1;7–2;5, mean 2;1, median MLU
1.85. PT: mean age 2;3, MLU 1.07–1.66). The effect persists when function
morphemes are replaced by others of the wrong category, as in *Find was bird for
me, where was is substituted for a determiner (Gerken and McIntosh 1993). (Least
mature group, MLU 1.50, M = 1.15, produced no articles during the session.) This
indicates that two-year-olds who almost never use function morphemes can
nevertheless distinguish their syntactic privileges.
Taken together, these experiments demonstrate that children younger than two-and-a-half years old, who almost never produce
function words, know the segmental, syntactic, and semantic properties of determiners and auxiliaries. In light of this evidence,
there is no reason to doubt that they have similar facility with pronouns and inflectional suffixes.
2.2. Prosody
The prosodic properties of a sentence are linguistically significant properties of its sound including intonation and rhythmic
properties like vowel duration, pausing, and volume. These properties are believed to be governed by rules that are distinct from
but sensitive to syntactic structure (Nespor and Vogel 1982, Selkirk 1981).
It has been hypothesized (under the rubric of prosodic bootstrapping) that young children recover a substantial amount of
syntactic structure, perhaps even an unlabeled bracketing of all major syntactic constituents, on the basis of prosodic information
(Morgan 1986). Substantiating this hypothesis requires, at least, establishing that young children attend to prosody and that it is
possible
to recover syntactic structure from the prosodic properties of sentences without already knowing the language. There is ample
evidence for the former (Fernald and Kuhl 1987, Hirsh-Pasek et al. 1987, Jusczyk et al. 1993, Kemler-Nelson et al. 1989,
Mehler et al. 1988), but little for the latter. A number of studies have been done, but they used carefully constructed pairs of
ambiguous sentences, read aloud (Lederer and Kelly 1992, Morgan 1986). As Fisher (in press) points out, the prosodic
properties of reading aloud may well be different from those of fluent speech, and the syntactic properties of the constructed
sentences are certainly different from those of the very short sentences that typify speech directed at young children. Further, the
readers in some of these experiments may have been aware that the sentences were ambiguous, and hence tried to disambiguate
them prosodically.
However, Fisher and Tokura (1993) have shown that the boundaries of utterances (sentences or sentence fragments) can be
predicted from the prosodic properties of natural, infant-directed speech in both English and Japanese. For each syllable in
samples of natural speech by mothers to their 8- and 14-month-old children, they measured pause duration, vowel duration,
fundamental frequency excursion within each vowel, and average amplitude within each vowel. Each syllable was labeled as
utterance-final, phrase-final, word-final, or non-final and a discriminant analysis was performed on one-half of the sample in
each language. When the resulting classification was applied to the other half of the sample, it correctly predicted 61% of all
utterance boundaries (false negative error 39%). More importantly, 93% of all the utterance boundaries it predicted were correct
(false positive error 7%), even though only 28% of all syllables were utterance-final. Similar results were obtained for predicting
utterance-final syllables in Japanese, using a classification function based on the Japanese sample. Remarkably, the classifier
based on the English sample performed almost as well at predicting utterance-final syllables in Japanese as did the one based on
the Japanese sample. It is possible, then, that a single classification procedure for finding utterance boundaries is innate and
universal. When the same method was applied to find a classification function for predicting syllables that end either an NP or a
VP, but not a whole utterance, the results were roughly at chance.
3. An implementation
This section describes one possible implementation of a learning strategy based on function morpheme cues and statistical error
reduction. This
implementation serves as a carefully worked out example that clarifies some of the implications of such a strategy. It also serves
as the instrument for the simulation experiment presented in the next section.
3.1. Overview
This subsection explains how the model works in general terms; the following subsections fill in the details.
First, note that subcategorization frames, as described by Chomsky (1965), do not distinguish between tensed clauses and
infinitives: both are described as sentences. Sentential complements are realized as tensed clauses or infinitives as a result of
lexical features and transformations. The cues described below, however, recover only the syntactic forms of arguments as they
occur in utterances. (Boguraev and Briscoe, 1987, provide evidence that it is usually possible to recover the lexical features
required in Chomsky's analysis from surface argument types mechanically, but that question is not addressed here.) In addition,
this implementation recovers some information that is usually thought of as function-word selection, rather than
subcategorization. To avoid confusion, the representations recovered in this implementation are referred to as syntactic frames,
or simply frames. The syntactic frames investigated here, excluding selection for prepositions and complementizers, are shown
in table 1.
Table 1 The syntactic frames studied in the simulation
SF description Good example Bad example
NP only greet them *arrive them
tensed clause hope he'll attend *tell he'll attend
infinitive hope to attend *greet to attend
PP only listen to me *put on it
NP & clause tell him he's a fool *yell him he's a fool
NP & infinitive want him to attend *hope him to attend
NP & NP tell him the story *want him the story
NP & PP put it on the table *listen him to me
The task of learning syntactic frames for verbs can be broken down as follows:
(1) Collect observations about the syntactic frames manifested in each input utterance:
(a) Identify a verb, V, in the utterance
(b) Identify phrases in positions where complements of V can occur
(c) For each potential complement, C, determine whether C is in fact a
subcategorized complement of V
(2) Make inferences about the lexicon by analyzing the observations in such a way that
the effects of ungrammatical and misconstrued input are minimized
Step 1 and Step 2 could be interleaved, so the lexicon is potentially updated after each input utterance, but the implementation
presented here draws conclusions only after a body of observations has been collected, since this approach is conceptually
simpler.
The implementation presented here includes cues for collecting observations in English and a novel inference method that can
be applied to other cues and to other languages. The cues assume no knowledge of lexical items except for proper names and a
few function morphemes. Although they refer to categories like determiner and pronoun, the initial lexicon of items in each
category need not be complete.
The cues used for detecting verbs in this implementation have two logically separable components: one that identifies words that
can be verbs, and another that excludes occurrences of such words in contexts where they are unlikely to be functioning as
verbs. As each utterance is processed, any word that has already occurred both with and without the suffix -ing is assumed to
have a verbal sense. All occurrences of such a word are taken to be verbal occurrences except when preceded by a known
determiner or preposition, since verbs are very rare in those positions.
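As a concrete illustration, the -ing cue and the determiner/preposition filter might be implemented as follows. This is only a sketch: the word lists and function names are mine, not the paper's, and the real implementation also handles the spelling changes that -ing induces (see footnote 2 below).

```python
# Sketch of the verb-detection cue: a word is treated as having a verbal
# sense once it has been seen both with and without -ing, and an occurrence
# counts as verbal unless it directly follows a determiner or preposition.
DETERMINERS = {"the", "a", "an", "this", "that"}   # partial list; need not be complete
PREPOSITIONS = {"in", "on", "to", "of", "with"}    # partial list

def verbal_occurrences(utterances):
    seen_plain, seen_ing = set(), set()
    # First pass: which stems occur both with and without -ing?
    for words in utterances:
        for w in words:
            if w.endswith("ing"):
                seen_ing.add(w[:-3])
            else:
                seen_plain.add(w)
    verbs = seen_plain & seen_ing
    # Second pass: count occurrences not preceded by a determiner/preposition.
    occurrences = []
    for words in utterances:
        for i, w in enumerate(words):
            stem = w[:-3] if w.endswith("ing") else w
            if stem not in verbs:
                continue
            prev = words[i - 1] if i > 0 else None
            if prev in DETERMINERS or prev in PREPOSITIONS:
                continue  # unlikely to be functioning as a verb here
            occurrences.append((stem, i, words))
    return occurrences
```

Note that in this sketch "the drawing is nice" contributes no verbal occurrence of draw, while "she is drawing" does.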
The cues used for identifying phrases in this implementation are as follows (see table 3, page 450). Proper nouns and pronouns
are taken as noun phrases (NPs). (This class of words is called lexical NP, or LNP, hereafter.) Determiners are also taken as
signaling NPs. (The class of determiners and LNPs is referred to as NPLE, for NP left-edge, in table 3.) Prepositions followed
by NPs are taken as PPs. To followed by a word that has already occurred both with and without the suffix -ing is taken as
marking an infinitive. A complementizer (such as that) followed by a determiner or NP is taken as the beginning of a tensed
clause. So, too, is an NP followed by a previously identified verb.
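The phrase cues just listed can be sketched as a position-by-position matcher. The word lists here are illustrative stand-ins, and the ordering of the checks (infinitive before preposition, complementizer before determiner) is my own way of handling ambiguous items like to and that:

```python
# A minimal sketch of the phrase cues for NP, PP, infinitive VP, and CP.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "him", "her", "me"}
NAMES = {"john", "mary"}                  # assumed known proper names
DETERMINERS = {"the", "a", "an", "this", "that"}
PREPOSITIONS = {"in", "on", "to", "of", "with"}
COMPLEMENTIZERS = {"that"}

def is_lnp(w):                            # lexical NP: pronoun or proper name
    return w in PRONOUNS or w in NAMES

def is_nple(w):                           # NP left edge: LNP or determiner
    return is_lnp(w) or w in DETERMINERS

def phrase_cue(words, i, verbs):
    """Return the phrase type cued at position i, or None."""
    w = words[i]
    nxt = words[i + 1] if i + 1 < len(words) else None
    if w == "to" and nxt in verbs:
        return "VP-inf"                   # 'to' + known verb cues an infinitive
    if w in PREPOSITIONS and nxt is not None and is_nple(nxt):
        return "PP"                       # preposition + NP cues a PP
    if w in COMPLEMENTIZERS and nxt is not None and is_nple(nxt):
        return "CP"                       # complementizer + NP cues a tensed clause
    if is_lnp(w) and nxt in verbs:
        return "CP"                       # NP + known verb also cues a tensed clause
    if is_nple(w):
        return "NP"
    return None
```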
Phrases are only relevant when they are in a position to serve as part of a syntactic frame that the learner is looking for. For
example, there is no
frame that contains a tensed clause followed by an NP, so the implementation does not look for that sequence of phrases.
The problem of whether a phrase is in fact a subcategorized complement of a given verb is resolved in an approximate manner,
leaving the errors to be cleaned up by the inference component. In particular, a phrase is taken to be a complement of a verb if it
follows that verb immediately, or if it follows a proper name or pronoun that follows the verb immediately. Further, NPs are
taken as complements only if they are unlikely to be subjects specifically, if they are followed by an utterance boundary or a cue
for another complement phrase. Although the details of this strategy are specific to English, there appears to be a cross-linguistic
tendency for subcategorized complements to occur near their verbs.
The statistical inference component takes as input the number of times each verb has occurred with cues for each frame, along
with the total number of times each verb has occurred in any context. Thus, the implementation keeps track of three things as it
collects observations:
(1) Which words have occurred both with and without -ing, not counting occurrences
preceded by a preposition or a determiner. These words are treated as having verbal
senses.
(2) How many times each word with a verbal sense has occurred, not counting
occurrences preceded by a preposition or a determiner.
(3) How many of those occurrences were followed by cues for each of the syntactic
frames.
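One possible data structure for these three records is sketched below. The paper does not specify an implementation, so the class and method names here are mine:

```python
# Sketch of the bookkeeping behind the observations table: stems seen
# with/without -ing, total verbal occurrences, and per-frame cue counts.
from collections import defaultdict

class Observations:
    def __init__(self):
        self.with_ing = set()        # stems seen with -ing
        self.without_ing = set()     # stems seen without -ing
        self.total = defaultdict(int)                        # the V column
        self.frames = defaultdict(lambda: defaultdict(int))  # frame counts per verb

    def note_form(self, word):
        """Record a word form and return its stem."""
        stem = word[:-3] if word.endswith("ing") else word
        (self.with_ing if word.endswith("ing") else self.without_ing).add(stem)
        return stem

    def has_verbal_sense(self, stem):
        """A stem has a verbal sense once seen both with and without -ing."""
        return stem in self.with_ing and stem in self.without_ing

    def count_occurrence(self, stem, frame_cues):
        """Tally one verbal occurrence and any frame cues that followed it."""
        self.total[stem] += 1
        for f in frame_cues:
            self.frames[stem][f] += 1
```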
This information is kept in a data structure called an observations table. An alphabetically contiguous portion of the
observations table from an experimental run is shown in table 2. Each row represents data collected from one pair of word
forms, an -ing form and its stem form. The first column, titled V, contains the total number of times the word occurs in
positions where it could be functioning as a verb. Each subsequent column represents a single frame. 'Cl' stands for a tensed
clause argument with that or null complementizer, 'in', 'on', and 'to' for PPs headed by those prepositions, 'inf' for infinitive VPs,
and 'wh' for clauses headed by what or where. The number appearing in each row and column represents the number of times
that the row's verb co-occurred with cues for the column's frame. Zeros are omitted. For example, pretend and pretending
occurred a combined total of 14 times, excluding those occurrences that followed determiners or prepositions. Two of those
occurrences were followed by a cue for a tensed clause and one by a cue
Table 2 Raw observations, before statistical processing, for the first 75 verbs in alphabetical order
V NP NPNP NPcl NPin NPinf NPon NPto NPwh cl in inf on to wh
add 2
ask 21 1 1 3
be 118 13 2 4
blow 17 2
bring 52 7 1 1 3 1
brush 54 10
button 25 1
buy 12 1
call 88 19 2 2
carry 6 3
catch 13 5
cheat 2 1
chew 15 1 4
clean 10
close 97 43
color 99 1 2 1
comb 25 6
come 298 6 2 1 1 5
cook 19 1 1
cry 42 5
cut 16 5
dance 21 1
do 1424 184 10 1 162 1 2 7
draw 23 1
drink 33 4 1
drive 19 4 3 1
eat 254 64 1
even 13
fall 46 1
feed 25 4 5
feel 55 15 6
find 103 34 2 3
fish 17
fix 32 15
fly 6
forget 6 1
get 398 84 3 3 9 3 1
give 91 4 8 4 1
go 982 5 24 113 3 26
grow 3
have 506 59 1 1 3 56 3
for an infinitive. The observations table is the input to the statistical inference procedure.
The statistical inference component of this implementation has two parts: one estimates the error rate for each frame, and the
other uses this estimate to weight the evidence provided about that frame by multiple occurrences of each verb. In particular, the
latter component considers the number of times the
verb occurred with cues for the frame and the number of times it occurred without, much as one would consider the number of
heads and the number of tails turned up by a coin to determine whether or not it is fair. The method of estimating the error rate
was designed specifically for this implementation, while the method for weighing the evidence in light of the error rate is a
standard one.
The error estimation procedure is based on some simplifying approximations. First, for each frame, a single source of error
dominates all others in terms of frequency. Second, errors from this source are distributed evenly across verbs. Third, verbs that
in fact have a given frame occur with cues for that frame with significantly higher probability than verbs that do not have it. Put
another way, false alarms are less frequent than true alarms. To the extent that these approximations hold, the verbs whose co-
occurrence with cues for a given frame consist entirely of false alarms should have a relatively low rate of co-occurrence with
those cues, and their rates of co-occurrence should be distributed binomially. These facts can be used to identify a sample of
verbs whose co-occurrence with cues for a frame consist entirely of false alarms, or miscues. This sample can then be used to
estimate the error rate.
The remainder of this section provides a more detailed discussion of the implementation and the rationale behind its design.
Readers interested primarily in the experimental results can proceed to section 4 without loss of continuity.
3.2. Collecting observations
3.2.1. Finding verbs
Why find verbs? Although verbs are not the only words with subcategorization frames, their syntactic properties are sufficiently
different from those of other words that they must be analyzed separately. One important difference between verbs and nouns is
illustrated in (3):
(3a) John liked to pretend [CP he was at the theater]
(3b) *John liked to play [CP he was at the theater]
(3c) John liked the play [CP he saw at the theater]
(3a) is grammatical because the verb pretend takes a tensed clause (CP) argument; (3b) is ungrammatical because the verb play
does not; (3c) is grammatical because play is functioning as a noun and the following clause is
a relative clause rather than an argument of play. Any common noun can have a relative clause, so noticing a tensed clause is
not informative about the syntactic frames of a preceding noun, but it is informative about those of a preceding verb.
A similar difficulty arises from the fact that, in English, verbs can take NP complements but nouns cannot:
(4a) We planned to [V purchase] [NP the world's largest bank]
(4b) *Our [N purchase] [NP the world's largest bank] was profitable
Further, the NP complement of a verb tends to become a PP headed by of in the verb's nominalization:
(5) Our [N purchase] of [NP the world's largest bank] was profitable
Thus if observations of purchase the noun and purchase the verb were not distinguished, they would provide contradictory
evidence about whether or not purchase takes an NP complement. This has consequences for nouns that are not ambiguous as
well: cues indicating that a word has an NP complement should be disregarded in the presence of evidence that the word is a
noun.
The proposed statistical inference method adds to the importance of distinguishing between nouns and verbs. It is based on the
simplifying assumption that miscues for a given frame are distributed fairly evenly across words. This assumption is violated if
data from nouns and verbs are pooled. To see this, suppose that no discrimination were made between nouns and verbs. Further,
suppose that a clause following a word W were taken as evidence that W takes a tensed clause argument. Example (3) shows
that the likelihood of that evidence being incorrect is much higher if W is a noun than if W is a verb. Thus, distinguishing
between nouns and verbs is important for obtaining a relatively uniform distribution of errors. This would be true even if there
were no lexical ambiguity.
Cues for finding verbs. In English, there are several function-morpheme cues that provide evidence about major category, but
none of them is ideal. In general, the simplest cues tend to be either less reliable or less common than more complex ones. For
example, a simple, reliable cue is that words following a form of be and ending in -ing (progressives) are very likely to be verbs.
Unfortunately, a number of words occur in the progressive construction
very rarely, if ever. These include some of the most common and most syntactically interesting words in child-directed speech,
such as know, want, like and see.
The following two-part procedure for identifying verbs is a reasonable compromise among simplicity, frequency, and reliability:
first, identify words that occur as verbs; second, when such a word occurs, assume it is functioning as a verb, except in the
presence of evidence to the contrary. All words that can occur as verbs can occur both with and without the final syllable -ing,
and all but a few words with that property can occur as verbs. Even mental state verbs like know, which rarely occur in the
progressive, have nominalizations ending in -ing (Knowing how to read a map is useful). A learner who has identified potential
verbs can now proceed by trying to weed out their non-verbal occurrences. Occurrences following determiners, prepositions, or
non-auxiliary verbs, for example, are unlikely to be verbal.
The simulations are based on the -ing cue for potential verbs and the determiner and preposition cues for non-verbal
occurrences.2 However, these cues are not without difficulties. Nominalizations of mental states like knowing, liking, and
wanting appear to be rare in child-directed speech, perhaps because they are semantically abstract. A second problem is that
some ambiguous words occur as nouns more often than they occur as verbs, even when occurrences following determiners and
prepositions are excluded.
3.2.2. Identifying potential complements
This implementation focuses on four complement types: Noun phrase (NP), prepositional phrase (PP), infinitive verb phrase
(VP), and tensed clause or 'complementizer phrase' (CP). Within these major categories, PPs and CPs are distinguished by their
head prepositions and complementizers, respectively. For example, a CP headed by that is treated as a separate complement type
from one headed by what or where.
The goal is to find representatives of each complement type that occur frequently in the input, and that can be identified without
the need to know a lot of other words. The words that must be known to identify these
2 In speech there is only one allomorph of the morpheme /ing/. In written English the 'ing' suffix can induce spelling
changes at the end of the stem, and the simulation includes some techniques for negotiating those spelling changes.
Description of these techniques has been omitted since the orthographic problem they solve does not occur in aural
learning.
representatives should be very common and regular, preferably without morphological variation. In terms of these criteria,
pronouns and proper names (i.e., lexical NPs) are ideal representatives of the noun phrases. Determiners are also useful in identifying
NPs. Given these NPs, a simple cue for prepositional phrases is a preposition followed by an NP.
The best cues for infinitive VPs and tensed clauses (CPs) take advantage of the verb list that is being learned from the -ing cue.
Specifically, any verb that follows to is interpreted as an infinitive. Similarly, any lexical NP followed by a verb is taken to be
the subject of a CP. In addition, a complementizer followed by a lexical NP as in that he or where it is taken to head a CP. The
complementizer that is frequently omitted, so the first CP cue, which does not rely on explicit complementizers, is important.
The English cues outlined above are summarized in table 3.3 NP left-edge (NPLE) refers to the class of lexical NPs and
determiners.
Table 3
The cues used by the simulation for identifying complements. LNP (Lexical NP) includes pronouns and personal names. NPLE
(NP left edge) includes LNP and the determiners
Phrasal category Cue Example
NP NPLE Don't eat that.
I saw a cat.
PP P NPLE Put your toy on the floor.
VP (inf) to V Do you like to dance?
CP C NPLE I hope that you like it.
(C) LNP V I know you drink vodka.
A cued phrase P is taken as a complement of a verb V only if P is near enough to V and P is unlikely to be the subject of another clause. P is near enough to V if P follows V immediately or if one lexical NP (LNP) intervenes. It follows that a verb is
recognized as taking two complements in a single sentence only if the first is an NP. A phrase P is deemed unlikely to be a
subject if either (i) it is an NP followed by an utterance boundary or a cue for another argument phrase type, or (ii) it is not an
NP. The same two-part strategy may be applicable to many languages, although the specific criteria for nearness and
non-subjecthood may be different.
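The two-part complement test can be sketched as follows; the helper names and signatures are my own:

```python
# Sketch of the complement test: a cued phrase P counts as a complement of
# verb V only if it is near enough to V and unlikely to be a subject.

def near_enough(words, v_pos, p_pos, is_lnp):
    """P follows V immediately, or exactly one lexical NP intervenes."""
    if p_pos == v_pos + 1:
        return True
    return p_pos == v_pos + 2 and is_lnp(words[v_pos + 1])

def unlikely_subject(phrase_type, next_cue, at_utterance_end):
    """Non-NP phrases pass automatically; an NP passes only if followed by
    an utterance boundary or a cue for another argument phrase."""
    if phrase_type != "NP":
        return True
    return at_utterance_end or next_cue is not None
```

For example, in "tell him the story" the NP cued at the story is near enough to tell because only the lexical NP him intervenes, and it is judged unlikely to be a subject because the utterance boundary follows it.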
Like all syntactic cues, the nearness criterion is imperfect. Indeed it is incorrect for several types of grammatical sentences, such
as:
(6) John put [NP the toy Mart wanted] on the floor
In (6) the nearness criterion classifies 'on the floor' as a complement of wanted, when in fact it is a complement of put.
The argument-adjunct distinction poses another problem for the nearness criterion.
(7a) John wanted (*in order) to pursue a career in finance [Argument]
(7b) John resigned (in order) to pursue a career in finance [Adjunct]
In (7a) the infinitive VP 'to pursue ' is a subcategorized complement of want in the sense that: (i) removing it renders the
sentence ungrammatical, and (ii) its semantic function as an aspiration is a special lexical property of want and a few dozen
other verbs. Thus, the nearness criterion gives the correct result. The infinitive VP in (7b), by contrast, can be removed without
affecting the grammaticality of the sentence, and its semantic function as the purpose for which an action was carried out is not
special to a small class of verbs. The infinitive in (7a) is called an argument while the one in (7b) is called an adjunct. Purpose
adjuncts can be diagnosed by inserting 'in order' before the infinitive: if the infinitive is an argument the result will be very bad,
but if it is an adjunct the result will be a perfect paraphrase. The nearness criterion fails to distinguish between subcategorized
arguments and non-subcategorized adjuncts. However, it should be noted that no syntactic analysis can distinguish arguments
from adjuncts, unless it already knows the syntactic frames of the verbs in question.
Although purpose adjuncts can be easily diagnosed in most cases, it is worth noting that there are no satisfactory, necessary and
sufficient conditions for distinguishing between arguments and adjuncts in general (Adams and Macfarland 1991).
The probability of coming up heads m or more times out of n flips of a coin with bias p is given by the obvious sum:

P(m+, n, p) = Σi=m..n C(n, i) p^i (1 − p)^(n−i)

Analogously, P(m+, n, π₋s) gives the probability that m or more occurrences of a -S verb V will be followed by a cue for S out
of n occurrences total.
If m out of n occurrences of V are followed by cues for S, and if P(m+, n, π₋s) is quite small, then it is unlikely that V is -S.
That is, the observed data would be quite unlikely if V were -S and hence had probability π₋s of being followed by a cue for S.
Traditionally, a threshold less than or equal to 0.05 is set, such that a hypothesis is rejected if, assuming the hypothesis were
true, the probability of outcomes as extreme as the observed outcome would be below the threshold. The confidence attached to
this conclusion increases as the threshold decreases.
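The binomial tail test described here is straightforward to compute; the sketch below uses my own function names, writing miscue_rate for the probability that an occurrence of a -S verb is followed by a cue for S:

```python
# Sketch of the rejection test: conclude that V takes frame S when the
# observed co-occurrence count would be very unlikely under miscues alone.
from math import comb

def tail_prob(m, n, p):
    """P(m+, n, p): probability of m or more heads in n flips of a coin
    with bias p, i.e. the sum over i = m..n of C(n, i) p^i (1-p)^(n-i)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m, n + 1))

def has_frame(m, n, miscue_rate, threshold=0.05):
    """Reject the hypothesis that V lacks frame S if m or more cue
    co-occurrences out of n total would be this rare under miscues alone."""
    return tail_prob(m, n, miscue_rate) <= threshold
```

For instance, 5 cue co-occurrences out of 10 occurrences are far too many to attribute to a 5% miscue rate, while 1 out of 10 is not.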
3.4. Estimating the miscue rate
As before, assume that an occurrence of a -S verb is followed by a cue for S with probability π₋s. Also as before, assume that
for each +S verb V, the probability that an occurrence of V is followed by a cue for S is greater than π₋s.
It is useful to think of the verbs in the corpus as analogous to a large bag of coins with various biases, or probabilities of coming
up heads. The only assumption about the distribution of biases is that there is some definite but unknown minimum bias π₋s.5
Determining whether a verb appears in frame S is analogous to determining, for some randomly selected coin, whether its bias
is greater than π₋s. The only available evidence comes from selecting a number of coins at random and flipping them. The
previous section showed how this determination can be made given an estimate of π₋s.
Suppose a series of coins is drawn at random and flipped N times. Each coin is assigned to a histogram bin representing the
number of times it comes up heads. At the end of this sampling procedure bin i contains the number of coins that came up heads
exactly i times out of N. Such a histogram is shown in figure 1, where N = 40. If N is large enough and enough coins are flipped
N times, one would expect the following:
(1) The coins whose probability of turning up heads is π₋s (the minimum) should
cluster at the low-heads end of the histogram. That is, there
5 If the number of coins is taken to be infinite then the biases must be bounded above π₋s.
The estimation procedure tries out each bin as a possible estimate of j0, the point of separation between the -S verbs at the low
frequency end and the +S verbs at the high frequency end (Item 1). Each estimate of j0 leads to an estimate of π₋s (Item 3),
and hence to an expected shape for the first j0 histogram bins (Item 2). Each estimate j of j0 is evaluated by comparing the
predicted distribution in the first j bins to the observed distribution: the better the fit, the better the estimate.
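The boundary search can be illustrated with a simplified sketch for the equal-sample-size case, in which every verb is observed the same number N of times. All names here are mine, and the fit measure (squared error against a truncated binomial) is my own stand-in for the formal procedure given in the Appendix:

```python
# Simplified sketch of miscue-rate estimation: bin verbs by their cue
# co-occurrence counts, try each boundary j, estimate the rate from the
# low bins, and score how binomial those bins look.
from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def estimate_miscue_rate(head_counts, N):
    """head_counts[v] = times verb v co-occurred with cues for S, out of N."""
    hist = [0] * (N + 1)
    for c in head_counts:
        hist[c] += 1
    best = None
    for j in range(1, N):
        low = [(i, hist[i]) for i in range(j + 1)]   # candidate pure-miscue bins
        total = sum(k for _, k in low)
        if total == 0:
            continue
        p = sum(i * k for i, k in low) / (total * N)  # mean rate in low bins
        if p == 0:
            continue
        # Compare observed low-bin shape to a binomial truncated at j.
        z = sum(binom_pmf(i, N, p) for i in range(j + 1))
        err = sum((k / total - binom_pmf(i, N, p) / z) ** 2 for i, k in low)
        if best is None or err < best[0]:
            best = (err, p)
    return best[1] if best else 0.0
```

With counts that cluster near zero for most verbs and near N for a few, the best-fitting boundary isolates the low cluster and yields a small estimated rate.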
The actual situation with verbs is slightly more complex than the one outlined above. The total number of occurrences of each
verb in any given input varies widely: look may occur thousands of times in a corpus where jostle occurs only once. Thus, the
rates at which -S verbs co-occur with cues for S are not distributed according to a single binomial curve with sample size N and
mean N × π₋s.6 Rather, they are distributed according to a superposition
6 In the context of machine learning, Brent (1993) presents an inference procedure that equalizes the sample size by
going through the input corpus twice, once for estimating π₋s, and
of binomial curves with different sample sizes N(V) and means N(V) × π₋s. This affects the details of the method but the
underlying idea is unchanged: evaluate hypotheses about the ±S boundary by comparing the observed distribution of -S verbs to
the expected distribution for each hypothesized boundary. A formal specification of the estimation procedure is given in the
Appendix.
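The expected distribution under a hypothesized miscue rate can be computed as such a superposition (again an illustrative sketch; the verb frequencies in the example are hypothetical):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def expected_mixture(sample_sizes, rate, bin_count):
    """Expected histogram of cue rates for -S verbs: a superposition of one
    binomial per verb, each with its own sample size N(V) but a common mean."""
    mixture = [0.0] * bin_count
    for n in sample_sizes:
        for k in range(n + 1):
            # scale the raw count k to a relative-frequency bin
            b = min(int(k / n * bin_count), bin_count - 1)
            mixture[b] += binomial_pmf(k, n, rate)
    return mixture

# Hypothetical verb frequencies: one very frequent verb, one moderate, one rare.
dist = expected_mixture([1000, 40, 1], rate=0.02, bin_count=40)
```

Each verb contributes one unit of probability mass, so the mixture concentrates near the miscue rate for frequent verbs while rare verbs spread their mass coarsely across bins.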
Verbs with various numbers of occurrences in the input can be used to estimate the miscue rates, but there must be some
minimum number of occurrences below which verbs are not included in the estimation procedure. If verbs that occur very few
times were included then there would be large sampling error and hence the +S verbs and the -S verbs might overlap in their rate
of co-occurrence with cues for S. However, there is no way to know the minimum sample size needed to keep the overlap
between +S and -S verbs acceptably low. On the other hand, the number of observations of each verb increases with time, so the
learner can afford to keep raising its minimum. As this minimum is raised the degree of overlap between +S and -S verbs will
continue to decrease and the learner's conclusions will become increasingly stable and reliable. The learner will not know how
stable or reliable its conclusions are at any one time, but there is no reason to expect that child learners know or care.
Note that the child's inference problem is quite different from the scientist's. Scientists need to know whether or not the sample
is big enough that the experiment can be stopped and the results published. Children learning language do not face any
analogous choice: they never stop collecting data, and they may draw provisional conclusions as necessary, even when the
likelihood of error is not known. This continual raising of the minimum sample size resembles Gold's identification in the limit
(Gold 1967), where the learner always converges on the right grammar but has no way to know when it has done so.
4. Experiment
This experiment investigates whether the implementation described in the previous section, and by extension the hypothesis of
surface functional cues
once for drawing lexical conclusions. On the first pass, a verb is ignored after some preset sample size has been reached.
Multiple passes are not possible for a child learner, so a new variant of the procedure was developed for the current
problem.
plus statistical inference, constitutes an effective strategy by which two-year-olds could learn subcategorization frames. The
method is to simulate the proposed implementation on a computer. The input to the simulation is transcripts of English speech
by mothers to their young children.
4.1. Methods
The input corpus for the experiment consists of 31,782 utterances by adult caretakers to children between 1;0 and 2;6. The
utterances were taken from the CHILDES database (MacWhinney 1991). The particular transcripts were as follows: all of
Bates's 'free play' and 'snack' transcripts, but not the reading aloud transcripts (Bates et al. 1988); all of the Bernstein-Ratner
transcripts (Bernstein-Ratner 1987); all of the Higginson transcripts for children between ages 1;0 and 2;6 (Higginson 1985);
and the Warren-Leubecker (Warren-Leubecker and Bohannon 1984) transcripts for children 2;6 and younger. In all cases,
exactly one adult and one child were present throughout. All lines except the adult's speech were removed. The pause markers
(#) were left in and taken as utterance boundaries, along with the ordinary punctuation marks. All diacritics between square
brackets were removed. Angle brackets (used as scope markers for diacritics) were removed but the words between them were
not. A few changes were made to enforce consistent use of the compound marker '+', including the placement of a '+' rather than
a space between two-part names like Santa Claus and Mickey Mouse. Otherwise, transcription errors were not corrected.
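The normalization steps just described can be approximated as follows (a simplified sketch; clean_utterance is a hypothetical helper for exposition and does not reproduce the actual CHILDES processing tools):

```python
import re

def clean_utterance(line):
    """Approximate the transcript normalization described above: strip bracketed
    diacritics, drop angle-bracket scope markers while keeping the enclosed words,
    and keep '#' in place as an utterance boundary."""
    line = re.sub(r'\[[^\]]*\]', '', line)    # remove diacritics in square brackets
    line = re.sub(r'[<>]', '', line)          # remove scope markers, keep the words
    line = re.sub(r'\s+', ' ', line).strip()  # normalize whitespace
    return line

clean_utterance("put <the ball> [!] in the box #")
# -> 'put the ball in the box #'
```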
Collecting observations. The observations were collected exactly as described in section 3.
Statistical inference. The algorithm given in the Appendix was used with I = 40 and minsample = 40. Given the estimated error
rate for each syntactic frame, verbs are reported as +S if their rate of co-occurrence with cues for S would have probability
below 0.02 under the null (-S) hypothesis.
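The decision rule amounts to an upper-tail binomial test (an illustrative Python sketch; report_plus_s and the example counts are invented for exposition):

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def report_plus_s(cue_count, verb_count, miscue_rate, alpha=0.02):
    """Report a verb as +S when its observed co-occurrence rate with cues for S
    would be too improbable under the null (-S) hypothesis that all co-occurrences
    are miscues at rate miscue_rate."""
    return binomial_tail(cue_count, verb_count, miscue_rate) < alpha

# A verb seen 50 times, 10 of them with cues for S, against a 5% miscue rate:
report_plus_s(10, 50, 0.05)  # -> True
```

With only 3 cue co-occurrences in 50 observations, by contrast, the null hypothesis cannot be rejected and the verb is left unreported for the time being.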
Vocabulary. In order to make use of the cues, the learner needs to know some function morphemes. In this experiment the
program made use of an initial vocabulary chosen from the most common words in the corpus (table 4). Besides the four
utterance-boundary markers, '.', '?', '#', and '!', these include the pronouns you, that, it, this, I, we, he, and they; the determiners
the, a, and your; the complementizers what, that, and where; and
Table 4
The 50 most common orthographic forms in the corpus
12050 . 2447 , 1174 do 739 we 611 don
10909 ? 2384 a 1168 can 738 on 581 like
5479 # 1720 this 1043 ok 737 look 576 yeah
5250 's 1641 is 962 put 734 he 535 +
4548 you 1627 oh 957 and 732 no 534 're
3491 the 1549 there 956 one 730 uhhuh 514 good
3478 what 1445 to 905 are 712 your 503 huh
3471 that 1344 in 830 't 702 right 489 at
3336 ! 1251 here 805 go 689 where 487 they
2593 it 1226 i 757 see 662 want 479 all
the prepositions to, in and on. Auxiliaries and some modals are also common, and these can be exploited by cues. However, one
of the goals of this experiment was to determine how little lexical knowledge the child could make do with, so these function
morphemes were not exploited. The demonstrative 'pro-PPs' here and there were not exploited either, though they would
probably be very useful for learning about locative PPs. Interjections, negations, and conjunctions are of little use. The fifty
most frequent words also include the open class verbs put, look, want, and like, but the simulation does not know any open class
words at the outset.
In addition, the simulation started with a lexicon of all 158 proper names that occurred at least twice in the corpus.
4.2. Results
This simulation identified a total of 126 verbs, of which 76 were assigned at least one syntactic frame. The output of the simulation
for those 76 verbs is shown in table 5, while the 50 verbs that were not assigned any frame are listed in table 6. Each row of
table 5 represents a single verb. The symbols appearing in each row represent the frames assigned to that row's verb in the
simulation. The symbol NP stands for an NP complement, in, on and to stand for PPs headed by those prepositions, wh stands
for a tensed wh- clause, cl for a tensed clause with that or a null complementizer, and if for a tensed if clause. When one of
these symbols follows NP, it signifies a two-argument frame in which the first argument is realized as an NP. For easy reference
by frame, all the symbols for a given frame are aligned in one column.
Table 5 The 76 verbs that were assigned at least one syntactic frame
NP NPNP NPin NPon NPto NPwh cl in inf on to wh
ask NPNP NPwh
be NP on
blow NP
bring NP NPto
brush NP
buy NPNP
call NP
carry NP
catch NP
cheat NP
chew on
close NP
comb NP
cry NP
cut NP
do NP NPNP cl
drink NP
drive NP in
eat NP
feed NP NPto
feel NP inf
find NP wh
fix NP
get NP inf
give NP NPNP NPto
go in inf to
have NP inf
hold NP
hurt NP
leave NPon
lick NP
lift NP
listen to
live in
look in wh
make NP cl
miss NP
open NP
pat NP
play in
pretend cl inf
Table 5 (Cont.)
NP NPNP NPin NPon NPto NPwh cl in inf on to wh
pull NP
push NP
put NP NPin NPon
read NP NPNP
record wh
ride NP in
rock NP
roll NP NPto
say NP
set NP
sew NP
show NP NPNP NPto
sing NP NPNP
sit in on
sleep on
smell NP
spill NP
splash NP in
start inf
take NP
talk on to
tell NP NPwh
think cl
throw NP NPto
touch NP
trot inf
try NP inf
turn NP
understand NP wh
wait NP
walk in on
wash NP
watch NP wh
water on
wear NP
Table 6 The 50 words that were identified as verbs but were not assigned any syntactic frames
add button clean color come cook dance draw even fall fish fly forget grow hide invent jump keep kick knock laugh lay
lie move mow nap pick pitch pour rain ring rub run scratch share shave snow stack stand star step stick swim swing
umm wake wave work write zip
These results should be interpreted in light of the fact that the statistical inference technique does not answer 'yes' or 'no' as to
whether a given verb has a given frame. Rather, it answers 'yes' or 'no reliable indication of yes so far'.
It should also be noted that a number of the verbs discovered have auxiliary or modal functions, including be, do, go and have.
These are generally omitted from the following discussion, since they have special properties that must in any case be learned by
some process that does not directly involve subcategorization frames.
Now consider the results for each syntactic frame.
Direct object (NP). 59 of the 126 verbs were deemed acceptable with a direct object and no other complement (NP). Most
of these are correct. Most common transitive verbs were assigned NP, while most intransitives, including come, fall, laugh,
listen, lie, live, look, nap, pretend, rain, run, sit, sleep, snow, step, talk and trot, were not. Many other words that are intransitive
except in specialized senses (swim the channel, work the metal) are not assigned NP either. It may be that these senses are rare
in child-directed speech.
The only error among the verbs assigned NP is put, which requires a location phrase (typically a PP). This is caused by
extraction in where questions. Cry is assigned NP for the wrong reasons: the cues do not detect inversion ("Oh no," cried Pinocchio); and the transcripts are inconsistent about pause marks before names used in address (The bug'll cry June).
Double-object (NPNP) and the dative alternation. Ask, buy, do, give, read, show and sing are deemed acceptable with two NP
complements (NPNP). These are all correct.
The other half of the dative alternation is the NPto frame, which is correctly assigned to give, show, bring, roll, and throw.
Thus, the dative alternation is observed for give and show. Bring, feed, roll, throw and read can alternate, but they are assigned
only one of the two frames.
Infinitive VP (inf). Feel, get, go, have, pretend, start, trot, and try are all assigned the inf frame. Of these, feel is clearly
incorrect, resulting from six repetitions of the sentence You don't know how good it feels to wash my ears and scrub my heels. In
fact, feel requires an adjective as well as an infinitive, but the adjective has been fronted with how in this sentence. The inclusion
of trot also results from repetition of an unusual nursery-rhyme construction, off we trot to play. The correct analysis of this
utterance is not clear. The remaining verbs appear to license an infinitive complement, although in
semantically diverse capacities. In particular, go, have, and possibly get license an infinitive in their capacity as auxiliaries or
modals.
Tensed clauses (NPwh, wh, cl). Tell and ask are both deemed acceptable with both a direct object and a clause headed by a wh
word, as in Ask Daddy what he wants for lunch. As expected, these are both communication verbs.
Find, look, record, understand and watch are deemed acceptable with a clause headed by a wh word as their sole argument, as
in Look what I've got! Look is interesting because it is in fact acceptable with a wh clause but not with a direct object nor with a
that clause.
Do, make, pretend and think are judged to take a clause headed by that or a clause with no overt complementizer (cl). Pretend
and think are canonical mental state verbs. The relative poverty of such verbs in the output is discussed below. Make is mistaken
as taking a tensed clause because it takes an NP and a bare infinitive, as in I can't make John eat his peas. Because the cues do not check agreement, John eat is mistaken for a tensed clause. In many cases this construction could be distinguished from a tensed
clause, using either pronoun case or subject-verb agreement. However, such knowledge was not provided in the cues for this
simulation. Finally, do is judged to take a tensed clause because inverted questions like Do you know how to swim? are mistaken
for tensed clauses.
NPin, NPon. The NPto frame was discussed under the dative alternation. Put was assigned NPon and NPin, while leave was
assigned NPon. These are correct, though leave is acceptable with NPin as well.
in, on, to. Go, listen and talk were correctly assigned the to frame. For on and in it is difficult to determine which cases
represent subcategorized arguments and which represent adjuncts. Some uses of on and in clearly mark participants in the
action, as in drive in the car and talk on the phone. Others clearly represent locational adjuncts, as in sleep in the baby bed and
splash in the pool. A number of examples were difficult to classify.
4.3. Discussion
Overall, this experiment suggests that syntactic frames can be identified in child-directed English using relatively simple cues
based on function morphemes, proper nouns, and utterance boundaries, in combination with statistical inference. However, the
cues suggested in section 3 are slightly too simple to achieve high accuracy. Specifically, they do not distinguish between
interrogative and declarative sentences. This led to the erroneous assignment of NP to put, inf to feel, and cl to do (although do
is independently marked as idiosyncratic).
Since the declarative/interrogative distinction is clearly marked prosodically, children probably have access to it.
One potential problem is suggested by the fact that locational adjuncts were sometimes mistaken for arguments in the
simulation. However, it is not clear whether anyone can distinguish the two cases reliably in naturally occurring sentences.
Until the nature of the argument/adjunct distinction is better understood, it is difficult to see how the extent of the problem can
be assessed and cues for resolving it investigated.
In contrast to the generally positive results for identifying complements, this simulation failed to demonstrate that simple cues
based on function morphemes can identify all verbs that occur frequently in child-directed English. A number of high-frequency
mental state verbs, notably know, want, see, hear, like and love, are missing from tables 5 and 6. These words were not detected
because they do not occur in the input corpus with the suffix -ing. Mental state verbs do not generally occur in the progressive
aspect and their nominalizations (e.g. knowing, liking) appear to be very rare in child-directed speech.
The transcripts used in this experiment comprise about 56 hours of interaction, so it is possible that more text would reveal that
nominalizations of mental state verbs are not intolerably rare. Further, for most of the transcripts the child was under 2;0, and it
seems likely that the use of semantically abstract words increases with the child's age. Nonetheless, the -ing suffix seems a
precarious basis on which to identify mental state verbs; children's true strategies for identifying verbs are doubtless more
complex. One possibility is that they use more complex function-morpheme cues that exploit other inflections, auxiliaries,
modals, and pronoun case (Brent 1991). Since these cues are not without exceptions, it would make sense to posit a statistical inference mechanism for lexical category cues as well as for subcategorization cues. In fact, such an inference system would be
useful even in the current simulation, where water and even are both erroneously recorded as having verbal occurrences. (Both
have verbal senses, but they do not occur as verbs in the input corpus.) Another possibility is that children understand utterances
well enough to recognize that certain words stand for actions or states, and that they have an innate predisposition to classify
these as verbs (Grimshaw 1981).
The importance of the statistical inference component can be seen clearly by comparing its input, the raw observation shown in
table 2, with its output, the lexicon shown in table 5. The segment of the observation table shown in table 2 contains a number
of cases where a verb co-occurs with cues for a frame that the verb does not in fact have in its lexical entry. These include come
and look with a cue for NP, color, come and get with cues for a tensed
clause, and come with a cue for an infinitive. As a result of the inference procedure, none of these miscues lead to errors in the
lexicon shown in table 5.
know that, for example, to followed by an uninflected verb is usually an infinitive, while I followed by an uninflected verb is
not. In the simulation it was assumed that the learner knows a priori, perhaps even innately, which sequences of phrases
constitute possible syntactic frames. To make use of this knowledge, it is not enough to know that, for instance, to followed by
an uninflected verb is an instance of some linguistically significant category; rather, it is essential to know that it is an instance
of the category represented by 'infinitive' in the a priori list of frames. An alternative possibility is that, at least at this early
stage, children are simply forming distributional classes of verbs that reflect their ability to co-occur with various sequences of
function morphemes. Thus, English-speaking children might not know that to followed by an uninflected verb is an infinitive,
per se. Rather, they might know only that co-occurrence with this sequence is a significant distributional property of verbs, and
that it is a different one from co-occurrence with I followed by an uninflected verb. To put this question in context, however,
note that English-speaking children do eventually come to know the relationship between sequences of function morphemes and
syntactic structure. This phenomenon needs explanation regardless of one's theory about the early acquisition of
subcategorization frames, so the current proposals incur no extra explanatory burden.
Finally, note that statistical inference mechanisms of the sort proposed here have an important role to play in language
acquisition research in general. Developing such mechanisms is challenging because there is often no a priori basis for
predicting frequency distributions in linguistic domains. This leaves simulation as one of the only tools for determining when an
inference mechanism is adequate to a particular language learning problem. More experiments with inference procedures are
needed to help explain the robustness of language acquisition, and to establish that individual learning strategies can be
implemented robustly.
whose bins correspond to equal-sized subintervals of the unit interval (e.g. figure 1, page 455). By assumption, there is some bin
j0 such that verbs in bins j0 and lower are -S while those in bins above j0 are +S. The estimated miscue rate is the weighted
average of the relative frequencies of the -S verbs (by assumption, the verbs with relative frequency below j0).
The procedure ESTIMATEMISCUERATE estimates the miscue rate for a frame S by first estimating the ±S separator, j0, for S.
The inputs to the procedure are: S, the frame; verbs, the list of observed verbs; N, a function returning the total frequency for each verb; f, a function returning the total frequency of each verb with cues for each frame; bin-count, the number of histogram bins to use in plotting the relative frequencies; and min-sample, the minimum number of times a verb must occur to be used in this
estimation. Each bin is evaluated as a possible value for j0, and the one with the highest evaluation is selected. The variable
cutoff-bin stores hypotheses about j0 while they are being evaluated. The estimation procedure involves only the -S verbs (by hypothesis, those with relative frequency below cutoff-bin). Thus, the variables total-verb-occurrences and total-cue-occurrences
refer to the total counts for verbs with relative frequency below cutoff-bin. Each hypothesis about j0 (stored in cutoff-bin)
generates a hypothesis about the miscue rate -s, which is stored in the variable rate. In particular, the miscue rate is
hypothesized to be the ratio of total-cue-occurrences to total-verb-occurrences, where, again, these refer only to the -S verbs.
This hypothesis predicts the shape of the distribution of relative frequencies for -S verbs. In particular, it predicts that the
distribution should be a mixture of binomial distributions with a common mean, the miscue rate, but different sample
sizes, corresponding to the number of times each verb occurs in the input.
The procedure evaluates each hypothesis about j0 (and the consequent hypothesis about the miscue rate) by comparing the
predicted distribution for -S verbs to the observed distribution. The comparison is done by summing the squares of the
differences between the observed and predicted distributions of -S verbs at each bin. At bins above cutoff-bin the observed
distribution of -S verbs is taken to be uniformly zero.
ESTIMATE-MISCUE-RATE(S, verbs, N, f, bin-count, min-sample)
  total-verb-occurrences = 0
  total-cue-occurrences = 0
  best-sum-of-squares = infinity
  For cutoff-bin = 1 to bin-count
    observed-dist[cutoff-bin] = 0
  The expected distribution is a mixture of binomials with a common mean, rate, but different sample sizes, N(V). Compute the probability distribution for verb V, scale it to bin-count bins, and add it to the mixture.
  For orig-bin = 0 to N(V)
    Scale orig-bin to the mixture histogram.
    Add the binomial distribution for rate and N(V) to the mixture.
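Combining these fragments with the prose description above, the procedure can be sketched in Python (a hypothetical reconstruction for illustration, not the original code; variable names follow the pseudocode where possible):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def estimate_miscue_rate(verbs, N, f, bin_count, min_sample):
    """Try each histogram bin as the +/-S boundary j0; each hypothesis implies a
    miscue rate, which predicts a mixture-of-binomials histogram for the -S verbs.
    Keep the rate whose prediction best fits the observed histogram (least squares),
    with the observed -S distribution taken to be zero above the cutoff."""
    usable = [v for v in verbs if N[v] >= min_sample]
    best_rate, best_score = 0.0, float('inf')
    for cutoff_bin in range(1, bin_count + 1):
        # Verbs whose relative frequency falls below the boundary are -S by hypothesis.
        low = [v for v in usable if f[v] / N[v] * bin_count < cutoff_bin]
        total_verb_occurrences = sum(N[v] for v in low)
        total_cue_occurrences = sum(f[v] for v in low)
        if total_verb_occurrences == 0:
            continue
        rate = total_cue_occurrences / total_verb_occurrences  # hypothesized miscue rate
        observed = [0.0] * bin_count
        predicted = [0.0] * bin_count
        for v in low:
            observed[min(int(f[v] / N[v] * bin_count), bin_count - 1)] += 1
            # Mixture of binomials: common mean rate, per-verb sample size N(v).
            for k in range(N[v] + 1):
                b = min(int(k / N[v] * bin_count), bin_count - 1)
                predicted[b] += binomial_pmf(k, N[v], rate)
        score = sum((o - p) ** 2 for o, p in zip(observed, predicted))
        if score < best_score:
            best_score, best_rate = score, rate
    return best_rate
```

On invented data with three low-rate verbs and two high-rate verbs, the procedure recovers the low-rate cluster's pooled rate as the miscue estimate.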
References
Adams, L. and T. Macfarland, 1991. Testing for adjuncts. In: Proceedings of the 2nd Annual Meeting of the Formal Linguistics Society of Midamerica. Formal Linguistics Society of Midamerica.
Bates, E., I. Bretherton and L. Snyder, 1988. From first words to grammar: Individual differences and dissociable mechanisms. Cambridge: Cambridge University Press.
Bernstein-Ratner, N., 1987. The phonology of parent-child speech. In: K. Nelson, A. van Kleeck (eds.), Children's language, Vol. 6. Hillsdale, NJ: Erlbaum.
Bever, T.G., 1991. The demons and the beast: Modular and nodular kinds of knowledge. In: N. Reilly, N. Sharkey (eds.), Connectionist approaches to language processing, 212-252. Hillsdale, NJ: Erlbaum.
Boguraev, B. and T. Briscoe, 1987. Large lexicons for natural language processing: Utilising the grammar coding system of LDOCE. Computational Linguistics 13(3), 203-218.
Brent, M.R., 1991. Automatic acquisition of subcategorization frames from unrestricted English. Ph.D. thesis, Massachusetts Institute of Technology.
Brent, M.R., 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics 19, 243-262.
Chomsky, N., 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., 1981. Lectures on government and binding: The Pisa lectures. Volume 9 of Studies in generative grammar. Dordrecht: Foris.
Chomsky, N., 1986. Knowledge of language. Convergence. Westport, CT: Praeger.
Demuth, K., 1993. Issues in the acquisition of the Sesotho tonal system. Journal of Child Language 20, 275-301.
Fernald, A. and P. Kuhl, 1987. Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development 10, 279-293.
Fisher, C., H. Gleitman and L.R. Gleitman, 1991. On the semantic content of subcategorization frames. Cognitive Psychology 23(3), 331-392.
Fisher, C. and H. Tokura, 1993. Acoustic cues to clause boundaries in speech to infants: Cross-linguistic evidence. Manuscript, Department of Psychology, University of Illinois.
Gelman, S.A. and M. Taylor, 1984. How two-year-old children interpret proper and common names for unfamiliar objects. Child Development 55, 1535-1540.
Gerken, L., 1992. Young children's representation of prosodic phonology: Evidence from English speakers' weak syllable productions. Manuscript, Department of Psychology, SUNY Buffalo.
Gerken, L., B. Landau and R.E. Remez, 1990. Function morphemes in young children's speech perception and production. Developmental Psychology 26(2), 204-216.
Gerken, L. and B.J. McIntosh, 1993. The interplay of function morphemes and prosody in early language. Developmental Psychology 29, 448-457.
Gleitman, L., 1990. The structural sources of verb meanings. Language Acquisition 1(1), 3-55.
Gleitman, L.R., H. Gleitman and E. Wanner, 1988. Where learning begins: Initial representations for language learning, 3-48. New York: Cambridge University Press.
Gold, E.M., 1967. Language identification in the limit. Information and Control 10, 447-474.
Grimshaw, J., 1981. Form, function, and the language acquisition device, 165-210. Cambridge, MA: MIT Press.
Gropen, J., S. Pinker, M. Hollander and R. Goldberg, 1991. Affectedness and direct objects: The role of lexical semantics in the acquisition of verb argument structure. Cognition 41, 153-195.
Higginson, R.P., 1985. Fixing-assimilation in language acquisition. Ph.D. thesis, Washington State University.
Hirsh-Pasek, K., D.G. Kemler-Nelson, P.W. Jusczyk, K. Wright and B. Druss, 1987. Clauses are perceptual units for young infants. Cognition 26, 269-286.
Jusczyk, P.W., K. Hirsh-Pasek, D.G. Kemler-Nelson, L. Kennedy, A. Woodward and J. Piwoz, 1993. Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology (in press).
Jusczyk, P.W. and E. Thompson, 1978. Perception of a phonetic contrast in multisyllabic utterances by 2-month-old infants. Perception and Psychophysics 23, 105-109.
Katz, N., E. Baker and J. MacNamara, 1974. What's in a name? A study of how children learn common and proper names. Child Development 45, 469-473.
Kemler-Nelson, D.G., K. Hirsh-Pasek, P.W. Jusczyk and K.W. Cassidy, 1989. How the prosodic cues in motherese might assist language learning. Journal of Child Language 16, 55-68.
Kimball, J., 1973. Seven principles of surface structure parsing in natural language. Cognition 2(1), 15-47.
Landau, B. and L. Gleitman, 1985. Language and experience. Cambridge, MA: Harvard University Press.
Lederer, A. and M.H. Kelly, 1992. Prosodic information for syntactic structure in parental speech. Manuscript, Department of Psychology, University of Pennsylvania.
Levin, B., 1993. English verb classes and alternations: A preliminary investigation. Chicago, IL: University of Chicago Press.
Lightfoot, D.W., 1991. How to set parameters. Cambridge, MA: MIT Press.
MacWhinney, B., 1991. The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Erlbaum.
Mehler, J., P.W. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini and C. Amiel-Tison, 1988. A precursor of language acquisition in young infants. Cognition 29, 143-178.
Morgan, J., 1986. From simple input to complex grammar. Learning, development, and conceptual change. Cambridge, MA: MIT Press.
Morgan, J., R.P. Meier and E.L. Newport, 1987. Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology 19, 498-550.
Nespor, M. and I. Vogel, 1982. Prosodic domains in external sandhi rules. In: H. van der Hulst, N. Smith (eds.), The structure of phonological representations, Volume 1. Dordrecht: Foris.
Newport, E., L. Gleitman and H. Gleitman, 1977. Mother, I'd rather do it myself: Some effects and non-effects of maternal speech style. In: C.E. Snow, C.A. Ferguson (eds.), Talking to children: Language input and acquisition, 109-149. New York: Cambridge University Press.
Petretic, P.A. and R.D. Tweney, 1976. Does comprehension precede production? Journal of Child Language 4, 201-209.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S., 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Radford, A., 1988. Small children's small clauses. Transactions of the Philological Society 86, 1-43.
Radford, A., 1991. The syntax of nominal arguments in early child English. Language Acquisition 1, 195-223.
Selkirk, E., 1981. On the nature of phonological representation. In: J. Laver, J. Anderson (eds.), The cognitive representation of speech, 379-388. Amsterdam: North-Holland.
Shipley, E.F., C.S. Smith and L.R. Gleitman, 1969. A study in the acquisition of language: Free responses to commands. Language 45, 322-342.
Valian, V. and S. Coulson, 1988. Anchor points in language learning: The role of marker frequency. Journal of Memory and Language 27, 71-86.
Warren-Leubecker, A. and J.N. Bohannon, 1984. Intonation patterns in child-directed speech: Mother-father differences. Child Development 55, 1379-1385.
Wexler, K. and P. Culicover, 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Wijnen, F., E. Krikhaar and E. den Os, 1993. The (non)realization of unstressed elements in children's utterances: A rhythmic constraint. Journal of Child Language (in press).
Williams, L. and M. Bush, 1978. The discrimination by young infants of voiced stop consonants with and without release bursts. Journal of the Acoustical Society of America 63, 1223-1225.
Zwicky, A., 1970. In a manner of speaking. Linguistic Inquiry 2, 223-233.
1. Introduction
The question of how children acquire lexical entries for verbs, and in particular their subcategorisation frames, is one of the central questions concerning the child's acquisition of syntax. Its importance is enhanced by the recent tendency in theories of grammar to gravitate to a lexicalist position, and by the role of verbs as the heads of their clauses. How do children do it, given the
non-determinacy and automata-theoretic complexity of the syntax itself, and the unsystematic presentation and error-proneness
of the linguistic data that they apparently have to make do with? Michael Brent's paper in this volume shows how the statistical
technique of binomial error estimation can be used to minimise the effect of contamination in the data available to the child
language learner, arising either from errors in the input itself or errors in the child's analyses of the input sentences. The
technique is demonstrated by applying it to the sentences of a corpus of actual adult-child conversations, to derive
subcategorisation frames for verbs from analyses based on imperfectly reliable local syntactic cues defined in terms of sequences
of inflectional morphemes, function words and lexical NPs. As Brent
* Thanks to Lila Gleitman and Jeff Siskind for reading the draft. The research was supported in part by NSF grant nos.
IRI90-18513, IRI90-16592, and CISE IIP, CDA88-22719, DARPA grant no. N00014-90-J-1863, and ARO grant no.
DAAL03-89-C0031.
points out, these two aspects of the work are quite independent: binomial error estimation could be used to minimise the
influence of errors arising from imperfect analysis procedures of any kind at all, including those based on semantic and prosodic
information, as well as syntactic. The present paper considers the part that all of these sources of information may play.
2. Syntax
The specific application of this technique to low-level syntactic cues, rather than these richer sources of information, in this and
related work in Brent's (1991) thesis can be argued to deliver two further important results. First, it demonstrates a practical
technique that actually can be used to automatically build lexicons on the basis of large volumes of text. Although this point is
not discussed in Brent's present paper, it is worth emphasising. Hand-built dictionaries are inevitably very incomplete with
respect to the exhaustive listing of subcategorisation properties that are needed for many computational applications. Techniques
based on simplified syntactic properties which probabilistically 'compile out' syntactic and semantic properties of what a linguist
would regard as 'the grammar', and working on the basis of statistical properties of their distribution over a large corpus, may
well represent the only practicable possibility for automatically extending such dictionaries. Full-blown deterministic parsing of corpora of the requisite size, using linguistically respectable grammars and/or semantic interpretations, is impracticably expensive computationally with existing techniques, to the extent that it is possible at all.
Second, the present study demonstrates the important fact that the information needed to determine verb subcategorisations
actually is there in the distribution of these very low-level properties in input of the kind that children are actually exposed to.
For example, one of the apparent problems for acquisition of subcategorisation frames on the basis of syntactic information
alone is the systematic ambiguity in all languages between subcategorised arguments and nonsubcategorised adjuncts, illustrated
for English by the following pair of sentences:
(1a) We put Harry on the bus
(1b) We met Harry on the bus
How can the child avoid erroneously subcategorising meet like put? Brent points out that it does not matter if they do, because
it is (presumably universally) the case that the relative frequency with which the sequence V NP PP-on occurs will be
significantly higher for verbs like put, which subcategorise for NP and an on-PP, than for verbs like meet, which subcategorise
for NP alone and allow the PP only as an adjunct. Binomial error estimation is able to distinguish the two distributions, and reject the child's spurious
evidence from analyses suggesting that meet subcategorises for the PP. Similar results seem to follow for spurious occurrences
of subcategorisations arising from extraction, as in who did you put on the bus, which might appear otherwise to suggest that put
might subcategorise for PP alone.
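The arithmetic behind this filtering can be sketched as follows. This is a simplified rendering of binomial error estimation, not Brent's actual procedure; the error rate and significance threshold are illustrative values, not his parameters:

```python
from math import comb

def binomial_tail(n, m, eps):
    """P(X >= m) for X ~ Binomial(n, eps): the probability that m or more
    of n occurrences of a verb would show the cue purely through
    misanalysis, if the cue fires spuriously at rate eps."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(m, n + 1))

def subcategorises(n_occurrences, n_cue_matches, error_rate=0.05, alpha=0.02):
    """Accept a candidate frame only if chance misanalysis is an
    implausibly rare explanation for the observed cue frequency."""
    return binomial_tail(n_occurrences, n_cue_matches, error_rate) < alpha
```

On this sketch, a put-like verb seen with the V NP PP-on cue in 30 of 100 occurrences clears the threshold, while a meet-like verb showing the cue in 3 of 100 occurrences (consistent with adjunct noise) does not.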
It is therefore reasonable to ask whether the child language learner actually makes use of such purely syntactic cues to learn the
lexical categories of verbs. Here Brent is extremely cautious, and goes out of his way to acknowledge the possible involvement
of prosodic and semantic cues as well. He notes in passing that there are a number of open questions that need answering before
we can be quite comfortable with the assumption that the child is using the closed-class cues. The most important is that both
the function words themselves and the cue sequences based on them are language-specific. The question arises of how the child
can possibly catch on to the fact that it, that and the are cue words, much less that the sequence V it the suggests that V is
probably a ditransitive verb, while the sequence V that the suggests that V is probably a complement verb. It is hard to see that
there is any alternative to knowing, besides the set of possible subcategorisation frames, (a) the precise syntactic significance of
each closed class word as NP, Spec of CP, etc., and (b) some statistics about possible corpora, including facts such as that
complement-taking verbs are more common than ditransitives.1
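What such cue-driven observation might look like can be sketched as follows; the cue inventory and frame labels here are hypothetical illustrations of the idea, not Brent's actual cue set:

```python
# Hypothetical inventory: each closed-class cue sequence following a verb
# is mapped to the subcategorisation frame it suggests (cf. 'V it the'
# for ditransitives, 'V that the' for complement-taking verbs).
CUE_FRAMES = {
    ("it", "the"): "NP NP",      # ditransitive reading
    ("that", "the"): "S-comp",   # sentential-complement reading
}

def observed_frames(tokens, verbs):
    """Collect (verb, frame) observations from one tokenised utterance.
    Individual observations are noisy; a filter such as binomial error
    estimation decides which of them survive into the lexicon."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok in verbs:
            for cue, frame in CUE_FRAMES.items():
                if tuple(tokens[i + 1:i + 1 + len(cue)]) == cue:
                    hits.append((tok, frame))
    return hits
```

The point of the objection above is that even this much machinery presupposes knowing which words are cues and which frames they signal.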
I shall remain equally cautious in the face of such open questions, and certainly would not wish to claim that the child cannot be
using such cues. However, as long as these questions remain open, it also remains unclear whether we have escaped what Brent
identifies as the 'chicken-or-egg problem' of apparently needing to know some syntax to apply this procedure. This suggests that
there may be some point to asking ourselves what other resources the child could call upon, and in particular whether the two
alternatives that Brent mentions, prosody and semantics, can help a child learn the first elements of syntax, including their first
subcategorisations, in the face of the kind of uncertainties in the input which he identifies.
1 It is not enough to assume that the child simply looks out for verbs followed by all possible sequences of cue words,
classifying verbs as 'it + the verbs', 'that + the verbs', etc. Such a classification does not determine a subcategorisation.
3. Prosody
Although Brent shows how misanalyses arising from the argument/adjunct ambiguity can be overcome, the consequences of
some other quite similar sorts of ambiguity, such as that between prepositions and particles illustrated below, are not so easily
eliminable by distribution-based methods, since verbs subcategorise for ambiguous items like up in both its guises:2
(2a) We rang up the hotel
(2b) We ran up the hill
In the case of particles and prepositions it seems intuitively highly likely that prosody disambiguates the two. Lederer and Kelly
(1991) have shown that adults can reliably identify which of the two a speaker has uttered. They have shown similar effects for
the argument/adjunct alternation. Kelly (1992 and this volume) presents results which suggest that a number of further apparent
ambiguities are also correlated with prosodic distinctions. It is true that none of these prosodic discriminators are invariably
present. Nor does it seem at all likely that all the relevant ambiguities are marked in this way. For example, I know of no
evidence that the V-PP sequence arising from extraction in (3a), below, differs in any prosodic respect from that in (3b):
(3a) Who did you put on the bus
(3b) Who did you run up the hill with
However, where the information is marked, it may well be reliable enough to be used as evidence under appropriate distribution-
based techniques such as Brent's own, especially when we recall that adults' speech to children is characterised by exaggeration
of normal intonation contours.
2 Brent suggests that spurious analyses of verbs like ring as subcategorising for PP can be eliminated by observing sets of
subcategorisations, presumably meaning that we can reclassify verbs that have been assigned subcategorisations of both
PP and NP + P. However, this is a distinct (and language-specific) complication to the proposal, and appears likely to
conflict with the other uses that have been proposed for such sets. The prosodic cues discussed below would allow this
particular complication to be eliminated from his account.
However, a word of caution is in order here. Adult speakers do not actually use intonation to indicate syntactic structure, but to
convey the distinctions of discourse meaning that are variously described in terms of 'focus', or of oppositions such as
'topic/comment', 'given/new' and the like. While the elements that are marked in this way correlate with syntactic structure, this
is for semantic reasons, rather than for ease of processing. When adults exaggerate intonation contours in speaking to children, it
is extremely unlikely that they are using the intonational markers in any very different way. It is therefore quite possible that
children use this information as a semantic, rather than syntactic, cue, as part of the third strategy under consideration here.
4. Semantics
As soon as it was appreciated that even quite trivial classes of grammar cannot be learned by mere exposure to their stringsets,
and that there appears to be little evidence that any more explicit guidance is provided by adults, it was obvious that some other
source of information, 'innate' in the sense that it is available to the child prelinguistically, must guide them in acquiring their
grammar. As has often been pointed out, the only likely candidate is semantic interpretation or the related conceptual
representation.3 However inadequate our formal (and even informal) grasp on the child's prelinguistic conceptualisation of the
conversational situation, there can be no doubt that it has one, for even non-linguistic animals have that much. There can
therefore be no doubt that this cognitive apparatus, for reasons which have nothing to do with language as such, partitions the
world into functionally relevant 'natural kinds' of the kind investigated by Landau in this volume, individual entities, including
events, propositions, and such grammatically relevant notions as actual and potential participants and properties of those events,
as well as the attitudes and attentional focus of other conversational participants. Since the main thing that syntax is for is
passing concepts around, the belief that syntactic structure keeps as close as possible to semantics, and that in both
evolutionary and child language acquisition terms the early development of syntax amounts to little more than hanging words
onto the preexisting armatures of conceptual structure, is so simple and probable as to amount to the null hypothesis.4
3 In the context of modern linguistics, the suggestion goes back at least to Chomsky (1965: 56-59) and Miller (1967). But
of course it is a much older idea. See Pinker (1979) for a review of some proposed mechanisms, including the important
computational work of Anderson (1977), and see Gleitman (1990) for some cogent warnings against the assumption that
such semantic representations have their origin solely in present perception and the material world in any simple sense of
that term.
Of course, as Chomsky has repeatedly pointed out, this realisation gets us practically nowhere. We have such a poor grasp of
the nature of the putative underlying conceptual structures that it is difficult to even design experimental tests of the claim (quite
apart from the other difficulties that arise in doing experiments with prelinguistic children).5 Gleitman and others in the present
volume have made considerable headway in the face of these difficulties, but there is a long way to go. For similar reasons to do
with limitations on current knowledge, it does not seem to constrain syntactic theory in any very useful way. Right now (and
this is Chomsky's substantive point), the most reliable entry to the human system of language and symbolic cognition that we
have comes from the linguists' phenomenological grasp of the syntactic epiphenomenon, which has only just begun to look as
though it is yielding some insight into the underlying conceptual structure.
Nevertheless, the claim that semantics is the precursor of syntax is not without content, and has consequences for the question at
hand. In particular, it immediately entails that if we are asking ourselves why children do not classify meet as subcategorising
for NP PP on the basis of sentences like (1b), we met Harry on the bus, then we are simply asking the wrong question. A child
who learns this instance of this verb from this sentence must start from the knowledge that the denoted event is a meeting, and
that this involves a transitive event concept. It usually does not cross the child's mind that meet might subcategorise like put,
because the conceptual representation usually doesn't suggest that.
Once again, taking this position raises more questions than it answers. We are only just beginning to make sense of the complex
mapping between surface grammatical roles like subject and object, and the underlying thematic roles that seem to be
characteristic of the conceptual level. (I am particularly thinking of recent work by Grimshaw 1990.) It also raises the question
of whether the child's conceptual representation really can be used reliably in this fashion, which Pinker (1989) has called
'semantic bootstrapping', and, if not, how the child can cope with its unreliability.
4 The use of the words 'little more' rather than 'nothing more' is important. It would not be surprising to find that some
part of syntax (perhaps the observed constraints upon consistent orders across heads and complements) had its origin
elsewhere than in semantics.
5 I am not saying that logicians and my fellow computer scientists do not have interesting formalisms for representing
conceptual structures. In fact these systems are the main source of formal theoretical devices that linguists have to draw
on. But as knowledge representation systems, none of them as yet seem particularly close to the human one.
as one of a number of alternatives that syntactic bootstrapping correctly disambiguates, if learning is to take place at all.
One piece of circumstantial evidence in support of this conjecture is that adults work this way too. There is increasing
experimental evidence that the adult sentence processing mechanism deals with the huge degree of nondeterminism that arises
from natural grammars by appealing to meaning, filtering out the myriad spurious paths that the grammar permits on the basis of
whether they make sense, both on the basis of sentence-internal semantics, and of reference and extension in the context. This
semantic filtering of spurious paths which would otherwise overwhelm the computational resources of the processor has been
claimed to go on continually at every point in the parsing process, with very fine 'grain', probably more or less word by word.
(See Steedman and Altmann 1989 and Clifton and Ferreira 1989 for references and arguments pro and contra this proposal,
which ultimately comes from computer science, particularly in work by Winograd 1972.)
Nevertheless, it may well be the case, as Gleitman suggests, that children are frequently much more at sea than this, and may
even have much larger sets of propositions in mind, most or even all of which are irrelevant to the adult meaning. However,
recent computational work by Siskind (1992) shows that a process of intersecting such sets on successive encounters with the
verb can be used to eliminate the spurious meanings.6
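Siskind's intersection step can be sketched as follows. This is a deliberately simplified rendering of the idea, not his actual implementation, which handles considerably more representational structure:

```python
def intersect_meanings(lexicon, verb, scene_meanings):
    """One cross-situational update: narrow the stored hypothesis set for
    a verb to just those meanings also consistent with the new scene."""
    if verb not in lexicon:
        lexicon[verb] = set(scene_meanings)   # first encounter: all candidates live
    else:
        lexicon[verb] &= set(scene_meanings)  # later encounters prune spurious ones
    return lexicon[verb]

lexicon = {}
intersect_meanings(lexicon, "chase", {"chase", "flee", "run"})
intersect_meanings(lexicon, "chase", {"chase", "catch"})
# only the meaning present in every scene survives
```

An empty result after an update signals either polysemy or a wholly spurious scene, the two eventualities discussed in note 6.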
Of course, children are not adults, and neither are they mind readers, and a meaning that seems 'appropriate' to them over a
number of iterations of this process may not be the same as the adult's. The child's concept of 'chasing' (we may imagine as an
extension of Gleitman's example) may be overspecifically restricted to an activity of attempting to catch by running. In this case,
their own future use may be characterised by 'undergeneralisation': for example, they may be unwilling to agree that a similar
scenario involving cars is chasing. There is of course a huge literature that has revealed the fine detail of this process.7 There
are also instances of overgeneralisation, and possibly even more bizarre 'complexes', revealed in non-standard lexical meanings.
There is also evidence that children predict new lexical entries that they have not actually encountered, via lexical rules such as
the rule that generates causative verbs from certain adjectives, such as cool. This process may on occasion give rise to
non-standard lexical causatives, as in # It colds my hand, either because of slightly non-standard lexical rules, or because
standard rules are applied to slightly non-standard lexical entries. (See Bowerman 1982 and references therein.)
6 Two problems which Siskind leaves open are the problem of polysemous verbs, and the problem that arises when the
set of putative meanings derived from an occurrence of the verb are all spurious. Both of these eventualities will lead to
empty intersections. One simple tactic that might serve to distinguish them, and thereby be used to maintain a coherent
lexicon, would be to respond to an empty intersection by keeping both entries, relying on a tactic like binomial error
estimation to distinguish true polysemous lexical entries from spurious ones on distributional grounds.
7 For example, Brown (1973), Bowerman (1973), Clark (1973) and Carey (1982), the last including an extensive
review.
The way in which children successively modify non-standard lexical items to approximate the adult lexicon is the most
challenging and least well-understood part of the process. But the undoubted fact that the processes of syntactic and semantic
bootstrapping appear to iterate in this way suggests that together they may constitute the process by which children gain access
to concepts which are not immediately available to pre-linguistic sensory-motor cognition, and may thereby provide the force
behind the explosive change in cognitive abilities that coincides, both in evolutionary and in child-developmental terms, with the
appearance of language.8 Computational models of the kind proposed by Brent and Siskind will continue to provide the only
way in which theories of this process, such as syntactic, prosodic and semantic 'bootstrapping', can be developed and evaluated.
8 See Vygotsky (1962) for some early speculations on the nature of this process and its relation to Piagetian sensory-
motor development, and see Oléron (1953) and Furth (1961) for some suggestive early studies of the effects of
deprivation.
References
Anderson, John, 1977. Induction of augmented transition networks. Cognitive Science 1, 125-157.
Bowerman, Melissa, 1973. Early syntactic development. Cambridge: Cambridge University Press.
Bowerman, Melissa, 1982. Reorganisational processes in lexical and syntactic development. In: Eric Wanner, Lila Gleitman
(eds.), Language acquisition: The state of the art, 319-346. Cambridge: Cambridge University Press.
Brent, Michael, 1991. Automatic acquisition of subcategorisation frames from unrestricted English. Unpublished Ph.D.
dissertation, MIT, Cambridge, MA.
Brown, Roger, 1973. A first language: The early stages. Cambridge, MA: Harvard University Press.
Carey, Susan, 1982. Semantic development. In: Eric Wanner, Lila Gleitman (eds.), Language acquisition: The state of the art,
347-389. Cambridge: Cambridge University Press.
Chomsky, Noam, 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Clark, Eve, 1973. What's in a word? In: T. Moore (ed.), Cognitive development and the acquisition of language. New York:
Academic Press.
Clark, Eve, 1982. The young word maker. In: Eric Wanner, Lila Gleitman (eds.), Language acquisition: The state of the art,
390-428. Cambridge: Cambridge University Press.
Clifton, Charles and Fernanda Ferreira, 1989. Ambiguity in context. Language and Cognitive Processes 4, 77-104.
Furth, Hans, 1961. The influence of language on the development of concept formation in deaf children. Journal of Abnormal
and Social Psychology 63, 386-389.
Gleitman, Lila, 1990. The structural source of verb meanings. Language Acquisition 1, 3-55.
Grimshaw, Jane, 1990. Argument structure. Cambridge, MA: MIT Press.
Gropen, Jess, Steven Pinker, Michelle Hollander and Richard Goldberg, 1991. Affectedness and direct objects: The role of
lexical semantics in the acquisition of verb argument structure. Cognition 41, 153-196.
Kelly, Michael H., 1992. Using sound to solve syntactic problems. Psychological Review 99, 349-364.
Kelly, Michael H. and S. Martin, 1994. Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in
perception, cognition, and language. Lingua 92, 105-140. (this volume)
Lederer, Anne and Michael Kelly, 1991. Prosodic information for syntactic structure in parental speech. Paper presented to the
32nd Meeting of the Psychonomic Society, San Francisco, 1991.
Miller, George, 1967. The psychology of communication. Harmondsworth: Penguin.
Oléron, Paul, 1953. Conceptual thinking of the deaf. American Annals of the Deaf 98, 304-310.
Pinker, Steven, 1979. Formal models of language learning. Cognition 7, 217-283.
Pinker, Steven, 1989. Learnability and cognition. Cambridge, MA: MIT Press.
Siskind, Jeffrey, 1992. Naive physics, event perception, lexical semantics, and language acquisition. Unpublished Ph.D.
dissertation, MIT, Cambridge, MA.
Steedman, Mark and Gerry Altmann, 1989. Ambiguity in context: A reply. Language and Cognitive Processes 4, 105-122.
Vygotsky, Lev, 1962. Thought and language. Cambridge, MA: MIT Press.
Winograd, Terry, 1972. Understanding natural language. New York: Academic Press.
Index
A
"a," toddlers' understanding of, 154-157
Abstract idioms, 13-16
Adams, L., 451
Adult-directed speech
characteristics of, 85
styles of, 82-85
Adult language, 301-302
Alegria, J., 99
Allan, K., 286, 288
Alloy, L. B., 108
Altmann, Gerry, 84, 88, 478
Ambiguities, referential overlapping, 180-185
Amiel-Tison, C., 435, 441
Anderson, A., 87
Anderson, John, 475
Anderson, S., 28
Anderson, S. R., 36
Anglin, J. M., 234
Antell, S., 159
Armstrong, S., 124, 172, 175, 267, 338
Assimilations, 83, 87
Attneave, F., 120, 127
Au, T. K., 210, 215, 220, 251, 271, 301, 304
B
Babbling, language specificity in, 98
Bach, E., 309
Backscheider, Andrea, 199
Bailey, L. M., 220, 231
Baillargeon, R., 158, 160
Baker, C. L., 412
Baker, E., 149, 218-219, 231, 246, 251, 301, 314, 440
Baker, G., 380
Baker, M., 55, 413
Balaban, Marie T., 213, 229, 251
Baldwin, D., 271
Baldwin, D. A., 204, 213, 215-216, 230, 239, 300, 324
Ball, S., 83
Banigan, R. L., 222, 251
Banks, M., 272
Bard, E. G., 84, 87, 88
Barik, H. C., 83
Barnett, R. K., 85, 98
Baron, J., 121
Baron-Cohen, S., 180
Barsalou, L. W., 185
Bartlett, E., 237
Barwise, J., 309
Basic level, defined, 234
Bates, E., 341, 457
Bates, F., 201, 202
Batterman, N., 177
Bauer, N. K., 215
Bauer, P. J., 204, 205, 208, 234
Beck, D., 259
Becker, A., 272
Becker, A. H., 216, 316
Beckwith, R., 335
Bedford, Felice, 297
Begun, J. S., 341
Bellugi, U., 236
Benedict, H., 237, 303
Benveniste, L., 247, 248
Berlin, B., 236
Bernstein-Ratner, N., 86, 87, 457
Bertelson, P., 99
Bertoncini, J., 98, 435, 441
Bertrand, J., 222, 301
Berwick, R. C., 390, 433
Best, C. T., 126
Bever, T. G., 437
Biederman, I., 276, 289
Bijeljac-Babic, R., 98
Blaauw, E., 83, 86
Blake, J., 98
Bloom, L., 261, 338
Bloom, Peter, 149-150, 186, 200, 201, 238, 242, 251
on Possible names: The role of syntax-semantics mappings in the acquisition of nominals, 297-329; 301, 303, 307, 308, 309,
310, 312, 313, 314, 318, 322, 377
Bloomfield, L., 309
Blumstein, S. E., 126
Boguraev, B., 442
Bohannon, J. N., 457
Bolinger, D., 307, 379, 389
Bond, Z. S., 87
Bootstrapping, 337. See also Prosodic bootstrapping; Semantic bootstrapping; Syntactic bootstrapping
Borer, H., 36
Bornstein, M. H., 230
Boster, J. S., 275
Bower, G. H., 108
Bowerman, Melissa, 81, 148, 277, 282, 285, 287, 336, 371, 412, 414, 478, 479
Bowman, L. L., 203, 220, 221
Boyes-Braem, P., 234, 235, 268
Braine, M., 358
Braine, M. D. S., 424
Breedlove, D., 236
Brent, Michael R., on Surface cues and robust inference as basis for early acquisition of subcategorization frames, 433-470; 455,
463, 471-474
Bresnan, J., 57
Bretherton, I., 201, 202, 457
Briscoe, E. J., 91
Briscoe, T., 442
Broughton, J., 201
Brousseau, A.-M. 46, 62
Brown, G., 83, 87
Brown, G. D. A., 89
Brown, P., 285
Brown, R., 86, 132, 231, 234, 237, 261, 301, 313, 342, 345, 380
Brown, Roger, 478
Bruner, J. S., 334, 335
Bruno, N., 123
Bryant, E. T., 83
Bundy, R., 96
Burzio, L., 36, 57, 422
Bush, M., 439
Butterfield, Sally, 81, 129, 130
C
Callanan, M. A., 251, 303
Calvanio, R., 290
Campbell, R., 220
Carey, Susan, on Does learning a language require a child to reconceptualize the world?, 143-167; 150, 151, 162, 166, 180,
181, 215, 216, 230, 231, 237, 246, 251, 267, 276, 297, 298, 299, 300, 301, 302, 304, 317, 390, 478
Carlson, G., 308
Carpenter, P. A., 106
Carrell, T. D., 126
Carter, David, 81
Carter, D. M., 88, 89, 90, 91, 95, 129
Carter, R., 47
Carter, R. J., 396
Cassidy, K., 356
Cassidy, K. W., 97, 435, 441
Categories, assignment to grammatical, 132-135
Categorization, 123-124, 169-170
Causative alternation, 36
Causative verbs, preliminary analysis of, in English, 35-77
Cave, K., 263
Chapman, J. P., 172
Chapman, L. J., 172
Cheshire, J., 83, 87
Chi, M. T. H., 136, 251
Chierchia, G., 171
CHILDES database, 437
Child language, 303-305
Choi, S., 237, 277, 303, 412
Chomsky, N., 230, 307, 319, 337, 357, 378, 394, 425, 434, 436, 442, 475
Chosak, J., 259
Clark, E. V., 133, 200, 218, 231, 268, 301, 354, 379, 389, 390
Clark, Eve, 478
Clark, H., 283
Clark, H. H., 88, 133, 341
Clifton, Charles, 131, 478
Cohen, L. B., 157, 230, 241
Cole, R. A., 88
Colombo, J., 96
Coltheart, M., 89
Computer-synthesized speech, 83
Comrie, B., 75
Concepts, explanation-based constraints on, 185-192
Concepts-in-theories view, 171-174
Condon, W. S., 98
Constraints
subsuming, 315-316
syntactic and conceptual, on vocabulary growth, 333-375
on word learning, 306-310
Content, A., 99
Cooper, M., 88, 91
D
Dale, P. S., 98
Davis, S., 133
Davy, D., 83
Day, R. H., 261
de Boysson-Bardies, B., 85, 98
Default assumptions, word learning contraints as, 202-203
Default Linking Rule, 57-58, 69, 73
Demany, L., 98
Demuth, K., 439
den Os, E., 439
Depth perception, 121-123
Detransitivization, defined, 61
Developmental hypothesis, 201-202
Dickinson, D. K., 231, 251, 304, 315
Directed Change Linking Rule, 56-57, 69
Directional phrases, and transitivity, 69-74
Di Sciullo, A., 306
Disciullo, A. M., 7
Discontinuity proposal, 144
Divided reference, 144, 145
Dixon, R. M. W., 50, 249, 250
Dockrell, J., 220
Dommergues, J.-Y., 92, 94
Doughtie, E. B., 88
Dowty, D., 341, 349, 379
Dowty, D. R., 43, 51, 56
Dromi, E., 148, 201, 230, 237, 336
Druss, B., 97, 355, 435, 441
Dunn, J., 85
Dupoux, E., 92, 94, 96
Durand, C., 98
E
Echols, C. H., 217, 250
Egeth, H., 110
Eimas, P. D., 126
Erbaugh, M., 288
Estes, W. K., 108
Extensiveness, 23
F
Farah, M., 289, 290
Farrah, M. J., 214
Feldman, H., 277
Fellbaum, C., 379, 389
Ferguson, C., 79
Fernald, A., 85, 86, 98, 239, 250, 355, 435, 441
Ferreira, Fernanda, 131, 478
Fiengo, R., 12
Fillmore, C. J., 36, 45, 56, 366, 368
Fisher, Cynthia, 118, 200, 262
on Syntactic and conceptual constraints on vocabulary growth, 333-375, 336, 355, 357, 359, 362, 364, 365, 366, 368, 369,
380, 383, 384, 389, 393, 406, 412, 415, 416, 417, 418, 434, 435, 441
Fivush, R., 230
Fodor, J., 277, 299, 317
Fodor, J. A., 338
Fodor, J. D., 106, 131
Fontonelle, T., 44
Forced choice procedure, 240-241
Formal idioms, 9, 15-16
Fowler, A., 366
Fox, K., 250
Frame ranges, use of, in acquisition, 364-366
Frames, informativeness of multiple, 360
Frauenfelder, U., 92, 94
Frazier, L., 106, 131
Frequency information, sensitivity to, 119-121
Frith, U., 180
Fritz, J. J., 340
G
Gallistel, C. R., 109
Garnica, O., 85, 86
Garnsey, S. M., 132
Gathercole, V. C., 203, 301, 309, 311
Gee, J. P., 355
Gelman, S. A., 181, 204, 210, 219, 220, 222, 231, 235, 245, 251, 252, 300, 301, 314, 317, 336, 440
Gentner, D., 237, 249, 277, 303, 335, 336, 338
Gerken, Lou Ann, 99, 355, 359, 433, 439, 440
Gernsbacher, M. A., 120
Geyer, J., 368, 369
Gibson, E., 276
Gilette, J., 262
Gilette, J. A., 384
Givón, T., 341
Gleitman, Henry, 118, 124, 127, 172, 175, 200, 231, 234, 259, 261, 262, 267, 333, 335, 336, 338, 353, 355, 362, 364, 365, 366,
368, 369, 380, 383, 391, 403-404, 412, 415, 418, 434, 438, 439
Gleitman, Lila R., 97, 118, 124, 127, 172, 175, 199, 230, 231, 234, 259, 261, 262, 267, 277, 300, 301, 323, 324
on Syntactic and conceptual constraints on vocabulary growth, 333-375, 334, 335, 336, 338, 353, 355, 361, 362, 364, 365,
366, 368, 369, 377, 379, 380-381, 383, 391, 401, 402, 403-404, 406, 411, 412, 415, 416, 417, 418, 433, 434, 438, 439, 440,
475, 477
Glenn, S. M., 96
Gluck, M. A., 108
Glusman, M., 210, 220, 301
Gold, E. M., 456
Goldberg, Richard, 117, 353, 370, 389, 396, 398, 412, 435, 477
Goldfield, B. A., 237
Goldin-Meadow, S., 277, 336
Golinkoff, R. M., 200, 222, 231, 301, 353, 354, 359, 380, 403-404
Goodman, N., 230, 298, 299
Gopnik, A., 230, 237, 250, 303
Gordon, P., 147, 238, 242, 251, 309, 311, 312, 370
Gottfried, G., 181
Gould, J. L., 202
Gould, S. J., 106
Gray, W. D., 234, 235, 268
Grieser, D. L., 86
Grimshaw, Jane, 238, 262, 301, 322, 337, 341, 356, 358, 384
on Lexical reconciliation, 411-430, 412, 415, 421, 433, 434, 463, 476
Gropen, Jess, 117, 353, 370, 377, 389, 396, 398, 412, 435, 477
Grosjean, F., 88, 355
Gruber, J. S., 368, 377
Guerssel, M., 48
H
Haith, M. M., 250
Hale, K. L., 37, 41, 43
Hall, D. Geoffrey, 218, 219, 220, 229, 231, 244-245, 246, 249, 251, 252
on Syntactic and conceptual constraints on vocabulary growth, 333-375, 335, 415, 416, 417, 418
Hall, G., 406
Halle, M., 22
Hallé, P., 98
Halliday, M. A. K., 201
Halsted, H., 435, 441
Hammond, K., 290
Hampson, J., 303
Hardy, J., 358
Harrington, J. M., 88, 91
Hasher, L., 119
Haspelmath, M., 36, 38, 53, 66
Hatano, Giyoo, 81, 93, 94, 181
Hayes, J. R., 88
Heath, S. B., 86
Heim, L., 211, 212, 250
Helm, A., 366
Hempel, C. G., 173, 175
Henderson, J. M., 131
Herskovits, A., 262, 278
Hintzman, D. L., 119
Hirsch, E., 161, 318
Hirsch-Pasek, K., 97, 200, 220, 231, 353, 354, 355, 359, 380, 403-404, 435, 441
Hoekstra, Teun, 4, 59, 70
Hollander, M., 117, 353, 370, 389, 396, 398, 412, 435, 477
Hornby, A. S., 44
Horowitz, F. D., 86
I
Idiom(s), 8-21
families, 18-21
with instances, 13-15
as well-formed structures, 10-13
Ihsen, E., 261
Immediate Cause Linking Rule, 55-56, 72
Inagaki, K., 181
Individuation, principles of, in younger infants, 158-160
Infant's speech environment, 85-87
Inhelder, B., 235, 271
J
Jackendoff, R. S., 47, 59, 260, 262, 268, 279, 280, 286, 307, 309, 317, 318, 319, 366, 368, 379, 384, 396, 412
Jacob, M., 213, 251
Jacquet, R. C., 354
Jakimik, J., 88
Jenkins, L., 250
Jespersen, O., 38, 41
Johns-Lewis, C., 83, 87
Johnson, D. M., 234, 235, 268
Johnson, I., 88, 91
Johnson, K. E., 275
Johnson-Laird, P., 262, 278
Johnston, J. C., 124
Johnston, J. R., 277, 281, 283
Jones, C. M., 119
Jones, S., 216, 261, 267, 269, 270, 271, 272, 273, 274, 297, 300, 301, 302, 305, 308, 315
Jones, S. S., 215
Jonides, J., 119, 120
Joshi, A., 358
Joyce, P. F., 96
Jusczyk, Peter W., 81, 97, 98, 355, 359, 435, 439, 441
Just, M. A., 106
K
Kako, E., 354
Kaplan, T., 250
Karmiloff-Smith, A., 311
Karzon, R. G., 98
Katz, B., 380
Katz, N., 149, 218-219, 231, 246, 251, 301, 314, 440
Kayne, R., 358
Keating, D. P., 159
Keeble, S., 230, 340
Keenan, E. L., 358
Keil, Frank C., 136
on Explanation, association, and acquisition of word meaning, 169-196, 173, 177, 178, 179, 181, 182, 184, 267, 297, 302
Kello, C., 131
Kelly, Michael H., 81
on Domain-general abilities applied to domain-specific tasks, 105-140, 133, 134, 435, 441, 474
Kemler-Nelson, D. G., 97, 355, 435, 441
Kennedy, L., 97, 355, 435, 441
Keyser, S. J., 36, 37, 41, 43
Kimball, J., 106, 437
Klatt, D. H., 83, 355
Klint, K., 215
Knowledge, influence of, 272-275
Kolinsky, R., 99
Kosowski, T. D., 204, 231, 240
Kosselyn, S., 263
Kowal, S., 83
Krikhaar, E., 439
Kroch, Tony, 35
Kruger, A. C., 335
Kuhl, P., 85, 86, 96, 98, 435, 441
L
Labertz, G., 435, 441
Labov, W., 83, 87
Lacerda, F., 96
Lakoff, G., 43, 44, 172, 177, 185, 298
Lamel, L., 88, 91
Landau, Barbara, 97, 99, 127, 185, 186, 199, 215, 230, 231
on Where's what and what's where, 259-296, 260, 261, 262, 263, 267, 268, 269, 270, 271, 272, 273, 274, 277, 279, 281, 283,
284, 286, 297, 300, 301, 302, 305, 308, 315, 317, 324, 333, 336, 355, 361, 366, 379, 380-381, 383, 401, 402, 411, 412, 415,
417, 418, 434, 439
M
Macario, J., 316
Macfarland, T., 451
MacKain, K., 85, 98
Maclay, H., 132
Macnamara, J., 143, 149, 161, 218-219, 231, 246, 249, 251, 300, 301, 304, 309, 314, 317, 322, 324, 380, 440
MacWhinney, B., 341, 437, 457
Madole, K. L., 241
Mandel, D. R., 355, 359
Mandler, J. 340
Mandler, J. M., 204, 205, 208, 230, 234
Manis, M., 120
Mann, V. A., 99
Mannle, S., 222
Manzini, R., 97
Mapping(s)
from subcategorization frames, 418
syntax-semantics, 297-329
Marantz, A. P., 55, 56, 59
Marcus, Gary F., 377, 390
Marentette, P. F., 97
Markman, Ellen M., on Constraints on word meaning in early language acquisition, 199-227, 202, 203, 204, 210, 213, 215, 216,
218, 220, 222, 230, 231, 235, 239, 246, 251, 252, 271, 297, 299, 300, 301, 303, 304, 306, 308, 309, 323, 324, 325, 353, 379,
390
Markow, D. B., 241, 250
N
Nagel, H. N., 132
Naigles, L. G., 200, 231, 301, 353, 357, 365, 366, 380, 403-404, 404-405, 412
Named places, representation of, 277-286
Nedjalkov, V. P., 36, 38, 52-53
Nelson, K., 201, 203, 230, 236, 237, 249, 301, 303, 305, 323, 335, 336
Nespor, M., 440
Newport, Elissa L., 236, 357, 433, 435, 437, 438
Nicol, Janet, 297
Ninio, A., 335
Nisbett, R. E., 118
Nishigauchi, T., 308
Nominals, and syntax-semantics mappings, 297-329
Norris, D. G., 89, 90, 93-94, 127, 130
Norris, Dennis, 81
Nouns, first, and word-to-world pairing, 355-356
Noun syntax, toddler sensitivity to, 149
Novelty-preference procedure, 241-246
Numerical identity, principles of, in younger infants, 160
O
Oakes, L. M., 241
Object categorization, and linguistic form class observation, 234-237
Objects
representation of, 267-277
rigid, and shape bias, 269-272
vs. space, language of, 259-296
spatial representation of, 289-291
O'Brien, E. A., 83
Observations table, 444
Ochs, E., 86
O'Connell, D., 83
Ohala, J. J., 83, 86, 87
P
Paccia-Cooper, J., 355
Paden, L., 250
Pang, K., 110
Papousek, M., 85
Paradigms, 21-31
abstractions of, 24-29
learning structure, 29-31
Parsing, 130-132
Parsons, T., 307
Partial sentential representation (PSR), 356
Pattern of syncretism, 25
Peery, M. L., 215
Perception, influence of, 272-275
Perlmutter, D. M., 35, 37, 58, 60, 73, 422
Pesetsky, D., 413
Peters, Ann M., 81, 99
Peterson, Mary, 297
Petretic, P. A., 440
Petitto, L. A., 97
Phillips, A., 177, 180
Phoneme sequences, 127-129
Phonological elisions, 83, 87
Phrase(s), lexical categories vs., 306-308
Phrase structure, and first verbs, 356-359
Piaget, J., 235, 271
Pignot, E., 99
Pilon, R., 88
Pinker, Steven, 35, 36, 41, 47, 53, 55, 67, 71, 114, 115, 117, 143, 230, 238, 262, 301, 325, 337, 340, 341, 353, 356, 358, 367,
368, 370
on Using verb syntax to learn verb semantics, 377-410, 379, 384, 385, 386, 387, 389, 390, 396, 398, 400, 401, 411, 412,
415, 417, 427, 434, 435, 475, 477
Piwoz, J., 97, 355, 435, 441
Place, spatial representations of, 289-291
Polka, L., 96
Polysemy, problem of, 368-369
Porac, C., 119
Postal, P., 73
Prasada, S., 238, 242, 251, 304
Premack, D., 298
Prince, Alan, 411
Principle of contrast, 301
Probabilistic (approximate) cues, 437
Probabilistic information
as domain-general, 115
and language processing, 125-135
Proctor, P., 49
Projection Principle, 357, 425
Prosodic bootstrapping, 386-387
Prosodic structures, and word segmentation, 129-130
Prosody, 440-441, 474-475
and prelinguistic infant, 97-99
"Protoagent" classification scheme, 349
Putnam, H., 177
Pye, C., 86
Q
Quine, W. V. O., 144, 145, 146, 152, 177, 199, 230, 260, 298, 309, 335, 378
Quirk, R., 89
R
Radford, A., 439
Rakowitz, Susan, on Syntactic and conceptual constraints on vocabulary growth, 333-375, 406, 415, 416, 417, 418
Rambow, O., 358
Randall, J., 412
Rappaport, M., 366
Rappaport Hovav, Malka, on Preliminary analysis of causative verbs in English, 35-77, 40, 47, 51, 52, 54, 56, 57, 58, 59, 60, 68,
70, 73, 411, 412
Raven, P., 236
Rayner, K., 131
Read speech, 83
word recognition in, 84
Reconciliation, 422-427
Redanz, N. J., 98-99
Reicher, G. M., 124
Reinhart, T., 72
Remez, R. E., 83, 99, 439
Repp, B. H., 126
Rescorla, R. A., 105, 111, 112, 113, 114
Resnick, P., 363
Rey, G., 175
Reyes, G., 301, 309
S
Saegert, S., 119
Sagart, L., 98
Salapatek, P., 272
Salmon, W. C., 194
Sander, L. W., 98
Schachter, P., 272
Schaffer, C. A., 83
Scheuneman, D., 250
Schieffelin, B. B., 86
Schlesinger, I. M., 311, 341
Schull, J., 202
Segmentation
by default, 88-89
stress-based, 95
Segmentation problems, rhythmic solutions, 81-104
Segui, Juan, 81, 92, 94, 96, 97
Self, P. A., 250
Seligman, M., 336
Seligman, M. E. P., 136
Selkirk, E., 440
Semantic bootstrapping, 337, 415, 434. See also Prosodic bootstrapping; Syntactic bootstrapping
Semantic idioms, 9
Semantics, 475-477
interaction of, with syntax, 351-352
language-specific linkage of, to syntax, 368
Senghas, A., 222, 231, 247, 248, 251
Shaklee, H., 119
Shannon, B., 118
Shape, as basis for judgment, 187-191
Shape bias, non-linguistic construal of, 317
Shapiro, B. J., 120
Shapiro, L. P., 132
Shaw, L. L., 303
Shedler, J. K., 120
Shepperson, B., 251, 315
Shettleworth, S. J., 202
Shibatani, M., 36
Shillcock, R. C., 84, 88
Shipley, Elizabeth F., 229, 251, 315, 440
Shipman, D. W., 91
Shoben, E. J., 173
Shockey, L., 87
Shute, H. B., 86
Siegler, R. S., 181
Silnitsky, G. G., 36
Simon, T., 85, 86, 355
Simons, Dan, 169
Simpson, J., 59
Siskind, Jeffrey, 471, 478
Slater, A., 177
Slobin, D. I., 246, 277, 281, 283, 334
Smiley, P., 148, 213, 323
Smith, C. S., 45, 48, 61, 67, 440
Smith, E., 268
Smith, E. E., 123, 171
Smith, L., 261, 271, 272, 273, 274
Smith, L. B., 215, 216, 267, 268, 269, 270, 272, 297, 300, 301, 302, 305, 308, 315
Snow, C., 83
Snyder, L., 457
Snyder, L. S., 201, 202
Soja, Nancy N., 150, 151, 154, 155, 215, 216, 231, 246, 251, 297, 299, 300, 302, 304, 313, 315, 317
T
Tabachnik, N., 108
Taeschner, T., 85
Talmy, L., 14, 70, 249, 262, 278, 368, 379, 403
Tanenhaus, M. K., 131, 132
Tanz, C., 277
Taxonomic assumption, 203-215
Taylor, M., 222, 231, 245, 251, 252, 300, 301, 314, 440
Tenny, C., 56
Terrace, H. S., 202
Theta Criterion, 425
Thompson, E., 98, 439
Tinker, E., 335
Toddler lexicon, composition of, 148
Tokura, H., 355, 359, 435, 441
Tomasello, M., 203, 214, 222, 230, 335
Transitivity, directional phrases and, 69-74
Trueswell, J. C., 131, 132
Tuaycharoen, P., 86
Tweney, R. D., 440
U
Unaccusative Hypothesis, 37, 58-61
Underwood, B. J., 127
Ungerleider, L. G., 260, 289
Universal Grammar (UG), 413
principles of, 419
Utman, J. G. A., 98
V
Valian, V. V., 238, 242, 251, 435, 437
Vanandroye, J., 44
Vela, E., 215
Vera, A., 184
Verb(s)
dyadic, 35, 74
of emission, 73
first, and phrase structure, 356-359
internally caused, 66-69
of manner of motion, 70
monadic, 35, 74
semantic biases in interpretation of, 348-351
Verb categories, acquisition of, 471-480
Verb learning, syntactic support for, 337-340
Verb mapping
power of frame ranges for, 360-363
search space for, 359-360
Verb meanings, learning of, from verb frames, 403
Verb semantics, and verb syntax, 377-410
Verb syntax, and verb semantics, 377-410
Vihman, M. M., 98
Vocabulary, role of, 286-288
Vocabulary growth, syntactic and conceptual constraints on, 333-375
Vogel, I., 440
Voice onset time (VOT), 125
Vurpillot, E., 98
Vygotsky, Lev S., 148, 176, 479
W
Wachtel, G. F., 210, 218, 220, 231, 246, 251, 252, 300, 301, 304
Wagner, A. R., 113, 114
Wakefield, J. A., 88
Walker, A., 276
Walker, E., 319
Walley, A. C., 126
Wang, Q., 98
Wanner, E., 97, 127, 355, 439
Ward, B. T., 216
Ward, T., 272
Ward, T. B., 215, 216
Ware, R., 309
Warren-Leubecker, A., 457
Wasow, J., 222
Wasow, T., 36
Wasserman, E. A., 119
Watson, G., 91
Wax, N., 181
Waxman, Sandra R., 204, 210, 211, 212, 222
on The development of an appreciation of specific linkages between linguistic and conceptual organization, 229-257, 230,
231, 235, 236, 240, 241, 244-245, 246, 247, 248, 249, 250, 251, 300, 301, 335
Weinreich, U., 309
Wenger, N. R., 220
Wenger, R. N., 231
Werdenschlag, L., 222
Werker, J. F., 96
Werner, H., 177
Wexler, K., 97, 390, 437
Whalen, D. H., 98
Whole-object assumption, 215-217
Whorf, B., 309
Wickberg, John, 35
Wierzbicka, A., 309
Wiggins, D., 145, 161
Wijnen, F., 439
Wilkins, W., 423
Williams, Edwin, on Remarks on lexical knowledge, 7-34, 341, 412
Williams, J. N., 127
Williams, K. A., 96
Williams, L., 439
Wilson, M. D., 89
Wilson, R., 117
Winograd, Terry, 478
Woodfield, A., 181
Woodward, A., 97, 177, 180, 190, 355, 435, 441
Woodward, A. L., 202, 203, 216-217
Word(s)
learning, 31-34
for novel objects/non-solid substances, 150-154
Word boundaries, identification of, 126-127
Word learning constraints, 305-310
as default assumptions, 202-203
as domain-specific, 203
Word meanings
constraints on, in early language acquisition, 199-227
explanation, association, and acquisition of, 169-196
Word segmentation, 129-130
Word-spotting, 90
Word-to-world pairing, and first nouns, 355-356
Wright, J. C., 267
Wright, L., 181
Wright, K., 435, 441
Wynn, Karen, 159, 297, 315, 317
X
Xu, F., 162
Y
Yom, B.-H. L., 88
Younger, B. A., 157, 230
Young infants
conceptual differences vs. adults, 160-166
individuation and, 158-160
numerical identity and, 160
Z
Zacks, R. T., 119
Zaenen, A., 57, 60
Zajonc, R. B., 119, 120
Zoom lens aspect, 370
Zubizarreta, M.-L., 412
Zue, V. W., 88, 91
Zwicky, A., 362, 434