Rules vs. Analogy in Mandarin Classifier Selection
2:187-209, 2000
James Myers
National Chung Cheng University
1. Introduction
Work done over the last decade on the Mandarin noun classifier system has taught
us much about the nature of nominal semantics and human categorization. In this paper,
however, we are not particularly interested in nominal semantics or human
categorization. Instead, we aim to use the Mandarin noun classifier system as a source
of data for a different issue, something that might be called the rule/analogy debate.
This is the debate between researchers who maintain the classical generativist line that
human language is processed by symbol-manipulating rules (e.g. Pinker and Prince
1988, Marcus, Brinkmann, Clahsen, Wiese, and Pinker 1995), and connectionists who
claim that human language is processed solely by analogy (e.g. Rumelhart and
McClelland 1986, Hare, Elman, and Daugherty 1995). Here we use the term ‘analogy’
in the sense used in historical linguistics, where patterns are generalized to new cases
by referring to examples, as for instance when dive became irregular (i.e. dived became
dove) by analogy with pairs like drive-drove. The connectionists claim that all language
processing involves analogy of this sort. Their opponents concede that analogy is
involved in irregular inflection, but insist that the most interesting parts of language
(e.g. regular inflection) are rule-driven.
* The research in this paper was supported by a National Chung Cheng University seed grant.
Much of the corpus analysis for this paper was carried out with the help of Chiang Cheng Chih
(蔣承志) and Gong Shu-ping (龔書萍). We’d like to thank Huang Chu-ren, two anonymous
reviewers, and members of the NCCU Research Center of Cognitive Science, and blame David
Kemmerer for introducing us to the rule/analogy debate in the first place. Errors are of course
our own responsibility.
For such a deep issue, it is strange that this battle has so far been fought only over
inflection (especially past-tense inflection in English). We suggest that the arguments
that have been used to support the existence of rules in inflection work equally well to
support the existence of rules in the Mandarin classifier system. Specifically, we argue
that the classifier 個 ge is the unique ‘general’ classifier, selected by a default rule; the
remaining ‘specific’ classifiers are selected by analogy with exemplars. This is
essentially what has been long assumed in different terms (e.g. Li and Thompson 1981),
though the assumption has recently come under fire (e.g. Loke 1994, Tyan 1996).
Rather than being merely a defense of the status quo, however, this paper offers a new
way to understand the claim that ge is ‘general’ and provides new evidence for it.
We begin the discussion in section 2 by sketching out the highlights of the
rule/analogy debate in inflection. In section 3, we highlight in an equally sketchy way
some properties of the Mandarin classifier system, defining the basic kinds of
classifiers that we will be comparing. In section 4 we describe why we think most
classifiers are processed by analogy. The heart of the argument then comes in section 5,
in which we show how the behavior of ge implies that it must be selected by rule, and
not by analogy. New evidence for this claim comes from analyses carried out with the
Academia Sinica Balanced Corpus (described in Chen, Huang, Chang, and Hsu 1996;
public Web access is at http://www.sinica.edu.tw/ftms-bin/kiwi.sh). The evidence all
boils down to one observation: ge is used in a variety of situations that have nothing in
common except the impossibility for speakers to form analogies with examples in
memory. That is, ge is truly selected by default. Finally, in section 6 we point the way
to future research.
Analogy says that dived became dove because dive is similar to drive; if words are
similar in some ways (e.g. they rhyme), they should be similar in others as well (e.g.
their past tense forms should rhyme). Generative linguists, and until recently most
cognitive psychologists, have not been too enamored of analogy as an explanation,
however, since there was no clear way to decide when it worked and when it didn’t. For
example, if ears hear, why don’t eyes heye (Kiparsky 1988)?
Connectionism provides a way of formalizing analogy. It does this by encoding
similar forms as overlapping representations in a network. The result is that forms that
are ‘similar’ (i.e. overlap often enough) will tend to behave the same way.
The first connectionist model of linguistic analogy was Rumelhart and McClelland
(1986), a simple network that was taught to directly associate English present tense
forms with past tense forms. With enough training the network was able to generalize
to new forms by implicitly referring to the examples that it had learned. For example,
given untrained irregular and regular verbs like weep and drip, it correctly responded
with wept and dripped. Moreover, the model seemed to treat regular past tense as a
special case, overregularizing many irregular verbs, just as children acquiring English
do. Rumelhart and McClelland (1986) thus drew the reasonable though dramatic
conclusion that human language (at least English tense inflection) did not require rules.
There was no regular rule such as ‘add -ed’; instead, all past tense forms were derived
by analogy.
However, subsequent work by the psycholinguist Steven Pinker and colleagues
(e.g. Pinker and Prince 1988, Pinker 1991, Prasada and Pinker 1993) found several
weaknesses in this model. The most fundamental was that the model was incapable of
creating a true default, a category that is defined negatively as the ‘elsewhere’ case for
all ‘miscellaneous’ items that don’t fit into any of the other categories. The Rumelhart
and McClelland (1986) model treated the -ed class as special only because it was so
large; otherwise the -ed class was just one similarity-defined class among many.
This is not to say that connectionism is inappropriate for irregular inflection. There
are good reasons for believing that people do in fact process irregular inflection by
analogy. The first argument for this is that they can’t be processing it by rule; any
general rule you might come up with (such as ‘ive → ove’ to account for drive-drove)
will both overgenerate (e.g. arrive-*arrove) and undergenerate (e.g. rise-rose will be
unexplained, though it shows the same i~o alternation). More importantly, people tend
to extend irregular patterns more readily when there are more exemplars in the lexicon.
For instance, Bybee and Moder (1983) found that speakers give the nonsense word
spling the past tense form splung (by analogy with cling, fling, sling, sting, string, and
wring) more often than they give shink the past tense form shunk (which is only similar
to shrink, slink and stink). In short, real people, like connectionist models, use irregular
inflection by making analogies with exemplars in memory.
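The over- and undergeneration problem can be made concrete with a small sketch. The rule function and the three-verb lexicon below are illustrative assumptions built only from the examples cited above (drive, arrive, rise); they are not a model taken from the literature.

```python
# Hypothetical mini-lexicon: present tense -> attested past tense.
ATTESTED_PAST = {
    "drive": "drove",
    "arrive": "arrived",
    "rise": "rose",
}


def ive_to_ove(verb):
    """Candidate rule 'ive -> ove'; returns None when the rule does not apply."""
    if verb.endswith("ive"):
        return verb[:-3] + "ove"
    return None


def shows_i_o_alternation(present, past):
    """True if the past tense differs from the present only by i -> o."""
    return (
        len(present) == len(past)
        and present != past
        and all(a == b or (a == "i" and b == "o") for a, b in zip(present, past))
    )


# Overgeneration: the rule applies but yields an unattested form.
overgenerated = [
    verb for verb, past in ATTESTED_PAST.items()
    if ive_to_ove(verb) is not None and ive_to_ove(verb) != past
]

# Undergeneration: the verb shows the i~o alternation, but the rule is silent.
undergenerated = [
    verb for verb, past in ATTESTED_PAST.items()
    if shows_i_o_alternation(verb, past) and ive_to_ove(verb) is None
]

print("overgenerated:", overgenerated)
print("undergenerated:", undergenerated)
```

Narrowing or widening the rule only trades one kind of error for the other, which is the sense in which no general rule fits the irregular patterns.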
By contrast, real people treat regular inflection as a unique default case, as if
processed by an exemplar-independent general rule. All of the arguments for this are
based on evidence that regular inflection is used precisely when speakers cannot access
examples in memory with which to form analogies. This can happen in a bewildering
variety of ways that have nothing in common except that lexical access cannot be
involved. Marcus et al. (1995) list twenty-one such ways, including: when forms are
too different from exemplars in memory (e.g. unusual sounding words as in He
out-Gorbacheved Gorbachev); when lexical entries are too weak (e.g. low-frequency
words like chided, originally chid); when forms are being mentioned rather than used
(e.g. quotations like There are two ‘man’s in the phrase ‘man to man’); when forms are
derived from another category (e.g. denominal verbs like striked ‘went on strike’); and
when memory problems make lexical access difficult (e.g. with children or anomic
aphasics, both of whom overuse regular inflection). Such evidence has led Pinker and
colleagues to support a hybrid model of inflection, whereby only irregular inflection is
handled by analogy; regular inflection is handled by rule.
Connectionists have not remained idle in the face of such evidence. Work by
MacWhinney and Leinbach (1991), Plunkett and Marchman (1991, 1993) and Hare et
al. (1995) all argued that the failure of the Rumelhart and McClelland (1986) model
was primarily caused by its overly simple structure. In order to achieve success with
English inflection, however, the newer models all have overly complex structures. For
example, the Hare et al. (1995) model seems capable of learning defaults even when
they do not form the largest class, but it does this by building in the assumption that -ed
is special, reserving nodes just for this ending that ‘inhibit’ nodes for the vowel (which
should change in irregular verbs but not in regular ones). It appears, then, that
connectionism is technically able to simulate default effects, but only by hard-wiring
more ‘rule-like’ structures.
Why is there such a fuss over what seems at first to be a minor technical issue
about the mechanics of inflection? The reason is that the debate really concerns what is
special about human cognition. Are people (and other animals) merely associationist
machines as the behaviorists believed, infinitely moldable by experience? Or do people
have built-in mental structures of some sort that give them the ability to jump beyond
similarity-driven analogy into the domain of general symbol-manipulating rules? Pinker
(1999) provides a book-length meditation on such questions, and his perspective is
made clear in the title of one of its chapters: ‘A Digital Mind in an Analog World.’ As
he wrote in an earlier book:
People think in two modes. They can form fuzzy stereotypes by
uninsightfully soaking up correlations among properties.... But people can
also create systems of rules... that define categories in terms of the rules
that apply to them....(Pinker 1997:127)
As examples of rules in human cognition, he lists not just grammar, but also
kinship systems, laws, arithmetic, folk science, and social conventions. This is a rather
ambitious vision, but it raises a much less ambitious but still intriguing question. If the
rule/analogy dichotomy is found in inflection because this dichotomy is fundamental to
the makeup of the human mind, then shouldn’t we expect to find it in other aspects of
language as well? Could it even be found in a language that is famed for its virtual lack
of overt inflection?
Thus we are led to something that at first seems completely different: the
Mandarin noun classifier system. Many languages mark semantically-defined noun
classes with special morphemes (see Allan 1977, Aikhenvald 1997). In Mandarin, this
involves the requirement that NPs containing numbers or determiners must include a
monosyllabic morpheme called a CLASSIFIER (or sometimes MEASURE WORD). In this
section we give a general overview of the Mandarin classifier system, ending with an
observation that sets the stage for the rest of the paper.
There are actually several different kinds of morphemes that fall under the
umbrella term ‘classifier’ that vary considerably in their semantic properties. Some
kinds of classifiers are typically used with mass nouns, such as standard measures (e.g.
一磅肉 yi-bang rou ‘a pound of meat,’ 一斤肉 yi-jin rou ‘a catty of meat’), container
measures (e.g. 一杯茶 yi-bei cha ‘a cup of tea’, 一碗飯 yi-wan fan ‘a bowl of rice’)
and partitive measures (e.g. 一塊蛋糕 yi-kuai dangao ‘a piece of cake’, 一片土司
yi-pian tusi ‘a slice of toast’). As can be seen by the glosses, English has this sort of
thing too; also found both in Mandarin and English are group measures (e.g. 一群狗
yi-qun gou ‘a pack of dogs’, 一雙筷子 yi-shuang kuaizi ‘a pair of chopsticks’). The
syntactic similarity of such cases with the English of construction is hinted at by the
fact that all of these classifiers allow the appearance of the modifier marker 的 de, as
in yi-bang de rou ‘a pound of meat’ (Tai 1994, Kuo 1998). In addition to these
classifiers, Ahrens and Huang (1996) propose the recognition of kind classifiers and
event classifiers, which quantize kinds and events, respectively (e.g. 那種馬 na-zhong
ma ‘that kind of horse’; 這場電影 zhe-chang dianying ‘this (showing of a) movie’).
The semantics of the above classifiers are quite subtle and complex, and we will
have to talk about some of them later, but we will spend most of this paper discussing
another sort, the individual classifiers. These are what linguists typically think of when
they think of noun classifiers: morphemes that are selected by individual entities on the
basis of their inherent semantics. Such classifiers fail the de test, suggesting that they
are distinct from standard measures, container measures, partitive measures, and group
measures; they quantize individual entities, suggesting that they are distinct from event
and kind classifiers as well. In the following table we list the individual classifiers that
we have examined the most carefully (they are the most common ones), along with
some examples of nouns they cooccur with. We’ve also given simplified semantic
descriptions for the noun classes, but the actual role of semantics in the use of these
classifiers is quite complex, as we will shortly illustrate (see also Kuo 1998, Shi 1996,
Tai 1994, Tai and Chao 1994, Tai and Wang 1990 and many other sources for fuller
discussion).
Table 1.
Classifier    Examples                           Semantics
個 ge         人 ren ‘person’                    humans
              國家 guojia ‘country’              abstractions
              西瓜 xigua ‘watermelon’            3D objects
              太陽 taiyang ‘sun’
位 wei        老師 laoshi ‘teacher’              humans (respectful)
張 zhang      紙 zhi ‘paper’                     flat, broad objects
              桌子 zhuozi ‘table’
條 tiao       路 lu ‘road’                       flexible oblong objects
              魚 yu ‘fish’
件 jian       事情 shiqing ‘thing, affair’       abstractions
              衣服 yifu ‘clothing’               clothes
片 pian       葉子 yezi ‘leaf’                   flat objects
隻 zhi        狗 gou ‘dog’                       animals
              鞋子 xiezi ‘shoe’                  one of a pair
枝 zhi        鋼筆 gangbi ‘pen’                  cylindrical rigid oblong objects
顆 ke         牙齒 yachi ‘tooth’                 small objects
粒 li         米 mi ‘rice grain’                 very small objects
面 mian       牆 qiang ‘wall’                    flat objects
根 gen        棍子 gunzi ‘stick’                 rigid oblong objects
把 ba         刀子 daozi ‘knife’                 things with handles
              椅子 yizi ‘chair’
One important way in which this table is misleading is that it treats ge as just one
individual classifier among many. Actually, it has traditionally been held that ge is
unique among individual classifiers in that it may be substituted for any of the others.
For instance, speakers don’t have to say 一張桌子 yi-zhang zhuozi (‘a table’); they
can (and many do) say 一個桌子 yi-ge zhuozi instead. For this reason, ge has been
called the GENERAL CLASSIFIER, with all of the other individual classifiers called
SPECIFIC (or SPECIAL) CLASSIFIERS (e.g. Li and Thompson 1981).
Now that the basic terminology is clear, we can turn to our main interest: showing
that the general classifier ge acts like regular inflection (i.e. is processed by rule) while
the specific classifiers act like irregular inflection (i.e. are processed by analogy). We
start with the specific classifiers, since they have naturally been the focus of much of
the classifier literature.
As explained in the previous section, specific classifiers are sensitive to semantic
features, but as it turns out, not in a way that can be expressed by general, exceptionless
rules (Tai 1994, Tai and Wang 1990, Tai and Chao 1994). For instance, take the
semantic characterization of tiao in Table 1 above. If we proposed a rule that read ‘Use
tiao for all flexible oblong objects’, this rule, like proposed rules for irregular inflection,
would work much of the time but would also both overgenerate and undergenerate.
Thus it would falsely predict that 一條頭髮 yi-tiao toufa ‘a hair’ is acceptable, and
that 一條板凳 yi-tiao bandeng ‘a bench’ and 一條新聞 yi-tiao xinwen ‘a piece of
news’ are unacceptable.
Specific classifiers, like irregular inflection, are also influenced by similarity (in
this case, semantic similarity) with lexical exemplars. For example, consider paper,
beds, tables and sofas; these are all ‘flat’ in some sense, but clearly some are flatter than
others. Ahrens (1994) has shown that this affects the likelihood that speakers will
actually use zhang with these objects (most likely with paper, least likely with sofas),
suggesting that paper is the prototype for the zhang class (in the sense of Rosch 1973).
Another way to say this is that paper is a privileged exemplar of the zhang class;
speakers seem to decide whether to use zhang with an object on the basis of that
object’s similarity to paper. In other words, speakers use analogy.
Another property that suggests analogy is the fact that speakers extend the use of
classifiers on a case-by-case basis. In the domain of vegetables and fruit, for instance,
tiao is consistently used for objects that are in fact oblong; one says 一條黃瓜 yi-tiao
huanggua ‘a cucumber’ but usually not 一條西瓜 yi-tiao xigua ‘a watermelon’. As
Wiebusch (1995) points out, however, in the domain of clothing tiao is extended quite
freely; one says 一條褲子 yi-tiao kuzi ‘a pair of pants’ (because pants are in fact long),
but also 一條短褲 yi-tiao duanku ‘a pair of shorts’ (which by definition are not).
Similarly, when they are oblong, both tables and paper remain flat, but only tables
continue to require the use of zhang; strips of paper may take tiao (Shi 1996). Other
objects, such as towels, are both oblong and flat, but the required classifier is tiao, not
zhang. Prototypical fish are oblong, and so they always take tiao, never zhang, even if
they are as flat as a flounder (Kuo 1998). Distinguishing tiao from gen and 枝 zhi,
which also mark oblong objects, can really only be done by citing many examples, and
the same is true about distinguishing zhang from the other ‘flat object’ classifiers pian
and mian (Tai and Wang 1990, Tai and Chao 1994).
Finally, evidence from language acquisition suggests that specific classifiers are
learned on an analogical basis. Erbaugh (1986) reports that children extend their use of
specific classifiers from the most prototypical exemplars outwards to the peripheries of
the category. Children also make extensions by association; Hu (1993) cites the
example of a child who used the clothing classifier jian both for clothes and for
washing machines (which actually require the ‘machine’ classifier 台 tai).
It appears that to explain such complexities with rules alone, we’d need almost as
many rules as there are lexical items. This is precisely the sort of situation that calls for
analogy. On top of all this, of course, specific classifiers are sensitive to the lexical
semantics of nouns, an area of language where connectionist modeling has had
particular success (e.g. Collins and Loftus 1975, McRae, de Sa, and Seidenberg 1997).
However, we should make it clear that the fact that analogy plays a major role in
the selection of specific classifiers does not mean that the classifier system is as
arbitrary as irregular inflection. While it may be natural to extend the drive-drove
pattern to dive-dove by analogy, there is no synchronic reason (from the perspective of
native speakers) why the drive-drove pattern should exist in the first place: it is merely
an accident of the history of English. By contrast, there are important cognitive factors
in the selection of specific classifiers that go beyond accidents of the history of
Mandarin and consequent analogical spread. For example, the ‘flat object’ classifier
zhang seems intuitively appropriate for tables even for non-native speakers, although
objectively speaking tables are not merely flat. We will allude to such cognitive factors
later in section 5.2.
If ge is a true default, then it should not be allowed to have any special meaning of
its own. In this subsection we argue that this is indeed the case, first responding to Loke
(1994), and then adding two new arguments from our corpus analysis.
One of Loke’s reasons for supposing that ge has semantic content is founded on
the observation that there are well-defined semantic classes that take only ge. These
include humans (or more precisely, humans for whom it would be inappropriate to use
the polite wei, e.g. 小偷 xiaotou ‘thief’, 小孩 xiaohai ‘child’), solid
three-dimensional objects above a certain size (e.g. 西瓜 xigua ‘watermelon’) and
abstractions (e.g. 希望 xiwang ‘wish’). However, these semantic categories for ge are
so disjoint that they may be more easily defined as representing all nouns that do not
require a specific classifier. That is, the reason why xiaotou, xigua, and xiwang all take
ge is not because thieves are people, watermelons are 3D, and wishes are abstractions,
but because none of these are in the categories ‘people to be polite to’ (requiring wei),
‘animals’ (requiring zhi), ‘flat objects’ (requiring zhang), and so forth. The only
classifier left over after eliminating all the inappropriate ones is the default classifier ge.
This complement function of ge is in fact one of the three suggested by Zubin and
Shimojo (1993) as being served by a general classifier. The reason the semantic
categories ostensibly marked by ge are disjoint is because they represent the negative
space left by removing the more coherent categories marked by the specific classifiers.
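The complement function can be sketched as an ‘elsewhere’ procedure: try to analogize to stored exemplars, and fall back to ge only when nothing in memory is similar enough. Everything in the sketch below is a hypothetical illustration: the feature sets, the Jaccard similarity measure, and the 0.6 threshold are assumptions chosen for clarity, not claims about how speakers actually represent nouns.

```python
# Hypothetical exemplars: noun -> (classifier, set of semantic features).
EXEMPLARS = {
    "zhi 'paper'": ("zhang", {"flat", "broad", "inanimate"}),
    "lu 'road'": ("tiao", {"oblong", "flexible", "inanimate"}),
    "gou 'dog'": ("zhi", {"animal", "animate"}),
}

DEFAULT_CLASSIFIER = "ge"  # selected by rule, with no exemplars of its own


def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_classifier(features, threshold=0.6):
    """Pick the classifier of the most similar exemplar, else the default."""
    best_classifier, best_similarity = None, 0.0
    for classifier, exemplar_features in EXEMPLARS.values():
        similarity = jaccard(features, exemplar_features)
        if similarity > best_similarity:
            best_classifier, best_similarity = classifier, similarity
    if best_similarity >= threshold:
        return best_classifier  # analogy with an exemplar succeeded
    return DEFAULT_CLASSIFIER   # no usable analogy: the elsewhere case


print(choose_classifier({"flat", "broad", "inanimate", "rigid"}))  # a table-like noun
print(choose_classifier({"round", "large", "inanimate"}))          # a watermelon-like noun
print(choose_classifier({"abstract"}))                             # a wish-like noun
```

Note that ge never appears among the exemplars: in this sketch the nouns that receive it share nothing except their failure to match any specific class, which is the negative definition argued for above.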
Of course, some specific classifiers also mark quite distinct semantic categories;
thus 隻 zhi is used both for animals and for one of a pair. When this happens with
specific classifiers, however, there are historical reasons, and indeed reasons that are
consistent with the idea that specific classifiers are processed by analogy, since
historically they have spread from semantic class to semantic class on an exemplar
basis or else have involved the orthographic merging of two distinct morphemes. Zhi,
for instance, was first used for individual birds (the character 隻 zhi, showing a bird in
a hand, was in opposition to 雙 shuang ‘pair’, showing two birds; see e.g. Wieger
1927). This narrow category was later extended to all animals, but without totally
eliminating the original meaning ‘one of a pair’. In spoken modern Mandarin it’s even
more complex, since 隻 zhi is pronounced the same way as 枝 zhi, the classifier for
short oblong objects, which has a separate etymology. Since ge marks disjoint
categories, Loke (1994) suggests that it too arose from the merging of originally
distinct classifiers, but if so, this happened very early on. According to Wang (1989),
ge was already used for classes as disjoint as animals, plants, money, and people even
before it became the dominant classifier in the Tang dynasty (618-907 CE). The most
we could say from such historical considerations is that ge came to serve the
complement function in modern Chinese because in ancient times it marked too many
disjoint semantic categories, and was therefore reinterpreted as a default classifier.
Another sign that ge is semantically vacuous comes from its patterning in headless
NPs. Usually a head noun is only dropped when it is clear from context (as when a
waiter asks 幾位? Ji-wei? ‘How many (people)?’), but it may also occur when the
speaker does not know what to call the object (Yau 1986). Dropping the head noun is
not such a great loss when the classifier itself provides enough semantic information to
recover it. If ge carries as much semantic information as the specific classifiers, we
would expect it to appear in headless NPs as often as they do, but this is not the case. In
our corpus study, we found that ge appears proportionally less often before punctuation
marks (e.g. periods or commas), and thus presumably in headless NPs overall, than
any of the specific classifiers we examined (with the one curious exception of gen,
which for some reason never appears before punctuation). The proportions reached
significance by chi-squared tests for comparisons with all individual classifiers except
for 枝 zhi (the number of observations was too low to make the test valid) and pian.
The fact that pian also appears to be semantically vacuous by this test may be explained
by its not being a very good individual classifier; as noted above in section 3, it can
also be used as a partitive classifier, and thus has less tight semantic restrictions from
the noun. This conclusion is bolstered by the finding that the kind classifier zhong also
appears as rarely before punctuation as does the individual classifier ge (as a kind
classifier, zhong of course has virtually no semantic linkage with the noun at all). These
findings are summarized in Table 2.
Table 2.
Classifier    Number of tokens before punctuation    Total¹    Proportion
個 ge         55                                     2000      0.0275
位 wei        90                                     1104      0.0815*
張 zhang      55                                     425       0.1294*
¹ The public Web interface to the Sinica Corpus only allows access of up to 2000 tokens per item.
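The kind of comparison reported in Table 2 can be sketched as a 2×2 chi-squared test of ge against each specific classifier. The sketch assumes a Pearson test without continuity correction and uses only the counts shown for ge, wei, and zhang; the exact test setup behind the reported results is not specified, so the statistics printed here are illustrative.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (tokens before punctuation, total tokens), copied from Table 2.
COUNTS = {"ge": (55, 2000), "wei": (90, 1104), "zhang": (55, 425)}

CRITICAL_VALUE_05 = 3.841  # chi-squared critical value for df = 1, alpha = 0.05

ge_before, ge_total = COUNTS["ge"]
stats = {}
for name in ("wei", "zhang"):
    before, total = COUNTS[name]
    # Rows: ge vs. the other classifier; columns: before punctuation vs. elsewhere.
    stats[name] = chi_squared_2x2(
        ge_before, ge_total - ge_before, before, total - before
    )
    print(name, round(stats[name], 2), stats[name] > CRITICAL_VALUE_05)
```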
Nevertheless, we are sure that many readers will be uncomfortable with the strong
claim that ge has no meaning whatsoever. A common description of ge is that while it
may be used as a default, it still has a core meaning of ‘human’ (e.g. Zubin and Shimojo
1993), and we must confess that this is consistent with our own intuitions as well.
Yet what precisely would it mean for ge to have a core meaning but also serve as a
default? If we hypothesize that words for people take ge because ge is a default, and
someone else hypothesizes that words for people take ge because ge is a default but also a
‘person’ classifier, do these two hypotheses make any testably different predictions?
Probably not, and indeed, our hypothesis is to be preferred for parsimony reasons.
It won’t help to settle the matter to ask people to list nouns that go with ge, and
then call the most common choice evidence of its core meaning (as is done in Zubin
and Shimojo 1993 for Japanese). Surely the most common choice for ge will be 人 ren
(‘person’) (and pilot studies we have done have indeed found this), but this is probably
because ren is the highest-frequency noun that collocates with ge. The high frequency
of ren may help explain why some speakers believe that ge has the core meaning of
‘human’, but it doesn’t prove that ge actually does have this core meaning in the sense
of having privileged exemplars.
A better test would be to examine the distribution of the different semantic classes
that collocate with ge (e.g. humans, abstractions, 3D objects) to determine which has
the highest proportion of privileged exemplars. This can be measured by calculating the
MUTUAL INFORMATION value (MI), whose formula is given in (1). Essentially, the MI
describes how common a collocation is when the lexical frequencies of each word have
been factored out. If two words x and y are distributed randomly, MI(x,y) ≈ 0; if they
form meaningful collocations, MI(x,y) >> 0; and if they are in complementary
distribution, MI(x,y) << 0 (see Church and Hanks 1990).
(1)  $MI(x, y) = \log\left(\dfrac{\mathrm{prob}(x, y)}{\mathrm{prob}(x) \cdot \mathrm{prob}(y)}\right)$
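As a worked illustration of the MI formula, the sketch below estimates the three probabilities from raw counts. The counts are invented toy numbers, and the base-2 logarithm follows Church and Hanks (1990); the base used by the Sinica Corpus interface is not stated here, so it is left as a parameter.

```python
import math


def mutual_information(pair_count, x_count, y_count, total, base=2.0):
    """MI(x, y) = log( prob(x, y) / (prob(x) * prob(y)) ), estimated from counts."""
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log(p_xy / (p_x * p_y), base)


# Hypothetical counts in a 1000-token sample where x and y each occur 100 times.
independent = mutual_information(10, 100, 100, 1000)    # co-occur exactly at chance
collocation = mutual_information(50, 100, 100, 1000)    # co-occur far above chance
complementary = mutual_information(1, 100, 100, 1000)   # co-occur far below chance

print(independent, collocation, complementary)
```

With these toy counts the first value is essentially zero, the second is clearly positive, and the third is clearly negative, matching the three cases described above.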
We used the MI calculations automatically provided by the public Web interface
to the Sinica Corpus, given a window size of five words (i.e. all instances where a
classifier appeared within five words before a given noun). All examples were screened
to make sure that the classifier and noun weren’t actually in unrelated clauses. The
result was a list for each classifier we examined showing all collocating nouns with
positive MI values. To give a flavor of these lists, Table 3 shows some sample items in
the ge list, including the first and last items.
Table 3.
Rank    Noun                             MI with ge    Semantic category
1       定點 dingdian ‘fixed point’      6.645         abstraction
2       梯子 tizi ‘ladder’               6.576         3D object
12      婆子 pozi ‘hussy’                6.000         human
34      終了 zhongliao ‘completion’      5.277         deverbal noun
368     學生 xuesheng ‘student’          1.099         human
To deal with the issue of ge’s core meaning, we compared the number of
collocating nouns of different semantic classes with an MI value above vs. below the
median. If ge has a core meaning of ‘human’, we expect a greater proportion of
‘human’ nouns to appear above the median MI value, while the proportions for
abstractions and 3D objects should be the same. As shown in Table 4, this is not what
we found. None of the proportions reached significance by chi-squared tests, and the
only one that came close was for abstractions (p=0.068). Thus contrary to what is
standardly thought, nouns for humans do not have any special status in the ge category.
Table 4.
                         humans    abstractions*    deverbal nouns**    3D objects
above:below median MI    27:37     121:104          8:15                27:29
*such as 社會 shehui (‘society’)
**such as 希望 xiwang (‘wish’)
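One way to probe the counts in Table 4 is to compare each semantic class against all the others pooled together. The sketch below assumes that each ratio gives the number of nouns above versus below the median MI, and it uses a Pearson test without continuity correction; since the original test design is not described, the p-values it produces need not match the reported p=0.068.

```python
import math

# (above median, below median) counts, copied from Table 4.
TABLE_4 = {
    "humans": (27, 37),
    "abstractions": (121, 104),
    "deverbal nouns": (8, 15),
    "3D objects": (27, 29),
}


def p_value_vs_rest(category):
    """p-value of a 2x2 chi-squared test: one category vs. all others pooled."""
    above, below = TABLE_4[category]
    rest_above = sum(a for name, (a, _) in TABLE_4.items() if name != category)
    rest_below = sum(b for name, (_, b) in TABLE_4.items() if name != category)
    n = above + below + rest_above + rest_below
    chi2 = n * (above * rest_below - below * rest_above) ** 2 / (
        (above + below) * (rest_above + rest_below)
        * (above + rest_above) * (below + rest_below)
    )
    # With one degree of freedom, the chi-squared tail probability
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p_values = {name: p_value_vs_rest(name) for name in TABLE_4}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Under these assumptions no class differs reliably from the rest at the 0.05 level, and abstractions come closest, which agrees qualitatively with the result reported above.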
It’s also worth noting that the most common semantic class by far is the set of
abstractions. Does this mean that the core meaning of ge is actually ‘abstraction’? We
think not. After all, an abstraction by its very nature is something that has rather vague
A related observation is that of Loke (1994), who notes that while ge does not
replace shape-based classifiers like zhang very often, it is quite common to use ge
instead of function-based classifiers (e.g. the ‘vehicle’ classifier 輛 liang). To Loke
this implies that ge has semantic content, suggesting that ge cannot be substituted for
shape-based classifiers since ge itself marks shape (namely, solid roundish objects
above a certain size).
An alternative analysis is simply that function-based classifiers, for whatever
reason, are just not as robust as shape-based classifiers. They certainly aren’t as
common, and research on language development has found that there is a strong
preference to use shape rather than other characteristics to classify objects (Pinker
1989). Hence an object that can be classified by shape, such as a saliently flat thing,
will have many similar exemplars in memory to analogize to, making the selection of
zhang very likely. The same is expected of objects that can be classified as animals, 隻
zhi also being a very common classifier, and animacy also being a very salient semantic
property in word learning (Pinker 1989). Thus objects, like vehicles, that cannot be
classified by shape or as animate, will have fewer analogous exemplars in memory, and
the exemplars that are present may not seem very similar (if shape and animacy are
innately more salient than function). The result is that function-based classifiers are
more likely to be replaced by the default ge. In other words, the different neutralization
patterns of shape-based and function-based classifiers tell us more about the
processing of those specific classifiers than it does about the processing of ge.
Recognizing the fact that some specific classifiers are more ‘robust’ than others
helps us understand another phenomenon, namely ‘neutralization’ to classifiers other
than ge. Some researchers have used this fact to conclude that ge is not the only default;
other classifiers can serve as defaults within particular semantic classes. For example,
Tyan (1996) observed that judgments for shape-based classifiers were also inversely
correlated with those for li and ke in the category of small objects. Similarly, within the
category of animals, some researchers (e.g. Hu 1993, Ahrens 1994) note that the
‘default’ appears to be 隻 zhi, not ge, so that when adults or children fail to use the
specific ‘horse’ classifier 匹 pi (e.g. 一匹馬 yi-pi ma ‘a horse’), they say yi-zhi ma
instead, never yi-ge ma. Ahrens (1994:228) even claims that zhi (for her, including both
隻 and 枝) is ‘on its way to becoming a neutral classifier’ in Mandarin. If such
conclusions are right, we cannot say that ge is produced by a unique lexicon-independent,
semantics-independent default rule.
Some minor quibbles can be made; we will comment further on Ahrens (1994)
below in 5.4.3, and we could note that the experiment in Tyan (1996) involved
judgments, not classifier selection in production, which is our primary focus. However,
a general response is more appropriate: such cases of ‘neutralization’ to specific
classifiers really only show that classifier selection by analogy can override the ge rule
One interesting form of the argument that ge is used for nouns without analogous
exemplars concerns cases where ge is used for nouns that aren’t even really nouns.
First, our corpus study confirmed that ge is freely used with deverbal nouns,
whether derived from active transitive verbs like 體驗 tiyan (‘learn through
experience’), stative intransitive verbs like 自由 ziyou (‘be free’), or any other kind.
This contrasts markedly with the specific classifier pian, which can also take deverbal
nouns, but only of one type, namely stative intransitives like 空白
kongbai ‘be blank’. The specific classifier jian, which also may be used with abstract
nouns, cooccurs with deverbal nouns in a significantly smaller proportion of its noun
collocations than does ge.
Second, ge is the only classifier used with mentioned language, where linguistic
units are treated as objects of discussion rather than referential symbols. (For example,
this sentence treats ‘This sentence is being treated as an NP’ as an NP to illustrate this
phenomenon in English.) When such ad hoc NPs receive a classifier in the corpus,
this classifier is always ge. An example is given below.
(2) 你對我好,我也對你好,這個「好」就變得具有生命力。
Ni dui wo hao, wo ye dui ni hao, zhege ‘hao’ jiu biande ju you shengmingli.
“You’re good to me, I’m good to you, this-GE ‘good’ comes to have vitality”.
Finally, ge and only ge can be used to ‘classify’ linguistic constituents that are not
even nouns at all. Thus speakers can use ge with adverbial resultative complements, as
in 吃了個飽 chi-le ge bao ‘ate until stuffed’, or entire idioms, as in 問了個水落石出
wen-le ge shuiluoshichu ‘asked until everything was clear’ (Wu and Li 1997).
Admittedly the use of ge in such contexts is somewhat anomalous, since it tends not to
The above arguments illustrate the default use of ge with forms for which there are no
good exemplars to analogize to; ge is also used as a default when such exemplars
exist but speakers have difficulty accessing them for various reasons.
Table 5.
Classifier    Mean MI    Classifier frequency
個 ge         3.53       0.28447
位 wei        4.34       0.10062
件 jian       5.69       0.03676
條 tiao       5.86       0.03639
隻 zhi        6.11       0.01072
片 pian       6.32       0.02324
張 zhang      6.56       0.11932
顆 ke         7.13       0.00170
把 ba         7.33       0.05952
根 gen        7.85       0.00849
面 mian       8.11       0.02526
粒 li         8.38       0.00252
枝 zhi        9.42       0.00472
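A minimal sketch of how a ‘mean MI’ figure of this kind might be computed, assuming that MI here means pointwise mutual information (in bits) between a classifier and each noun it collocates with, averaged over the classifier's distinct noun collocations. That definition, the function name mean_pmi_by_classifier, and the toy counts are all assumptions made for illustration; they are not the procedure or the data behind Table 5.

```python
import math
from collections import Counter


def mean_pmi_by_classifier(pairs):
    """Return {classifier: mean PMI in bits over its distinct noun collocations}.

    `pairs` is a list of (classifier, noun) tokens from a corpus.
    PMI(c, n) = log2( P(c, n) / (P(c) * P(n)) ), estimated from raw counts.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    classifier_counts = Counter(c for c, _ in pairs)
    noun_counts = Counter(n for _, n in pairs)

    pmi_sums = Counter()
    type_counts = Counter()
    for (c, n), k in pair_counts.items():
        p_joint = k / total
        p_c = classifier_counts[c] / total
        p_n = noun_counts[n] / total
        pmi_sums[c] += math.log2(p_joint / (p_c * p_n))
        type_counts[c] += 1
    return {c: pmi_sums[c] / type_counts[c] for c in type_counts}


# Hypothetical toy corpus: 'ge' spreads over three nouns, 'zhang' over one.
toy_pairs = (
    [("ge", "noun_a")] * 2
    + [("ge", "noun_b")] * 2
    + [("ge", "noun_c")] * 2
    + [("zhang", "noun_d")] * 2
)
toy_result = mean_pmi_by_classifier(toy_pairs)
```

Under this definition, a classifier that is frequent and spread across many nouns receives a low mean value, because seeing it tells us little about which noun follows, whereas a classifier tied to a narrow set of nouns receives a high one. In the toy corpus, ge therefore scores lower than zhang.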
Overall, the way children acquire the Mandarin classifier system is quite
consistent with our claims: children do overuse ge (since they have memory access
problems), and they do extend the use of specific classifiers from prototypes to
peripheral exemplars (since they acquire specific classifiers by analogy). Still, there is a
curious fact that needs some comment. Children acquiring the English past tense show
a U-shaped pattern in development (e.g. Marcus, Pinker, Ullman, Hollander, Rosen,
and Xu 1992): at first all irregulars are correct as they are simply parroted back, then
errors increase as they are overregularized, and finally accuracy improves again as
exceptions to the regular rule are learned. By contrast, according to Erbaugh (1986), the
stage where children overuse ge is not preceded by a stage where they use individual
classifiers correctly. Thus classifier development in Mandarin seems to follow more of
an S-shaped curve, where at first accuracy is low because children use only ge, and then
gradually improves as they learn the specific classifiers. This is a bit mysterious. If
examples of specific classifiers are in children’s memory, why aren’t they simply
repeated back verbatim at the earliest stage?
One answer is that child-directed speech may contain such an overwhelming
majority of ge tokens that children don’t even notice at first that specific classifiers
exist (Erbaugh 1986 notes that the adults in her study tended to use specific classifiers
extremely rarely). To test this hypothesis, we are currently looking (with Jane Tsay) at
the early acquisition of classifiers in Taiwanese, where e, the cognate of ge, also seems
to behave as a default but is not used by adults in as high a proportion as ge is in
Mandarin. As expected, here we do find a U-shaped learning curve, with initial correct
production of specific classifiers before the age of twenty-six months followed by an
extended period where the default classifier is overused.
On the face of it, the view of Ahrens (1994) is more in line with the central claims
of this paper, since both types of aphasic patients end up overusing defaults.
Unfortunately, it isn’t necessary to suppose that the Wernicke’s patients were
code-switching into Taiwanese to explain their overuse of zhi. First, the most common
classifier in Taiwanese is not the cognate of zhi but rather the cognate of ge, just as in
Mandarin (in fact, the cognates of 枝 zhi and 隻 zhi, which Ahrens lumps together
because they rhyme in Mandarin, don’t sound a bit alike in Taiwanese). More
interestingly, Hu (1993) found that children acquiring Mandarin also overuse zhi, and
the subjects in her study were the children of Mainland Chinese parents, not bilingual
Taiwanese-Mandarin speakers. In Hu’s study, as in Tzeng et al.’s (1991) original
analysis of the Wernicke aphasics, the overused classifier pronounced zhi is 隻 zhi, the
classifier for animals.
Nevertheless, the overuse of zhi by children or aphasics does not prove that it is
also some sort of default. For the reasons discussed in 4.2 above, zhi is a prime
candidate for analogical overextension. We therefore concur with the conclusion of
Tzeng et al. (1991), interpreting their results as meaning that aphasics overuse the
default ge rule due to memory access problems, while Wernicke’s aphasics have
additional problems with lexical semantics that cause them to overextend exemplar-rich
specific classifiers.
This discussion of aphasia raises a more general issue. Supporters of the hybrid
model of inflection have also made much of aphasia evidence (e.g. Pinker 1991). Since
we support a hybrid model of classifier production, are we claiming that the brain
processes inflection and classifiers in precisely the same way?
The answer to this must be no. The fact that the Broca’s aphasics studied by Tzeng
et al. (1991) overused ‘the ge rule’ conflicts with the claim made by Pinker (1991) that
Broca’s aphasics lose the ability to process all grammatical rules. Broca’s area may be
used in the processing of inflection (see also Jaeger et al. 1996), but as Tzeng et al.
(1991) themselves conclude, it is unclear what its role is in the processing of classifiers.
The difference between the acquisition patterns of inflection (U-shaped) and of classifiers
(S-shaped) also hints at as yet unknown processing differences. Such observations are
not fatal to our assertion that ge is treated as a default, but they also should serve as a
check on more ambitious speculations that all aspects of human language obey rules of
the same sort.
6. Concluding remarks
Given the direction of the arguments in this paper, we hope our readers agree with
us about what the next steps should be. First, versions of the experimental studies on
inflection need to be carried out on Mandarin classifiers, in particular the ones relating
to similarity and other lexical effects (Pinker 1991, Prasada and Pinker 1993, Marcus et
al. 1995). Second, a connectionist model of the Mandarin classifier system should be
attempted, to confirm that the specific classifiers can in fact be processed by analogy
but that ge cannot. We are carrying out both of these steps right now. We expect that
the results will confirm the basic conclusions of this paper, although surely they will
also reveal many new complexities of the Mandarin classifier system. We may
even end up increasing our understanding of nominal semantics and human
categorization after all.
References