Chinese Word Order
This paper shows how Mandarin Chinese word order can be described in topological terms. After
discussing the difficulties of applying syntactic dependency to Chinese and the basic word order phenomena
of the language, we provide the foundations of a formal topological grammar that links a syntactic
dependency tree to all of the possible corresponding word orders. We present the formal rules that allow
the generation of simple sentences, as well as the more complex ba- and bei-constructions and serial
verb constructions.
1 Introduction
This paper is an account of the work in progress regarding a topological description of some basic word
order phenomena of Mandarin Chinese. The topological model (Gerdes & Kahane, 2001) is a powerful
model for linearization. In the Meaning-Text Theory, it can model the interface between surface syntax
and the next level, traditionally called morphological representation, that we refer to as the “topological
level”, because we not only construct word order and prosodic breaks of different sizes, but a fully-
fledged constituent tree. This tree can then provide the basis for the computation of the prosodic groups
and pauses. The topological model is a formalization of the traditional analysis of German (Drach 1937)
and has been shown to allow for an elegant description of word order phenomena like scrambling in V2
or verb final languages that mix syntactic and communicative constraints. The basic idea is that the
sentence is constructed from fixed places, also called positions or fields, which have different constraints on
the number of their occupants. All word order constraints are described in this manner: If a word has to
precede another word, we don’t use relative placement rules, but these words are placed into different
fields. If their mutual order is free (under the given communicative information), they go into the same
field.
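The field principle can be illustrated with a minimal Python sketch. It is not the formalism of the topological model itself: the field names (`themes`, `main`), the helper functions, and the reading of the occupancy markers (`!` exactly one occupant, `?` at most one, `*` any number) are assumptions made for illustration. The point is that words in different fields are ordered by the fields alone, so no relative placement rule between words is needed.

```python
# Assumed reading of occupancy markers: a field name followed by
# "!" (exactly one occupant), "?" (at most one) or "*" (any number).
LIMITS = {"!": (1, 1), "?": (0, 1), "*": (0, None)}


def parse_domain(description):
    """Turn a description such as 'themes* main!' into [(field, marker), ...]."""
    return [(token[:-1], token[-1]) for token in description.split()]


def is_well_formed(domain, occupants):
    """Check the occupancy constraint of every field of the domain."""
    for field, marker in domain:
        low, high = LIMITS[marker]
        count = len(occupants.get(field, []))
        if count < low or (high is not None and count > high):
            return False
    return True


def linearize(domain, occupants):
    """Read the words off field by field; the field order fixes the word order."""
    return [word for field, _ in domain for word in occupants.get(field, [])]


# Hypothetical two-field domain: any number of themes, then exactly one main element.
macro = parse_domain("themes* main!")
filled = {"themes": ["書", "我"], "main": ["買了"]}
```

With these definitions, `linearize(macro, filled)` reads the themes before the main element, and a filling that leaves `main` empty is rejected by `is_well_formed`.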
At first sight, it may seem like overkill to apply this model to a language like Chinese, sometimes
described as having a very restricted word order because of its limited morphology. Her (2003), for
instance, describes “inversion constructions” in LFG with a “simplified Lexical Mapping Theory”. In this
approach, a change in argument order is considered as a lexical operation, putting the full burden on a
multitude of lexical entries. However, following (LaPolla 95), we will show that Chinese has a fairly
complex word order, depending mainly on communicative constraints. Placing this work in the Meaning-
Text Framework allows us to have unordered dependency trees at the syntactic level; the linearization
process can then be described in topological terms. All the rules and examples we provide have been
implemented and tested with the DepLin software (http://gerdes.fr/soft/deplin/), which ensures that
unwanted interactions between different rules and surplus word orders have not been overlooked. The rules
have, however, not been tested in parsing, although this is feasible (for example by transcoding the rules
into Lexical Functional Grammar (LFG), Clément et al. 2002). Aside from the difficulty involved in
working on written text when we want to include many oral word orders in our account, we would
encounter the word separation and ambiguity problem that most rule-based approaches face when parsing
Chinese, often obscuring the underlying analysis of word order phenomena (cf. for example the
importance that segmentation takes in the development of the Chinese Lexical Functional Grammar in
Fang & King 2007). We believe that this is another example of the usefulness of the prevalence of the
synthesis direction in the Meaning-Text Theory. It allows concentrating on the non-coincidental properties
of language, while keeping in mind the bidirectional character of the rules provided.
1
The authors, not being native speakers of Chinese, are deeply indebted to the helpful comments and innumerable
grammaticality evaluations by Hsieh Shu-kai, Liu Yeh-hsin, and Jun Miao. We have also benefited greatly from Sylvain
Kahane’s and three anonymous reviewers’ comments on our work. Any shortcomings remaining after help from these
colleagues are, of course, entirely our own responsibility.
(1) 你愛她 / 你 對 她 的 愛 有 多少
nǐ ài tā nǐ duì tā de ài yǒu duōshao
you love she you to her DE love ~have many-few
‘You love her.’ 'How deep is your love for her?'
Contrary to English, however, where we have morphological tests (changing person, tense, and number)
for a clear distinction between the two categories, in Chinese, the only observable difference between the
two 'ai' is the syntactic context, for example the appearance of DE, a genitive particle, when ai could be
called a noun. More generally, the semantic-syntax interface retains the role of providing function
words, which appear on the syntactic level (like de, ba, and bei presented in section 4), and of choosing
pronouns (or, more often, the absence of pronouns) when realizing predicates.
Yet, the main reason for casting doubt on the appropriateness of MTT is the central position this
model gives to dependency, including the prominent place of syntactic functions. Although categorial
borders may be very different in Chinese (see for example Huang 1997), to our knowledge, nobody
doubts the existence of categories as a whole. Things are different with syntactic functions: LaPolla 1993
convincingly shows that the usual criteria for subjecthood or objecthood do not hold in Chinese and
argues in favor of a completely semantic and pragmatic analysis of the language, meaning that semantic
roles such as agent and beneficiary, coupled with communicative values like topic and focus, are
sufficient to describe word order constraints in Chinese. At this time, we cannot discuss whether Chinese has
truly grammaticalized the subject role, and it is possible that the term “agent”, even in the surface
dependency, would be more appropriate. However, we retain the usual functional terms subject and
object whenever we have the syntactic realization of an agent in a dependency tree. We will nevertheless use
semantically tainted terms like goal if a common equivalent for the syntactic relation cannot be found
among the usual syntactic functions.
In this approach, we follow the common practice in computational and formal description of Chinese
such as the work on a Chinese LFG in the Palo Alto Research Center (Fang&King 2007) or the work of
Haitao Liu 2007 on syntactic dependency structures for Chinese, using the “European” terms as function
names wherever possible. His work on a Chinese dependency treebank has demonstrated that the
dependency approach can give important insights into the structure of the Chinese language.
3 Simple Structures and first formalization
We start our description with a simple dependency structure with a transitive verb:
(2) 我 昨天 買 了 書
Wǒ zuótiān mǎi le shū
I yesterday buy ASP book
‘I bought books yesterday.’
[Figure 1: Simple dependency tree. The verb 買 'buy' (V) governs 我 'I' (N, suj), 書 'book' (N, obj), 了 -LE (Asp, asp), and 昨天 'yesterday' (AdvT, tmp).]
Note that we have the two arguments, the subject and the object, realized as a pronoun and a bare noun.
Temporal and spatial relations behave slightly differently from other modifiers and we have to introduce a
specific modifier relation, loc, which hints further at the close connection between semantic and syntactic
relations in Chinese. The aspectual marker LE, marking accomplishment (see footnote 3), will be treated as a
separate word with a special function: asp.
3.1 Topicalization, word order possibilities, and communicative structure
Chinese is said to be an SVO language, which may be misleading considering that S and O functions
are potentially irrelevant. LaPolla 95 suggests that Chinese should be described as a “verb medial”
language where “Topical or non-focal NPs occur preverbally and focal or non-topical NPs occur
post-verbally.” The typical order given in (2) is in fact the most communicatively neutral, corresponding to Li
& Thompson 81 (chapter 4.1.3 D) “sentences with no topic”, i.e. it can constitute an answer to the thetic
question: What is going on? Topicalization of various dependents of the verb is possible with different
communicative structures. Multiple topicalization is possible, too, in particular in spoken language. This
can lead to very different word orders for the same dependency structure.
(3) 這本書我買了 / 昨天 書 我 買 了
Zhè běn shū wǒ mǎi le zuótiān shū wǒ mǎi le
this Classifier(Cl) book I buy ASP yesterday books I buy ASP
This book, I bought (it). Yesterday, books, I bought (some).
Li & Thompson describe this possibility for shu (book) to be in the topic position as in (3). They also
remark that the topic position cannot be occupied by an indefinite NP and that the interpretation of bare
nouns in this position is constrained to be either definite or generic. Interestingly enough, we should add that when in
object (post-verbal) position, a bare noun is either generic or indefinite but cannot be interpreted as definite.
These differences of possible interpretations seem to be closely related to the communicative value borne
by the bare noun.
3
It is generally agreed that Chinese has two different markers LE. The other type, called “Currently
Relevant State” (CRS), always has to go in the last position in the sentence. This place is the last position of our
micro domain. LE is sometimes designated as a verbal suffix or as an auxiliary; the lack of word-segmenting characters
in written Chinese makes both explanations plausible.
Note that our analysis differs slightly from Li&Thompson's presentation of Chinese simple declarative
sentences. We allow the subject to be placed in a topical position, creating a different constituent
structure, whereas Li&Thompson talk about “Sentences in Which the Subject and the Topic are Identical” vs.
“Sentences with no subject”. The difference lies in the definition of the subject position, the one they give
making it impossible to distinguish those two positions when the topic is an agent. We consider, although
we cannot show this here, that the communicative difference also appears prosodically, and we capture
this kind of (spoken) word order possibility by allowing more than one element in topic position.
The aspectual marker le occupies a position in close proximity to the verb, from which it can only be
separated by a verbal resultative (in so-called Verb-Resultative compounds) or a specific kind of object in
Verb-Object compounds, which are collocational or idiomatic and thus lexically constrained. (5) is an
example of a Verb-Object compound where the bare noun fàn can appear before the aspect marker, but it
can also be topicalized or appear after le as in (6). In the latter cases, fàn could also have dependents that
would specify the meal. This is not possible when fàn occupies the position between chī and le, where it
can only appear as a bare noun.
(5) 我吃飯了
Wǒ chīfàn le
I eat meal ASP
I ate.
(6) 我吃了飯 / 飯我吃了
wǒ chī le fàn fàn wǒ chī le
I eat ASP meal meal I eat ASP
I ate. I’ve eaten (more like “lunch, I already had”).
For the dependency tree presented above, the topicalization possibilities amount to 8 different word
orders (of the 120 theoretically possible orders). They correspond to 16 different communicative structures,
which reflect different possibilities for the intonation structure in spoken language.
3.2 Domains and placement rules
Topological grammars can include communicative constraints directly in the rules. In this work, however,
we provide a grammar that gives all the possible word orders, independently of the communicative
partition, but it is straightforward to specialize the proposed rules with communicative restrictions. The terms
we use for the description of these possibilities stem from the syntactic description of oral French
(Blanche-Benveniste 1990) where we distinguish the “macrosyntactic” domain providing places for all
extraction and topicalization phenomena from a core syntax, called “microsyntax”, with the common
order constraints and places for all verbal arguments (used when the arguments are rhematic). Moreover, we
consider that Chinese verbs provide places for some of their closer dependents. We call this the “verbal
domain”.
The macrosyntactic domain only has two fields: the thema-field and the main field. Note that this
macrosyntactic division in two main fields roughly corresponds to Chao 1968's description of the Chinese
clause structure as simply topic and comment. The micro domain distinguishes four places to express the
ordering constraints: subject field, verbal field, object field, and SVC field. The verbal domain has the
following fields: circ(umstantial) field, ba-bei-field, negative field, verbal field, verbal object field, and the
field for the aspectual (marker). We obtain the following domain descriptions including the placement
constraints for each field:
Domain creation rules:
Initial field   Category   Domain created   Final field
I               V          macro-d          Micro-field
Micro-field     V          micro-d          verbal
verbal          V          verbal-d         verb

Placement rules:
Governor POS   Governor's field   Relation   Dependent POS   Dependent field   Comment
V              verb               suj        N               subject           Neutral subject
V              verb               obj        N               object            Neutral object
V              verb               suj        N               Topic             Topicalized subject
V              verb               obj        N               Topic             Topicalized object
The aspect marker LE will be placed by the first rule, the nominal dependents by the following two rules,
and the temporal adverbial by the remaining rules:
Governor POS   Governor's field   Relation   Dependent POS   Dependent field   Comment
V              verb               asp        ASP             Asp               Aspect marker
N              noun               atr        CL              Cl                Classifier in NP
CL             Cl                 qc         Num             num               Numeral in NP
V              verb               loc        AdvT            circ              Circumstantial
V              verb               loc        AdvT            Topic             Topicalized circumstantial
Domain creation rule: (Topic, N, nd, noun)
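The effect of these rules on the tree of example (2) can be checked with a small Python sketch. It is not the DepLin implementation: the dictionary `PLACEMENT`, the flattened `FIELD_ORDER`, and the function name are hypothetical encodings, and the sketch assumes that the theme field accepts any number of occupants in any mutual order while every other field holds at most one word. Under these assumptions the enumeration yields 16 topological structures and 8 distinct word orders, the figures given in section 3.1.

```python
from itertools import permutations, product

# Field order obtained by flattening the micro domain (subject, verbal, object)
# and the verbal domain (circ, verb, aspect); the names are illustrative.
FIELD_ORDER = ["fSuj", "fCirc", "fV", "fAsp", "fObj"]

# Hypothetical encoding of the placement rules: relation -> admissible fields.
PLACEMENT = {
    "suj": ["fSuj", "fThemes"],
    "obj": ["fObj", "fThemes"],
    "loc": ["fCirc", "fThemes"],
    "asp": ["fAsp"],
}

VERB = "買"
DEPENDENTS = [("我", "suj"), ("昨天", "loc"), ("了", "asp"), ("書", "obj")]


def topological_structures(verb, dependents):
    """Yield (word_order, theme_order, core_fields) for every structure."""
    words = [word for word, _ in dependents]
    options = [PLACEMENT[relation] for _, relation in dependents]
    for fields in product(*options):
        themes = [w for w, f in zip(words, fields) if f == "fThemes"]
        core = {f: w for w, f in zip(words, fields) if f != "fThemes"}
        core["fV"] = verb
        tail = tuple(core[f] for f in FIELD_ORDER if f in core)
        # Themes precede the micro domain and may appear in any mutual order.
        for theme_order in permutations(themes):
            yield theme_order + tail, theme_order, tuple(sorted(core.items()))


structures = list(topological_structures(VERB, DEPENDENTS))
word_orders = {order for order, _, _ in structures}
print(len(structures), len(word_orders))  # 16 8
```

Several structures share a word order because a word in the theme field and the same word in its micro or verbal field can end up in the same linear position.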
1. i[macro fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fCirc 昨天 yesterday fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]
6. i[macro fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fCirc 昨天 yesterday fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]
8. i[macro fThemes[barenoun fN 我 I] fThemes 昨天 yesterday fMicro[micro fVerbal[verbal fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]
Number of structures with the same word order: 3
2. i[macro fThemes 昨天 yesterday fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]
7. i[macro fThemes 昨天 yesterday fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]
Number of structures with the same word order: 2
4. i[macro fThemes 昨天 yesterday fThemes[barenoun fN 書 book] fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fV 買 buy fAsp 了 -LE] ] ]
10. i[macro fThemes 昨天 yesterday fThemes[barenoun fN 書 book] fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fV 買 buy fAsp 了 -LE] ] ]
Number of structures with the same word order: 2
3. i[macro fThemes[barenoun fN 書 book] fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fCirc 昨天 yesterday fV 買 buy fAsp 了 -LE] ] ]
9. i[macro fThemes[barenoun fN 書 book] fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fCirc 昨天 yesterday fV 買 buy fAsp 了 -LE] ] ]
12. i[macro fThemes[barenoun fN 書 book] fThemes[barenoun fN 我 I] fThemes 昨天 yesterday fMicro[micro fVerbal[verbal fV 買 buy fAsp 了 -LE] ] ]
Number of structures with the same word order: 3
5. i[macro fThemes[barenoun fN 書 book] fThemes 昨天 yesterday fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fV 買 buy fAsp 了 -LE] ] ]
11. i[macro fThemes[barenoun fN 書 book] fThemes 昨天 yesterday fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fV 買 buy fAsp 了 -LE] ] ]
Number of structures with the same word order: 2
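The grouping of structures by word order in such a listing can be recomputed from the bracketed strings. The following Python sketch is an illustration, not part of DepLin: it assumes that the words of the sentence are the only Chinese characters in each line, so that field labels, box names, and English glosses can be skipped by a regular expression; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Runs of CJK ideographs; labels such as fThemes and glosses such as "buy" are ASCII.
CJK_WORD = re.compile(r"[\u4e00-\u9fff]+")


def word_order(structure):
    """Return the tuple of Chinese words in their linear order."""
    return tuple(CJK_WORD.findall(structure))


def group_by_word_order(structures):
    """Map each word order to the list of structures that realize it."""
    groups = defaultdict(list)
    for structure in structures:
        groups[word_order(structure)].append(structure)
    return dict(groups)


# Three structures copied from the listing (numbers 1, 6 and 2).
LISTING = [
    "1. i[macro fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fCirc 昨天 yesterday "
    "fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]",
    "6. i[macro fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fCirc 昨天 yesterday "
    "fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]",
    "2. i[macro fThemes 昨天 yesterday fMicro[micro fSuj[barenoun fN 我 I] "
    "fVerbal[verbal fV 買 buy fAsp 了 -LE] fObj[barenoun fN 書 book] ] ]",
]
groups = group_by_word_order(LISTING)
```

Structures 1 and 6 fall into the same group because they differ only in the field that hosts 我, not in the linear order of the words.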
Note that a bare noun would have to be interpreted as definite or generic just like topics. In other
words, it cannot introduce new information to the discourse. This confirms the idea that new
information has to be postverbal. The position of negation adverbs leads us to locate these constructions inside the
verbal domain, just between the negation adverb and the verb:
(9) 我 沒 把 那 本 書 買走了
Wǒ méi bǎ nà běn shū mǎizǒu le
I have-not BA that Cl book buy ASP
‘I did not buy that book.’
(10) *我 把 那 本 書 沒 買走了
Wǒ bǎ nà běn shū méi mǎizǒu le
I BA that Cl book have-not buy ASP
An important point is that Ba and Bei cannot be topicalized, and neither can the dependent NP:
(11) *書 我 把 買走 了 / *把 書 我 買走 了
shū wǒ bǎ mǎizǒu le bǎ shū wǒ mǎizǒu le
book I BA buy ASP BA book I buy ASP
The position for Ba and Bei is opened by the verb and already included in the rules we have presented in
section 3.2. Now we need to define their placement rules and their own domains, which will hold the
dependent NP. We have two domains: bei-d = bei subject and ba-d = ba object
Domain creation and placement rules:

Domain creation:
Initial field   Category   Domain created   Final field
ba-bei          BA         ba-d             ba
ba-bei          BEI        bei-d            bei

Placement:
Governor POS   Governor's field   Relation   Dependent POS   Dependent field
V              verb               pat-obj    BA              ba-bei
V              verb               agt-obj    BEI             ba-bei
BA             ba                 comp       N               object
BEI            bei                comp       N               subject
These additions to our grammar suffice to generate the more restricted word orders: with a six-word tree,
only two different word orders are possible, corresponding to 5 different topological trees (out of 720
theoretical possibilities):
1. i[macro fMicro[micro fSuj[barenoun fN 我 I] fVerbal[verbal fCirc 昨天 yesterday fBaBei[babox fBa 把 BA fObj[barenoun fN 書 book] ] fV 買 buy fAsp 了 -LE] ] ]
3. i[macro fThemes[barenoun fN 我 I] fMicro[micro fVerbal[verbal fCirc 昨天 yesterday fBaBei[babox fBa 把 BA fObj[barenoun fN 書 book] ] fV 買 buy fAsp 了 -LE] ] ]
5. i[macro fThemes[barenoun fN 我 I] fThemes 昨天 yesterday fMicro[micro fVerbal[verbal fBaBei[babox fBa 把 BA fObj[barenoun fN 書 book] ] fV 買 buy fAsp 了 -LE] ] ]
Number of structures with the same word order: 3
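The figures for the ba-construction can be reproduced with the same kind of enumeration. The Python sketch below is an illustration under stated assumptions, not the DepLin grammar: the names `UNITS` and `CORE_ORDER` are hypothetical, the ba-domain is treated as one block (把 followed by its NP) that can only go to the ba-bei field, and only the subject and the temporal adverbial may alternatively go to the theme field.

```python
from itertools import permutations, product
from math import factorial

# Subject field, then the fields of the verbal domain: circ, ba-bei, verb, aspect.
CORE_ORDER = ["fSuj", "fCirc", "fBaBei", "fV", "fAsp"]

# Each unit is (block of words, admissible fields); the ba-domain is one block.
UNITS = [
    (("我",), ["fSuj", "fThemes"]),
    (("昨天",), ["fCirc", "fThemes"]),
    (("把", "書"), ["fBaBei"]),
    (("買",), ["fV"]),
    (("了",), ["fAsp"]),
]


def enumerate_structures(units):
    """Return one word order (tuple of words) per topological structure."""
    results = []
    for fields in product(*(options for _, options in units)):
        themes = [block for (block, _), f in zip(units, fields) if f == "fThemes"]
        core = {f: block for (block, _), f in zip(units, fields) if f != "fThemes"}
        tail = [core[f] for f in CORE_ORDER if f in core]
        # Themes come first and may appear in any mutual order.
        for theme_order in permutations(themes):
            blocks = list(theme_order) + tail
            results.append(tuple(word for block in blocks for word in block))
    return results


structures = enumerate_structures(UNITS)
n_words = sum(len(block) for block, _ in UNITS)
print(len(structures), len(set(structures)), factorial(n_words))  # 5 2 720
```

Because 把 and its NP never reach the theme field in this encoding, the starred orders of example (11) are not generated.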
5
Among them are some structures that should not be called SVC because they resemble phenomena very common
in various languages including languages without SVC, like sentential subjects. Nevertheless, some so-called SVC
in Chinese are comparable with structures of African languages well known for their SVC. (Even in this case, a
close look is needed to characterize structural differences among languages; see Wu 2002, Paul 2004.)
6
We have to note here that when asked about this sentence, some native speakers (of Mandarin spoken in Taiwan)
do not even notice the ambiguity (in favor of 17b) or report a strong preference for the SVC interpretation.
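[Figure 1: The two dependency trees: circumstantial and SVC. In the circumstantial tree, 討論 'discuss' (V) governs 我們 'we' (N, suj), 開會 'hold a meeting' (V, circ), and 問題 'problem' (N, obj). In the SVC tree, 開會 'hold a meeting' (V) governs 我們 'we' (N, suj) and 討論 'discuss' (V, purpose), which in turn governs 問題 'problem' (N, obj). In both trees, 問題 governs 個 (Cl, spe), which governs 這 'this' (Dem, dem).]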
Reduced verbal domain: rvd = verb! Object?
Domain creation rule: (SVC, V, rvd, verb)
These additions to our grammar give a different, interesting result: starting with two different dependency
structures, we obtain various word orders, some of them common to the two dependency
structures, attesting that the surface form is ambiguous. We also noticed that all the word orders (but none of
the topological trees) generated by the SVC dependency tree can be generated from the circumstantial
dependency tree, while the contrary does not hold. This observation seems to suit the preferences of our
native speaker informants.
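This inclusion can be checked by enumerating the word orders of both trees. The Python sketch below is an illustration, not the DepLin grammar: the unit lists, the field names, and the function `word_orders` are hypothetical, 這個問題 is treated as one unsplittable block, and we assume that the subject and the object may always go to the theme field, that the circumstantial verb may do so as well, and that the serial verb stays in the SVC field after the main verb. Under these assumptions the circumstantial tree yields 8 word orders and the SVC tree 3, all of which are among the 8.

```python
from itertools import permutations, product


def word_orders(units, core_order):
    """Set of word orders; units are (block of words, admissible fields)."""
    orders = set()
    for fields in product(*(options for _, options in units)):
        themes = [block for (block, _), f in zip(units, fields) if f == "fThemes"]
        core = {f: block for (block, _), f in zip(units, fields) if f != "fThemes"}
        tail = [core[f] for f in core_order if f in core]
        # Themes come first and may appear in any mutual order.
        for theme_order in permutations(themes):
            orders.add(" ".join(list(theme_order) + tail))
    return orders


# Tree 1: 討論 is the main verb, 開會 a circumstantial dependent.
CIRC_TREE = [
    ("我們", ["fSuj", "fThemes"]),
    ("開會", ["fCirc", "fThemes"]),
    ("討論", ["fV"]),
    ("這個問題", ["fObj", "fThemes"]),
]
# Tree 2: 開會 is the main verb, 討論 heads the reduced verbal domain.
SVC_TREE = [
    ("我們", ["fSuj", "fThemes"]),
    ("開會", ["fV"]),
    ("討論", ["fSVC"]),
    ("這個問題", ["fSVCObj", "fThemes"]),
]

circ = word_orders(CIRC_TREE, ["fSuj", "fCirc", "fV", "fObj"])
svc = word_orders(SVC_TREE, ["fSuj", "fV", "fSVC", "fSVCObj"])
print(len(circ), len(svc), svc <= circ, circ <= svc)  # 8 3 True False
```

The intersection `circ & svc` then contains exactly the surface strings that are ambiguous between the two dependency analyses.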
Below we show all possible word orders for the first dependency tree (with the circumstantial
dependency, 8 word orders):
8. i[macro fThemes[barenoun fN 我們 we] fThemes[bSN fQuantification[bQntDef fDem 這 this fSpec 個] fN 問題 problem] fMicro[micro fVerbal[verbal fCirc 開會 hold meeting fV 討論 discuss] ] ]
15. i[macro fThemes[barenoun fN 我們 we] fThemes[bSN fQuantification[bQntDef fDem 這 this fSpec 個] fN 問題 problem] fThemes 開會 hold meeting fMicro[micro fVerbal[verbal fV 討論 discuss] ] ]
Number of structures with the same word order: 2
7
Some informants don't accept the topicalization of a bare verb, or find it unnatural. If we add the postposition 時
shí to the verb, however, the verbal topicalization becomes generally acceptable. This particle appears at the
syntactic level and can be dealt with by a small amendment to our grammar adding a constraint on the topic field. We
do not elaborate on this point here for reasons of clarity.
13. i[macro fThemes[barenoun fN 我們 we] fThemes 開會 hold meeting fMicro[micro fVerbal[verbal fV 討論 discuss] fObj[bSN fQuantification[bQntDef fDem 這 this fSpec 個] fN 問題 problem] ] ]
Number of structures with the same word order: 3
This is a list of structures obtained for the second dependency tree (with the SVC, 3 word orders):
fV
討論 discuss ] ] ]
Number of structures with the same word order: 1
討論 discuss ] ] ]
4. i[macro fThemes[bSN fQuantification[bQntDef fDem 這 this fSpec 個] fN 問題 problem] fThemes[barenoun fN 我們 we] fMicro[micro fVerbal[verbal fV 開會 hold a meeting] fSVCPURP[verbal-inf fV 討論 discuss] ] ]
Number of structures with the same word order: 2
6 Conclusion
We have shown that various simple and more complex syntactic phenomena of Chinese find a
straightforward formalization in terms of dependency and topology, and thus in the framework of MTT. In spite
of some doubts on the usefulness of the commonly used syntactic functions, it is possible to translate into
this type of topological formalization some analyses of syntactic phenomena stemming from different
theoretical frameworks, even from “distant” approaches like generativist theories. Contrary to analysis-based
reasoning that focuses on ambiguities, we believe that this “synthetic” approach explains naturally the
underlying linguistic processes. Our approach thus differs in providing the complete set of paraphrases for a
given dependency tree, a computation that, as soon as we go beyond the simple examples given in this
paper, requires the implementation of the grammar in a computer system.
Our grammar includes some more complex phenomena, for example relative phrases, not presented
here for lack of space, and we are working on covering further syntactic details. It would be interesting to
explore the connection of this grammar with an implementation of a semantic-syntax interface that could
provide the input for our system. On the other end of the pipeline, it remains to be shown that the
resulting topological structures have a raison d’être in providing a smooth basis for the computation of
prosodic groups even for a tone language like Chinese.
References
Blanche-Benveniste C. 1990, Le Français Parlé: Etudes Grammaticales, CNRS, Paris.
Chao, Yuen Ren. 1968. A grammar of spoken Chinese. Berkeley & Los Angeles: University of California press.
Clément, L., K. Gerdes and S. Kahane. 2002. “An LFG-type grammar for German based on the topological model”, in
Miriam Butt and Tracy Holloway King, editors, Proceedings of the LFG02 Conference. CSLI Publications,
Stanford.
Drach, Erich. 1937. Grundgedanken der deutschen Satzlehre, Diesterweg, Frankfurt/M.
Fang, Ji and King, T.H. 2007. “An LFG Chinese Grammar for Machine Use”. In T.H. King and E. M. Bender, eds.,
Proceedings of the GEAF07 Workshop.
Gerdes K., S. Kahane. 2001. “Word Order in German: A Formal Dependency Grammar Using a Topological
Hierarchy”, in Proceedings ACL 2001, Toulouse.
Gerdes K., S. Kahane. 2007. “Phrasing it differently”, in L. Wanner (ed.), Selected lexical and grammatical issues in
the Meaning-Text Theory, Benjamins, 297-335.
Her, One-Soon 2003. Chinese inverted constructions within a simplified LMT. Journal of Chinese Linguistics,
monograph series 19 Lexical-Functional Grammar Analysis of Chinese
Huang, Chu-Ren. 1997. Corpus on web: Introducing the first tagged and balanced Chinese corpus. In: Proceedings
of the Annual Conference of the Pacific Neighborhood Consortium.
Li, Charles N, & Thompson, Sandra A.1981. Mandarin Chinese: A functional Reference Grammar. University of
California press.
Liu, Haitao. 2007. “Dependency Relations and Dependency Distance: a statistical view based on Treebank”.
Proceedings of the Third International Conference on Meaning Text Theory (MTT), Klagenfurt, Austria.
LaPolla, Randy J. 1993. “Arguments against ‘subject’ and ‘direct object’ as viable concepts in Chinese”, in Bulletin
of the Institute of History and Philology 63.4:759-813.
Mel’čuk, Igor A. 1988. Dependency Syntax: Theory and Practice. SUNY Press, Albany, NY.
Mel’čuk, Igor A. 2001. Communicative Organization in Natural Language: The Semantic-Communicative Structure
of Sentences. John Benjamins, Amsterdam.
Paul, Waltraud 2005. The “serial verb construction” in Chinese: A Gordian knot. In Oyharçabal B. & Paul W. 2005.
Proceedings of the workshop La notion de « construction verbale en série » est-elle opératoire ? December 9,
2004, Ehess, Paris.
Wu, Ching-huei Teresa 2002. Serial Verb Construction and Verbal Compounding. In Sze-Wing Tang & Chen-Sheng
Luther Liu, eds., On the Formal Way to Chinese Languages. CSLI, Stanford, California.