Agata Savary
Abstract
Similarly to simple words, compounds and other multi-word units (MWUs)
are subject to inflection. A correct and exhaustive treatment of this issue has
an important impact on natural language applications. However it raises some
nontrivial questions such as: the role of separators in MWUs, morphological
non-compositionality of MWUs, their syntactic and semantic variation, huge
sizes of inflection paradigms in highly inflected languages, etc. Due to such
problems, the inflectional description of MWUs must be, at least partly, lexi-
calized. We present a comparative review of eleven lexical approaches to this
issue, with respect to linguistic properties of those units. The review is based
on case studies of several natural languages. It allows us to put forward some
recommendations for a cross-language standard morphological description of
MWUs.
2 / LiLT Volume 1, Issue 2, July 2008
1 Introduction
As shown by Habert and Jacquemin (1993), multi-word units encompass a
number of hard-to-define and controversial linguistic objects: compounds,
complex terms, multi-word named entities, multi-word lexemes and ex-
pressions, collocations, frozen expressions, etc. They may be contiguous
or non-contiguous, compositional or non-compositional sequences of words,
and may admit graphical, morphological, syntactic and semantic variation.
Numerous linguistic and pragmatic definitions of compounds and other
MWUs (Benveniste (1974), Downing (1977), Levi (1978), Bauer (1983),
Gross (1990), Anscombre (1990), Corbin (1992), Cadiot (1992), Silberztein
(1993b), Gross (1996), Sag et al. (2002), etc.) invoke three major points:
. they are composed of two or more graphical words
. they show some degree of morphological, syntactic, distributional or
semantic non-compositionality
. they have unique and constant references
However, the basic notions (a word, a reference, the non-compositionality)
and measures (degree of non-compositionality), used in those definitions are
themselves controversial. For instance, as shown below, the notion of a graph-
ical word may be application-dependent and/or language-dependent. Thus
(in accordance with the approaches whose comparative study we present be-
low), we consider a MWU as a sequence of graphical units which, for some
application-dependent reasons, has to be listed, described and processed as
a unit. In most cases the graphical units composing a MWU are themselves
morphologically analyzable. A broader discussion of how to define a MWU
is beyond the scope of this paper.
The quantitative and qualitative importance of multi-word units in natural lan-
guages is now widely acknowledged. They are placed on the frontier between
morphology and syntax because of their hybrid nature: some of their proper-
ties are idiosyncratic (which suggests a lexicalized description), while some
others are productive (which is more easily reflected by a grammar). In this
study we are particularly interested in the inflectional properties of MWUs,
which are however often connected to phenomena on the graphical, syntactic
and semantic level.
Obviously, a reliable inflection processing of single words is a necessary
condition for the inflection processing of MWUs. However, this condition
is rarely a sufficient one. For example, to obtain the plural forms of
chief justice and lord justice in English, we need to know not only how to
generate the plurals of chief, lord and justice, but also how the different
inflected forms of these constituents combine. For instance, the following
plural forms are correct:
Computational Inflection of Multi-Word Units / 3
but not *chiefs justice and *chiefs justices. There are, however, few
automatically accessible hints indicating that the former compound is
morphologically a standard English Noun-Noun phrase taking an s on its last
constituent in the plural, while the plural of the latter has three variants.
Obviously, some lexicalized description is needed in order to account for this
idiosyncratic behavior.
A correct and exhaustive inflectional analysis and generation of MWUs is one
of the conditions for a high-quality natural language application. Studies con-
cerning automatic treatment of MWUs have been performed for two decades.
Some in-depth linguistic and computational approaches to word composition,
aiming at general language modeling, have co-existed with numerous robust
statistical methods, sometimes augmented with some linguistic knowledge.
Nowadays, there is a growing conviction in the NLP community that large
linguistic lexicons and grammars of MWUs are needed, due to their two char-
acteristics: (i) they represent a high percentage of items in natural language
corpora, (ii) most of them, taken separately, appear very rarely in corpora.
For instance, Gross and Senellart (1998) showed that more than 40% of all
tokens in a one-year corpus of the French newspaper Le Monde belong to
multi-word units or expressions, and should not be analysed individually.
Savary (2000) found that 85% of all graphically distinct compound noun forms
appear fewer than twenty times in a one-year corpus of the Herald Tribune.
Baldwin and Villavicencio (2002) experimented with a random sample of two
hundred English verb-particle constructions and showed that as many as two
thirds of them appear at most three times in the Wall Street Journal corpus.
Sag et al. (2002) cite some studies estimating the number of multi-word
expressions to be as high as that of single words, and argue that these
figures are an underestimate, especially in terminological sublanguages.
The main aim of our study is to analyze the state of the art in the
lexicon-oriented computational treatment of the (broadly understood)
inflectional morphology of MWUs. This paper is organized as follows. In
section 2 we study the linguistic properties of MWUs, with a particular focus
on inflection, which we illustrate with examples in English (EN), French (FR),
Polish (PL), Serbian (SR), German (DE) and Turkish (TU). In section 3 we
study eleven existing lexical approaches to MWU inflection in several natural
languages. In section 4 we compare these approaches with respect to how well
they account for the linguistic properties shown. In section 5 we conclude
with some recommendations concerning a cross-language universal lexicalized
description of MWUs.
Squeezed MWUs
On the other hand, the obligatory presence of separators within MWUs is
questionable because some contiguous sequences of letters behave morpho-
logically as compounds:
(11) (EN) passerby, passersby
(12) (DE) Schul|kind, Schul|jahr, Schul|lehrer, . . . (‘pupil, school year, teacher’)
(13) (FR) bon|homme, bons|hommes (‘fellow’)
(14) (PL) chciał|bym, chciała|bym (‘I would like, in masculine and feminine’)
In the two former examples, perce is a genderless verb form, neige and
oreille are feminine, while the compounds themselves are masculine. In the two
latter ones in and four cannot be considered as regular headwords because, as
individual words, they do not admit a plural. In the last example, if either
of the two nouns were the headword, it would always have to agree in number
with the whole compound, which is not the case.
Agreement Irregularities
As said before, in perfectly compositional MWUs the morphosyntactic
structure of the multi-word lemma determines the agreement and government
rules imposed by the headword. These rules may be defied in three kinds of
situations:
. An agreement does not occur when it normally should. For instance, the
compound noun:
(23) (FR) grand-mère, grand-mères, grands-mères (‘grandmother’)
(26) (PL) majster klepka, majstra klepki, etc. (literally ‘master floorboard’ = ‘an
incompetent’)
do not admit singular forms, even if a bit and a piece, as well as zimna noga,
zimnej nogi, etc., are syntactically correct sequences (in singular these phrases
lose their particular sense). Note that the above examples differ from the
ones whose inflection is fixed but not defective, such as cross-roads:
(30) (EN) The bits and pieces he usually kept in his pocket were now on the table.
(31) (EN) *The bits and pieces he usually kept in his pocket is now on the table.
(32) (EN) All cross-roads in the main street were blocked by the police.
(33) (EN) The cross-roads in front of my house was blocked due to an accident.
Note also that the non-existence of a particular inflected form is not always
a proof of the inflectional non-compositionality of a compound, as it may
simply result from the inflection restrictions of the headword. For instance,
the following compounds:
(34) (EN) security police
(35) (FR) funérailles nationales (‘national funeral’)
(36) (PL) krótkie spodnie (‘shorts’)
do not admit a singular form due to the fact that their head nouns police,
funérailles and spodnie are themselves plural-only nouns.
2.4 Inflection and variation
According to Savary and Jacquemin (2003), inflected forms of compounds
belong to a more general phenomenon of terminological variation. In
particular, variants may result from separator alternation, as in (8), as well
as from a large range of other linguistic transformations:
. Insertions:
(37) (FR) moniteur temps réel, moniteur en temps réel (‘real-time monitor’)
. Omissions:
(38) (SR) profesor engleskog jezika, profesor engleskog (‘teacher of the English
language’)
. Order change:
approach. For the sake of human efficiency large numbers of forms should be
describable by compact rules. At the same time the formalism should be pre-
cise enough to avoid overgeneralization and overlooking of exceptions. See
appendix 2.4 to appreciate the size of an inflectional paradigm of a Serbian
compound noun.
The notion of a base form is essential in the morphological analysis and gen-
eration of inflected forms. It may be seen either as the canonical representative
of the inflection paradigm, or just its identifier. In the first case the base form
belongs itself to the paradigm (i.e. it is a linguistically correct form, called a
lemma). In the second case it may well be an abstract (linguistically incorrect)
form. Consider for instance:
(48) (EN) customs barrier, customs barriers
(49) (FR) mémoire vive, mémoires vives (literally ‘live memory’=‘random access
memory’)
where mémoire is a feminine noun and vive is the feminine form of the ad-
jective vif. These sets of compound forms may be represented either by their
first elements or by “abstract” forms custom barrier and mémoire vif. For
an efficient usage and treatment of MWUs by humans (e.g. consulting MWU
lexicons, or validation of automatically extracted candidate terms), the former
solution is more appropriate.
2.6 Noncontiguous MWUs
Multi-word expressions (MWEs), particularly those containing verbs, are
MWUs which may appear in the corpus as noncontiguous sequences of items,
as in:
(50) (EN) He has finally made up his bloody mind. (the MWE’s components are
underlined)
(53) cousin ⇒ [lemma: cousin, gender: masc, number: sing]
cousins ⇒ [lemma: cousin, gender: masc, number: pl]
cousine ⇒ [lemma: cousin, gender: fem, number: sing]
cousines ⇒ [lemma: cousin, gender: fem, number: pl]
(54) cousin [code: N32] germain [code: A32] ⇒ [gender: masc, number: sing,
gender inflection: yes, number inflection: yes]
Inflection variants, such as in (22) through (24) and (37) through (47), require
separate lexicon entries, for instance:
(56) toile [code: N21] d’araignée ⇒ [gender: fem, number: sing,
gender inflection: no, number inflection: yes]
toiles d’araignées ⇒ [gender: fem, number: pl,
gender inflection: no, number inflection: no]
while the second one produces the plural variant attached to a different
lemma:
(58) toiles d’araignées ⇒ [lemma: toiles d’araignées, gender: fem, number: pl]
Since the inflection codes for simple words may only apply to their lemmas
and do not allow one inflected form to be transformed directly into another
inflected form, it is unclear how abstract base forms would be treated in this
approach. For instance in example (49) the lemma of the second constituent
is the masculine form vif. If the compound lemma is:
(59) mémoire [code: N21] vif [code: A38] ⇒ [gender: fem, number: sing,
gender inflection: no, number inflection: yes]
then it is unclear how the rule of making both constituents agree is applied if
no gender inflection is allowed. If however the lemma is:
(60) mémoire [code: N21] vive [code: A21] ⇒ [gender: fem, number: sing,
gender inflection: no, number inflection: yes]
then the adjective vive implies an artificial adjectival lemma having only the
feminine forms vive and vives. Similar doubts apply to exocentric compounds
in which default agreement of the inflected constituents is impossible.
The inflection tool accompanying this formalism needs to be adapted to the
morphological model of each new language whenever new inflection categories
(e.g. case) or values (e.g. neuter gender) are needed.
English DELAC
In Savary (2000) the English DELAC of 60,000 compound lemmas and the
corresponding DELACF of 110,000 inflected forms are constructed. The previous
model is extended in that: (i) simple constituents in a compound lemma
are annotated by their DELAF-entries, (ii) characteristic constituents, i.e. the
headword and the words agreeing with it, are pointed out, (iii) exceptional
forms are explicitly described.
Examples (16) and (17) are represented by the following samples:
(61) man servant
1: [lemma: man, code: N8, number: sing]
2: [lemma: servant, code: N1, number: sing]
⇒ [class: noun, number: sing, inflection: {number}, char. const.: {1, 2}]
(62) man eater
2: [lemma: eater, code: N1, number: sing]
⇒ [class: noun, number: sing, inflection: {number}, char. const.: {2}]
form is concerned). This process yields man servant and man eater in singu-
lar, as well as men servants and man eaters in plural.
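The default behavior described above (inflect the characteristic constituents, leave the others unchanged) can be sketched as follows. This is only an illustration, not the DELAC tool itself: the dictionary `SIMPLE_FORMS` is a hypothetical miniature stand-in for DELAF lookups, and the function name `inflect_compound` is likewise an assumption.

```python
# Hypothetical miniature stand-in for DELAF lookups: lemma -> {number: form}.
SIMPLE_FORMS = {
    "man": {"sing": "man", "pl": "men"},
    "servant": {"sing": "servant", "pl": "servants"},
    "eater": {"sing": "eater", "pl": "eaters"},
}


def inflect_compound(constituents, char_const, number):
    """Inflect only the characteristic constituents (1-based positions).

    `constituents` are the words of the singular compound lemma; any
    constituent whose position is not in `char_const` is copied unchanged.
    """
    out = []
    for position, word in enumerate(constituents, start=1):
        if position in char_const:
            out.append(SIMPLE_FORMS[word][number])
        else:
            out.append(word)
    return " ".join(out)


print(inflect_compound(["man", "servant"], {1, 2}, "pl"))  # both constituents inflect
print(inflect_compound(["man", "eater"], {2}, "pl"))       # only the headword inflects
```

Exocentric or irregular entries would need explicit exception rules on top of this default, which the sketch does not attempt to model.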
The formalism adapts to a different language via a morphology configuration
file specifying the possible inflection classes (number, gender, case, etc.) and
their possible values (singular, feminine, nominative, etc.).
Thus, the French DELAC-entries (54) and (60) can be described as follows:
mémoire vive
1: [lemma: mémoire, code: N21, gender: fem, number: sing]
2: [lemma: vif, code: A38, gender: fem, number: sing]
⇒ [class: noun, gender: fem, number: sing, inflection: {number},
char. const.: {1, 2}]
Note that annotating the inflected components with their DELAF-entries makes
it possible to avoid both abstract base forms for compounds and artificial
lemmas for simple words (cf. examples (59) and (60)). Here, the plural form
vives is not obtained directly from vive but from the attached lemma vif. The
DELACF entries obtained from this description contain the same non-abstract
lemma mémoire vive:
Here, the entry inflects in number, and has no characteristic constituent (it
is exocentric). Thus its plural formation is not done by default (i.e. not by
inflecting the characteristic constituents), but follows two exception rules:
one needs to inflect the first constituent into plural and leave the second con-
stituent unchanged (attorneys general), or conversely (attorney generals).
A unification formalism makes it possible to express compactly large
paradigms governed by agreement rules. Thus, example (26) may be represented
as follows:
(66) majster klepka
1: [lemma: majster, code: N1-er, case: nomin, gender: masc-hum, number: sing]
2: [lemma: klepka, code: N4-ka, case: nomin, gender: fem, number: sing]
⇓
[class: noun, case: nomin, gender: masc-hum, number: sing,
inflection: {number, case}, char-const: {1}]
exception: $Number ← ⟨1:$Number⟩ ⟨2:$Number⟩
exception: $Case ← ⟨1:$Case⟩ ⟨2:$Case⟩
The exception rules use unification variables $Number and $Case to indicate
that any number and any case of a compound is obtained by inflecting and
unifying both constituents, despite the fact that only the first one is character-
istic.
As seen in the appendix, section 1.2, one drawback of this formalism is the
distribution of the morphological description between the compound entry
and the preamble (i.e. lines beginning with % in appendix 1.2 and describing
the characteristic constituents and the exception rules). An entry may not be
regarded independently from the sublist it appears in. In particular sorting the
textual lexicon is not allowed.
The formalism also suffers from the lack of expressive power allowing the
attachment of orthographic, syntactic and semantic variants to a common
lemma. Thus, examples (37) through (47) need separate entries for each vari-
ant.
Greek DELAC
In Kyriacopoulou et al. (2002), the set of all inflected forms of a Greek com-
pound may be obtained by the application of restriction filters to the set of
all possible combinations of the inflected forms of the particular constituents.
For instance in (67), N33403, DET and N125 are inflection codes for the
three constituents. The third component always remains in genitive singular
(see the restriction filter), while the two others may inflect freely.
(67) (‘a key to paradise’)
1: [code: N33403]  2: [code: DET]  3: [code: N125]
⇒ [class: noun, restriction: ⟨3:genit, sing⟩]
figure 1 describes the regular French compounds inflecting like cousin ger-
main (cf. example (54)). Morphological categories Gen and Nb, as well as
their corresponding morphological values ({sing, pl}, {masc, fem}, etc.), are
language-dependent. The first constituent ($1), here cousin, inflects in gender
(Gen) and number (Nb). The unification variables assigned to each of these
categories ($g and $n) may take any value of the respective category domains
({masc,fem} and {sing,pl}, respectively). The second constituent ($2), here
the blank space, remains unchanged (no operator present in this box). The
third constituent, here germain, inflects similarly to the first one. The unifi-
cation variables are common in the first and the third box, which means that
and the total exploration of the graph NC_NXAmf results in a list of DELACF
entries similar to (55).
Note that if the unification mechanism were not available, each of the four
inflected forms of lemmas like (70) would have to be described by a separate
path in the graph in fig. 1 (the first path imposing the singular masculine form
for the first and the third constituent, the second one imposing the masculine
plural, etc.). In Slavic languages such a method would rapidly turn into a
nightmare, with paradigms containing several dozen forms, each of which
would need a separate path in the corresponding graph.
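To make the role of unification variables concrete, the following sketch generates the forms of cousin germain by letting the first and the third constituent share the same values of $g and $n. It is a simplified illustration and not the Multiflex implementation: the table `FORMS`, the dictionary `DOMAINS` and the function `generate` are hypothetical names introduced here.

```python
from itertools import product

# Inflected forms of the simple constituents, indexed by (gender, number).
FORMS = {
    "cousin": {
        ("masc", "sing"): "cousin", ("masc", "pl"): "cousins",
        ("fem", "sing"): "cousine", ("fem", "pl"): "cousines",
    },
    "germain": {
        ("masc", "sing"): "germain", ("masc", "pl"): "germains",
        ("fem", "sing"): "germaine", ("fem", "pl"): "germaines",
    },
}

# Language-dependent category domains.
DOMAINS = {"Gen": ("masc", "fem"), "Nb": ("sing", "pl")}


def generate(first, separator, third):
    """Enumerate all bindings of ($g, $n); both inflected boxes share them."""
    results = []
    for g, n in product(DOMAINS["Gen"], DOMAINS["Nb"]):
        form = FORMS[first][(g, n)] + separator + FORMS[third][(g, n)]
        results.append((form, {"Gen": g, "Nb": n}))
    return results


for form, features in generate("cousin", " ", "germain"):
    print(form, features)
```

A single rule thus covers the whole paradigm. Without shared variables, the sixteen free combinations of the two constituents' forms would have to be filtered or listed path by path.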
A value inheritance operator makes it possible to assign the same inflection
graph to lemmas inflecting similarly but having different inflection values.
For instance the graph in figure 2 applies to both entries below:
despite their different gender. Note that figure 2 differs from figure 1 only
by the double assignment (‘==’) of variable $g to Gen in the first box. This
operator means that variable $g may take only one gender value: the value that the
FIGURE 2 Multiflex inflection graph NC_NXA for mémoire vive and cordon bleu
The gender of the compound is inherited from its first constituent ($1:Gen==$g).
The fifth constituent may be either left intact (upper path) or it may agree in
number with the first constituent ($5:Nb=$n). The lack of the category-value
equation for the gender of the fifth constituent means that its gender never
changes, and does not influence the gender of any other components.
[Inflection graph: <$1:Gen==$g;Nb=$n> <$2> <$3> <$4>, followed by either <$5>
(upper path) or <$5:Nb=$n> (lower path); output features <Gen=$g;Nb=$n>]
Since individual components of the lemma are referred to via ordinal vari-
ables $1, $2, etc., deletions, insertions, duplications, and order changes of
components may be expressed, as in variants (37) through (40), and (45)
through (47). For instance, entry (73) and figure 4 describe example (45).
The first constituent may be either unchanged, or inflected into plural and
[Inflection graph fragment: <$1:Nb=p>; output features <Nb=$n>]
boundaries can be freely defined, and are not limited to blanks and punctua-
tion marks, as in examples (3)-(14).
In Krstev et al. (2006a) the system has been tested for about 1,100 Serbian
compound nouns, and in Savary et al. (2007) it is used to describe samples of
inflectionally non-compositional and irregular compounds in French, Polish
and Serbian. In Krstev et al. (2006b) it has been integrated into a platform al-
lowing efficient creation, interconnection and maintenance of heterogeneous
linguistic resources, such as simple and compound word lexicons, wordnets,
etc. Thus, the annotation of simple constituents within compounds can be
automated via real-time access to the underlying DELAF dictionaries.
3.2 Cascaded finite-state approaches
The two-level morphology implemented in the finite-state lexicon compiler,
lexc, accompanied by the regular-expression compiler, xfst, by Beesley and
Karttunen (2003), has provided a framework for several approaches to multi-
word processing.
Lexc
Karttunen et al. (1992) and Karttunen (1993) contain a case study of French
compositional and non-compositional compounds. Their morphological de-
scription is considered as a typical application for composition of two-level
rules.
Firstly, simple words are listed in regular-grammar lexicons, such as the one
in example (75). Here a sample noun lexicon contains two words, démocrate
and social, together with their continuation classes, Nmf and Adj. The contin-
uation classes themselves are related to sets of sequences of terminal symbols
(+N, +Adj, +Sg, etc.) and other continuation classes (Gender, Number, Masc,
Fem).
(75) Multichar_Symbols +N +Adj +Masc +Fem +Sg +Pl
LEXICON Root Nouns ;
LEXICON Nouns démocrate Nmf ; social Adj ;
LEXICON Nm +N Masc ;
LEXICON Nmf +N Gender ;
LEXICON Adj +Adj Gender ;
LEXICON Gender Masc ; Fem ;
LEXICON Masc +Masc Number ;
LEXICON Fem +Fem Number ;
LEXICON Number +Sg # ; +Pl # ;
Further, the lexicon may be composed with a set of lexical alternation rules
such as: ‘an l is replaced by a u if it appears after an a and before
category and gender labels, except +Adj+Fem’. This rule makes it possible,
for example, to transform the lexical entry social+Adj+Masc+Pl into an
intermediate form sociau+Adj+Masc+Pl. Another possible rule is: ‘the +Pl label
is replaced by an x if it appears after an a or an e, followed by a u,
followed by category and gender labels, except +Adj+Fem’. This second rule
applied to sociau+Adj+Masc+Pl yields sociau+Adj+Mascx. Finally, two other rules
‘delete +Adj’ and ‘delete +Masc’ yield the surface form sociaux.
All such alternation rules may be composed into one transducer which maps
any lexical form to the corresponding surface form as in example (77).
(77) démocrate+N+Fem+Sg démocrate
démocrate+N+Fem+Pl démocrates
démocrate+N+Masc+Sg démocrate
démocrate+N+Masc+Pl démocrates
social+Adj+Fem+Sg sociale
social+Adj+Fem+Pl sociales
social+Adj+Masc+Sg social
social+Adj+Masc+Pl sociaux
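The cascade of alternation rules quoted above can be imitated with ordinary string rewriting. The sketch below is an approximation in Python regular expressions, not lexc or two-level rules: the rules are applied one after another rather than composed into a single transducer. Two points are assumptions of this sketch: the first rule is additionally restricted to plural forms (otherwise the singular social would also be rewritten), and the default rules in `realize` (feminine adjectives take e, +Pl becomes s, remaining labels are deleted) are added only so that the example is complete.

```python
import re

# Category and gender labels, e.g. "+Adj+Masc" or "+N+Fem".
LABELS = r"\+(?:N|Adj)\+(?:Masc|Fem)"


def l_to_u(form):
    """'l' -> 'u' after 'a', before category and gender labels, except +Adj+Fem.

    The "+Pl" condition is an assumption of this sketch (see the lead-in).
    """
    return re.sub(r"(?<=a)l(?=" + LABELS + r"\+Pl)(?!\+Adj\+Fem)", "u", form)


def pl_to_x(form):
    """'+Pl' -> 'x' after 'au' or 'eu' plus labels, except +Adj+Fem."""
    def repl(match):
        if match.group(2) == "+Adj+Fem":
            return match.group(0)
        return match.group(1) + match.group(2) + "x"
    return re.sub(r"([ae]u)(" + LABELS + r")\+Pl", repl, form)


def realize(form):
    """Assumed default rules: '+Adj+Fem' -> 'e', '+Pl' -> 's', delete the rest."""
    form = form.replace("+Adj+Fem", "e")
    for label in ("+Adj", "+N", "+Masc", "+Fem"):
        form = form.replace(label, "")
    return form.replace("+Pl", "s").replace("+Sg", "")


def surface(lexical):
    """Apply the rules in sequence, from lexical form to surface form."""
    return realize(pl_to_x(l_to_u(lexical)))


print(pl_to_x(l_to_u("social+Adj+Masc+Pl")))  # intermediate form
print(surface("social+Adj+Masc+Pl"))          # surface form
```

Unlike a composed lexical transducer, this function only generates; running it in the analysis direction would require inverting each rule by hand.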
After having added compounds to the lexicon we also enlarge the alterna-
tion system by adding new rules allowing feature insertions or propagations
from the whole compound to the individual constituents if the ˆan marker occurs.
Examples of such rules are: (i) ‘insert +Adj at the end of the first constituent
(after this operation the ˆan marker disappears)’, (ii) ‘recopy the gender of the
whole sequence at the end of the first constituent’, (iii) ‘recopy the number
of the whole sequence at the end of the first constituent’. By applying such
rules to the second form in (80) we obtain the third form, which is further
transformed into the fourth (surface) form. Similarly, the three other possible
derivations within lexicon (78), composed with alternation rules, complete
the inflectional paradigm of the compound by describing the surface forms
social-démocrate, sociale-démocrate, and sociales-démocrates.
(80) social-démocrate+N+Masc+Pl
↓
social-démocrateˆan+N+Masc+Pl
↓
social+Adj+Masc+Pl-démocrate+N+Masc+Pl
↓
sociaux-démocrates
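The step from the second to the third form in (80), i.e. rules (i) through (iii), amounts to copying the features of the whole compound onto its first constituent. A minimal sketch of that single step is given below. It assumes that the marker is written with an ASCII caret (`^an`) and that compounds have exactly two hyphen-separated constituents; the names `MARKED` and `propagate` are hypothetical.

```python
import re

# first-second^an+N+<gender>+<number>
MARKED = re.compile(
    r"^(?P<first>[^\-+]+)-(?P<second>[^\-+^]+)\^an\+N"
    r"(?P<gender>\+(?:Masc|Fem))(?P<number>\+(?:Sg|Pl))$"
)


def propagate(form):
    """Insert +Adj after the first constituent and copy gender and number to it.

    The ^an marker is consumed by the operation. Forms without the marker
    are returned unchanged.
    """
    match = MARKED.match(form)
    if match is None:
        return form
    gender, number = match["gender"], match["number"]
    return f"{match['first']}+Adj{gender}{number}-{match['second']}+N{gender}{number}"


print(propagate("social-démocrate^an+N+Masc+Pl"))
```

The resulting string can then be handed to the alternation rules for simple words, which treat each constituent separately.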
The underlying formalism, a lexical transducer, is a mathematically well de-
fined and elegant tool, which allows the whole cascade of rules to be per-
formed in one processing step only. Moreover, the bi-directionality of the
transducer allows it to perform both the morphological analysis and genera-
tion, i.e. assigning the last (surface) form to the first (lexical) form in (80),
and conversely.
Exocentric compounds and inflectional irregularities, as in examples (18)-
(26), may be expressed by attributing appropriate continuation classes to
compounds in the lexicon, and by designing adequate alternation rules as-
signed to these classes. For instance, example (23) can be added to the above
lexicon via the entries in (81) and two alternation rules: (i) ‘insert +Adj+Masc
at the end of the first constituent if the ˆamnff marker appears; after this oper-
ation the marker disappears’, (ii) ‘recopy the number of the whole sequence
at the end of the first constituent if the ˆamnff marker appears’. The two pos-
sible lexicon derivations in this lexicon are (82) and (83). The former relates
the first and the second form in (84), the application of the alternation rules
above yields the third form in the same example, and the application of al-
ternation rules for simple words results in the fourth form. Derivation (83)
introduces no compound-oriented morphological annotation (such as ˆan or
ˆamnff ), thus only the alternation rules for the final constituent apply, yielding
the second form in (85).
(81) Multichar_Symbols ˆamnff
LEXICON Nouns grand-mère AmN2f ;
LEXICON AmN2f 0:ˆamnff Nf ; Nf ;
LEXICON Nf +N Fem ;
compound inflection; however, few examples are given on this subject in the
reference bibliography.
IDAREX
IDAREX (Breidt et al., 1996) uses an additional regular expression layer over
lexc for the description of German multi-word expressions (MWEs), in
particular verbal ones, and their variation. Inflected forms are represented by
regular expressions which may refer either to base forms or to surface forms
of simple components. Morphological features of each component may be
restricted. Optional components and insertions may be indicated. Syntactic
transformations may be listed within one paradigm. For instance, in the fol-
lowing expression:
(87) [ :den (:schönen) :Schein (:zu) wahren |
wahren Vfin: (ADV* NPnom) ADV* :den (:schönen) :Schein ]
the first line accounts for the infinitive expression den (schönen) Schein (zu)
wahren (‘keep up appearances’), in which the verb wahren may take any
inflected form, while all other components are limited to their literal forms
appearing after the ‘:’ character. The second line describes variants of the
same expression in which the verb comes first and is limited to any of its
finite forms (Vfin:), and adverbs and personal pronouns may be inserted be-
tween the verb and the rest of the components, as in dabei wahrt er immer
den Schein.
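The two alternatives of expression (87) can be approximated with an ordinary regular expression over surface strings. The sketch below is not IDAREX: where IDAREX refers to the morphological analysis of each word, this example uses small hand-written word lists (`V_FIN`, `V_ANY`, `ADV`, `NP`), all of which are assumptions made for illustration.

```python
import re


def alt(words):
    """Build a non-capturing alternation, longest alternatives first."""
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


# Toy stand-ins for morphological classes (assumptions of this sketch).
FINITE = ["wahre", "wahrst", "wahrt", "wahren",
          "wahrte", "wahrtest", "wahrten", "wahrtet"]
V_FIN = alt(FINITE)                      # finite forms of wahren
V_ANY = alt(FINITE + ["gewahrt"])        # any inflected form of wahren
ADV = alt(["dabei", "immer", "stets"])   # adverbs
NP = alt(["ich", "du", "er", "sie", "es", "wir", "ihr"])  # nominative pronouns

# First line of (87): den (schönen) Schein (zu) wahren
LINE1 = rf"\bden(?: schönen)? Schein(?: zu)? {V_ANY}\b"
# Second line of (87): verb first, optional insertions, then the object
LINE2 = rf"\b{V_FIN}(?:(?: {ADV})* {NP})?(?: {ADV})* den(?: schönen)? Schein\b"

IDIOM = re.compile(f"(?:{LINE1})|(?:{LINE2})")


def contains_idiom(sentence):
    """Return True if either variant of the idiom occurs in the sentence."""
    return IDIOM.search(sentence) is not None


print(contains_idiom("dabei wahrt er immer den Schein"))
print(contains_idiom("den schönen Schein zu wahren"))
```

As with the original expression, such a pattern only recognizes occurrences; it does not return morphological features for the matched sequence.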
Numerical variables $1, $2, etc. may be assigned to the components of the
base form, which makes it possible to express omissions, duplications and
order changes of components generically. For instance the following macro can
be used instead of numerous complex rules such as (87):
(88) [ $2 Vfin: (ADV* NPron) ADV* $1
| $1 (:zu) $2 V: ]
It expresses the fact that many Verb Object idioms in German, such as den
(schönen) Schein wahren or die Ohren spitzen (‘prick up one’s ears’), may
appear either in a finite or an infinitive form, with optional adverbial and
pronominal insertion. Variables $1 and $2 get instantiated to the object (e.g.
den (schönen) Schein or die Ohren) and to the verb (e.g. wahren or spitzen) of
the MWE in question. That instantiation yields a rule similar to (87) for any
adequate idiom it is applied to.
Non-compositional and irregular nominal compounds and most of their vari-
ants are describable by this formalism. For instance, examples (22) and (46)
can be represented either by the specific rules (89) and (90) or by the generic
ones (91) and (92).
(89) [ :attorney general N: | attorney N: :general ]
The main problem with this formalism seems to be that no inflectional
features may be assigned to the inflected forms of MWEs. Thus, one may identify
sequences in a corpus but not perform their morphological analysis or
generation.
Another important drawback is that, even if a unification mechanism has been
envisaged, it has not been implemented to our knowledge. Thus, each time
a feature agreement takes place in a compound, all the inflected forms have to
be enumerated explicitly, which is particularly inefficient for large inflection
paradigms, e.g. in Slavic languages. For instance example (39) would need a
rule containing several dozens of alternatives if the description were supposed
not to admit ungrammatical forms.
Multi-word processor of Turkish
Oflazer et al. (2004) describe a multi-word processor for Turkish, which is
a highly-inflective and concatenative language. That tool takes the so-called
lexicalized (invariable), semi-lexicalized (morphologically variable) and non-
lexicalized (duplication- and contrasting-based) collocations and named enti-
ties into account. All those units are contiguous sequences of tokens. Inflec-
tional issues are addressed in all those types of MWUs except the first one.
The MWU processor first runs a text tokenizer, a morphological analyzer and
a guesser, all three based on the Xerox finite-state lexicon compiler (Beesley
and Karttunen, 2003). Then the MWUs are recognized by a three-stage cas-
cade of Perl rules: first the lexicalized collocations are identified, then the
non-lexicalized ones, and finally the semi-lexicalized ones. The rules
transform sequences of simple words with their morphological interpretations
into compounds with their own morphological features. For instance the
sequence in example (94), corresponding to the surface string (93), is
transformed into the compound interpretation in example (95).
(93) uyur uyumaz (literally, ‘(he) sleeps (he) does not sleep’)
(94) uyu+Verb+Pos+Aor+A3sg uyu+Verb+Neg+Aor+A3sg
(95) uyu+Verb+Pos+ˆDB+Adverb+AsSoonAs (‘as soon as he sleeps’)
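The transformation from (94) to (95) can be illustrated by a single rewriting rule over the string of morphological analyses. The original system uses Perl rules; the sketch below is a Python approximation of one such rule, with the hypothetical names `DUPLICATION` and `combine`, and with the derivation boundary written with an ASCII caret (`^DB`).

```python
import re

# Same root, first in the positive aorist and then in the negative aorist.
DUPLICATION = re.compile(
    r"^(?P<root>[^+\s]+)\+Verb\+Pos\+Aor\+A3sg "
    r"(?P=root)\+Verb\+Neg\+Aor\+A3sg$"
)


def combine(analyses):
    """Rewrite a matching pair of analyses into one compound interpretation.

    Sequences that do not match the pattern are returned unchanged.
    """
    match = DUPLICATION.match(analyses)
    if match is None:
        return analyses
    return f"{match['root']}+Verb+Pos+^DB+Adverb+AsSoonAs"


print(combine("uyu+Verb+Pos+Aor+A3sg uyu+Verb+Neg+Aor+A3sg"))
```

The backreference `(?P=root)` enforces that both tokens share the same root, which is what distinguishes this duplication-based collocation from an accidental sequence of two verbs.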
TABLE 1 HABIL table describing the components of the MWE begi bistan egon
TABLE 2 HABIL table describing the surface realizations of the Basque MWE begi bistan egon and of the French MWU toile d’araignée
nent (begi) is inflected into absolutive non-definite form, while its second one
(bistan) remains uninflected, and its third one (egon) may be inflected to any
of its existing forms. These constraints may correspond for instance to the
following corpus occurrence: ez dago horren begi bistan (‘it is so evident’).
This approach allows for an exhaustive description of inflected forms of a
MWE, together with some of its variants resulting from omissions, dupli-
cations, and order changes of constituents. Thus, examples like (23) through
(29), (38) through (40), and (47) through (49) seem describable in this model.
For instance, the lower part of table 2 shows a possible description of exam-
ple (24), in which two rules are necessary to express the plural variant, but
the corresponding lemma is unique (unlike examples (56)-(58)).
The formalism also allows for non-abstract base forms (cf. examples (48)
and (49)), as well as for insertions; however, the inserted elements may not be
specified, which is needed in examples like (37), (45) and (46). It is unclear
how the inflection features are determined for exocentric compounds (exam-
ples (18) through (22)). One drawback, for languages with large inflectional
paradigms (see section 2.5), is that if agreement constraints occur within a
MWE then each of its inflected forms needs a separate entry in the database
(e.g. several dozens of entries for most compound nouns in Slavic languages).
An additional unification mechanism could solve this inconvenience. Another
disadvantage results from the fact that separators are not considered as com-
ponents, thus it seems impossible to account for their insertions, deletions or
replacements (cf. examples (8) and (45)).
The MWEs are divided into several classes with respect to their semantic
compositionality and their syntactic variability.
Fixed expressions (e.g. by and large, every which way, ad hoc) are seen as
‘words with spaces’ as they defy conventions of grammar and admit no mor-
phological or lexical variability. For instance, example (96) describes ad hoc
as a simple concatenation of two tokens, functioning as an intransitive adjec-
tive (intr_adj_l) and allowing no syntactic variation.
attested in corpora.
(99)
arity: 2
0: [ lexicalization: [1] [ lemma: ‘arter’, cat: ‘N’, inflection: 2, agreement: [2] ]
     cat: ‘N’
     agreement: [2] ]
1: [ lemma: ‘umbilical’, cat: ‘A’, inflection: 1 ]
2: [1]
Terminological variants are expressed in FASTR via transformations repre-
sented by metarules (a concept introduced in a number of unification-based
formalisms in order to reduce the grammar size). For instance, metarule (100),
when unified with rule (98), produces the new rule (101) which matches co-
ordination variants such as umbilical or carotid artery.
(100) Metarule Coord(N1 → A2 N3) ≡ N1 → A2 C4 A5 N3 :.
(101) Rule N1 → A2 C4 A5 N3 :
⟨N1 lexicalization⟩ ≐ N3
⟨A2 lemma⟩ ≐ ‘umbilical’; ⟨A2 inflection⟩ ≐ 1
⟨N3 lemma⟩ ≐ ‘arter’; ⟨N3 inflection⟩ ≐ 2
⟨N1 agreement⟩ ≐ ⟨N3 agreement⟩.
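The effect of unifying a metarule with a term rule can be sketched in a few
lines of Python. This is a simplified illustration under stated assumptions,
not FASTR's actual machinery: the `Rule` class, the `coord` and `matches`
functions, and the representation of tokens as (lemma, category) pairs are all
hypothetical, and agreement and inflection constraints are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    rhs: list                                    # categories, e.g. ["A", "N"]
    lemmas: dict = field(default_factory=dict)   # position -> required lemma


def coord(rule):
    """Coord metarule: A N -> A C A N. The inserted conjunction and
    adjective carry no lexical constraint."""
    if rule.rhs != ["A", "N"]:
        return None
    new_lemmas = {}
    if 0 in rule.lemmas:
        new_lemmas[0] = rule.lemmas[0]   # first adjective keeps its lemma
    if 1 in rule.lemmas:
        new_lemmas[3] = rule.lemmas[1]   # head noun moves to position 3
    return Rule(rhs=["A", "C", "A", "N"], lemmas=new_lemmas)


def matches(rule, tokens):
    """tokens: list of (lemma, category) pairs, assumed already lemmatized."""
    if len(tokens) != len(rule.rhs):
        return False
    for pos, (lemma, cat) in enumerate(tokens):
        if cat != rule.rhs[pos]:
            return False
        if pos in rule.lemmas and rule.lemmas[pos] != lemma:
            return False
    return True


base = Rule(rhs=["A", "N"], lemmas={0: "umbilical", 1: "arter"})
variant = coord(base)
text = [("umbilical", "A"), ("or", "C"), ("carotid", "A"), ("arter", "N")]
```

Here `matches(variant, text)` holds while `matches(base, text)` does not, which
mirrors how the derived rule, rather than the base rule, recognizes the
coordination variant.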
Metarules also make it possible to express derivational variants, provided
that the derivational morphology of the simple components is described. For
instance, the sequence tension artérielle in example (41) may be expressed by
the compound term rule (104), while the variant tension des artères is
obtained by unifying metarule (105) with rule (104) and with the word
descriptions (102) and (103) (cf. footnote 10). The key element here is the
first constraint of metarule (105), which requires that the second noun of the
variant (here: artères) has the same root as the adjective of the base term
(here: artérielle).
(102) Word ‘artère’:
⟨cat⟩ ≐ ‘N’; ⟨secondary root⟩ ≐ ‘artér’; ⟨inflection⟩ ≐ 21.
(103) Word ‘artériel’:
⟨cat⟩ ≐ ‘A’; ⟨inflection⟩ ≐ 2; ⟨root cat⟩ ≐ ‘N’;
⟨root lemma⟩ ≐ ‘artère’; ⟨history⟩ ≐ ‘?ielle’.
(104) Rule N1 → N2 A3 :
⟨N1 lexicalization⟩ ≐ N2
⟨N2 lemma⟩ ≐ ‘tension’; ⟨N2 inflection⟩ ≐ 1
⟨A3 lemma⟩ ≐ ‘artériel’; ⟨A3 inflection⟩ ≐ 2
⟨N1 agreement⟩ ≐ ⟨N2 agreement⟩ ≐ ⟨A3 agreement⟩.
10 The ‘?’ sign in the derivation suffix ‘?ielle’ refers to the secondary lemma artér of the word
(105) Metarule AdjToNoun(N1 → N2 A3) ≡ N1 → N2 D4 N5 :
⟨A3 root⟩ ≐ ⟨N5 root⟩
⟨D4 lemma⟩ ≐ ‘de’; ⟨D4 inflection⟩ ≐ 1
⟨D4 agreement number⟩ ≐ ⟨N5 agreement number⟩
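The root-sharing constraint of metarule (105) can be illustrated with a small
Python sketch. It is a simplification under explicit assumptions: the `LEXICON`
dictionary, its `root` field and the function name are hypothetical stand-ins
for the word descriptions (102) and (103), the input is assumed to be
lemmatized (so that des is reduced to the lemma de), and number agreement
between the determiner and the noun is not checked.

```python
# Toy lexicon: each lemma records its category and the lemma it derives from.
LEXICON = {
    "artère":   {"cat": "N", "root": "artère"},
    "artériel": {"cat": "A", "root": "artère"},   # adjective derived from 'artère'
    "tension":  {"cat": "N", "root": "tension"},
    "de":       {"cat": "D", "root": "de"},
}


def is_adj_to_noun_variant(base_term, candidate):
    """base_term: (head noun lemma, adjective lemma) of an N A term.
    candidate: list of three lemmas, expected to follow the N D N pattern."""
    head, adj = base_term
    if len(candidate) != 3:
        return False
    n2, d4, n5 = candidate
    adj_entry = LEXICON.get(adj)
    n5_entry = LEXICON.get(n5)
    if adj_entry is None or n5_entry is None:
        return False
    return (n2 == head                              # same head noun
            and d4 == "de"                          # lexicalized determiner
            and n5_entry["cat"] == "N"
            and adj_entry["root"] == n5_entry["root"])  # shared root


ok = is_adj_to_noun_variant(("tension", "artériel"), ["tension", "de", "artère"])
```

The last condition is the one that does the work: any N de N sequence would
satisfy the pattern, but only a noun sharing its root with the adjective of the
base term is accepted as a variant.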
Since in exocentric compounds, such as (18) through (22), the feature propa-
gation cannot be performed, the morphology of the resulting compound must
be indicated explicitly, as in rule (106).
(106) Rule N1 → V2 N3 :
⟨V2 lemma⟩ ≐ ‘perce’; ⟨V2 inflection⟩ ≐ 1;
⟨V2 agreement tense⟩ ≐ ‘present’; ⟨V2 agreement person⟩ ≐ 3
⟨V2 agreement mood⟩ ≐ ‘indicative’
⟨N3 lemma⟩ ≐ ‘neige’; ⟨N3 inflection⟩ ≐ 1;
⟨N3 agreement number⟩ ≐ ‘singular’
⟨N1 agreement gender⟩ ≐ ‘masculine’
⟨N1 agreement number⟩ ≐ ‘singular’ | ‘plural’.
Compounds admitting variants of inflected forms, as in examples (21) through
(24), may also be described via metarules, which however need to be
lexicalized in order to avoid spurious variants for regular constructions. For
instance, rule (107), matching attorney general and attorney generals, when
unified with metarule (108), matches the plural variant attorneys general. The
5th constraint in rule (107) prevents attorneys general from being interpreted
as a singular form.
(107) Rule N1 → N2 N3 :
⟨N1 lexicalization⟩ ≐ N2
⟨N2 lemma⟩ ≐ ‘attorney’; ⟨N3 lemma⟩ ≐ ‘general’
⟨N2 inflection⟩ ≐ ⟨N3 inflection⟩ ≐ 1
⟨N2 agreement number⟩ ≐ ‘singular’;
⟨N1 agreement⟩ ≐ ⟨N3 agreement⟩.
(108) Metarule DoublePlural(N1 → N2 N3) ≡ N1 → N4 N5 :
⟨N2 lemma⟩ ≐ ⟨N4 lemma⟩ ≐ ‘attorney’
⟨N3 lemma⟩ ≐ ⟨N5 lemma⟩ ≐ ‘general’
⟨N3 agreement number⟩ ≐ ⟨N4 agreement number⟩ ≐ ‘plural’
⟨N5 agreement number⟩ ≐ ‘singular’.
Variation schemes which appear systematically up to some exceptions may
be expressed by general (i.e. non-lexicalized) metarules accompanied by
negative metarules. For instance, the sequence bezwzględna większość in
example (39) may be represented by rule (109), while its variant większość
bezwzględna results from unifying this rule with metarule (110). However, a
number of Adjective Noun compounds in Polish, such as dobre imię (‘a good
reputation’), do not admit inversion of their constituents. Such exceptions
may be described by negative metarules. For instance, the negative metarule
(111), when unified with a rule describing dobre imię, analogous to rule (109),
matches the invalid variant *imię dobre. During corpus processing FASTR
rejects all sequences that have been matched by negative metarules, thus *imię
dobre will not be admitted as a variant of dobre imię.
(109) Rule N1 → A2 N3 :
⟨N1 lexicalization⟩ ≐ N3
⟨A2 lemma⟩ ≐ ‘bezwzględny’; ⟨A2 inflection⟩ ≐ 1
⟨N3 lemma⟩ ≐ ‘większość’; ⟨N3 inflection⟩ ≐ 45
⟨N1 agreement⟩ ≐ ⟨A2 agreement⟩ ≐ ⟨N3 agreement⟩.
(110) Metarule InvPlural(N1 → A2 N3) ≡ N1 → N3 A2 :.
(111) Metarule NInvPlural(N1 → A2 N3) ≡ N1 → N3 A2 :
⟨A2 lemma⟩ ≐ ‘dobre’; ⟨N3 lemma⟩ ≐ ‘imię’.
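The interplay between a general metarule and a lexicalized negative metarule
can be sketched as a two-stage filter. The Python fragment below is an
illustrative assumption, not FASTR code: terms are reduced to (adjective lemma,
noun lemma) pairs, and the names `positive_matches`, `negative_matches` and
`admitted` are hypothetical.

```python
def inversion(term):
    """General inversion metarule: A N -> N A."""
    adj, noun = term
    return (noun, adj)


# Lexicalized exceptions, in the spirit of negative metarule (111).
NO_INVERSION = {("dobre", "imię")}


def positive_matches(term):
    """Sequences licensed by the base rule and the general metarule."""
    return {term, inversion(term)}


def negative_matches(term):
    """Sequences matched by a negative metarule for this term."""
    if term in NO_INVERSION:
        return {inversion(term)}
    return set()


def admitted(term):
    # Everything matched by a negative metarule is rejected afterwards.
    return positive_matches(term) - negative_matches(term)


regular = admitted(("bezwzględny", "większość"))
exception = admitted(("dobre", "imię"))
```

With these definitions `regular` contains both word orders, whereas `exception`
contains only the base order, so the general metarule can stay non-lexicalized
and only the exceptions have to be listed.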
4 Comparative study
Tables 3 and 4 present a comparative summary of the approaches presented
in section 3. The features appearing in the first column correspond to the
linguistic properties of MWUs discussed in section 2. A ‘X’ character means
that the corresponding approach accounts for the given property, a ‘×’
character that it does not account for the property, and a ‘?’ character that
it is unclear whether it does. In particular, we suppose that:
. Separators are allowed to have a status of MWU’s constituents, if examples
like (3) through (5) can be described, and if the sequences in example (8)
11 The same criticism was leveled against the cascaded finite-state morphology models (cf.
section 3.2), however in FASTR the degree of the dependency between rules is much lower.
accepted.
. Insertions and omissions are accounted for, if variants containing extra
elements, as en in example (37), can be attached to a lemma which does not
contain this element, and if variants missing some constituent can be
attached to the lemma in which this constituent appears, as in (38).
. Order change is taken into account, if variants like (39) can be attached to
the same lemma.
. Forms resulting from component duplication should be attached to a lemma
where this component is not duplicated, as in example (40).
. Derivational and semantic variants should be related to their base forms
containing no derived form and no semantic replacement, as in (41) and
(42).
. Abbreviations should be attached to their full forms, as in (43) and (44).
. Unification is necessary for a compact representation of huge inflection
paradigms of MWUs, especially those in which agreement rules apply
within constituents (cf. example (15) and section 2.5).
. The lemma of a MWU is non-abstract, if it is a linguistically correct form
(cf. examples (48) and (49)).
. Non-contiguous MWUs are treated, if extra elements, not belonging to an
inflected form, are admitted within this form in a corpus, as in example
(50).
. The morphological description of MWUs is non-redundant if there is a
unique representation of the inflectional behavior of simple words appear-
ing in MWUs (cf. section 3.1 for a counterexample).
. Inflectional analysis and generation are two computational applications for
which a MWU description module should be accessible.
. Automated MWU lexicon creation is a facility of a computational platform
that avoids as much manual lexicographic work as possible. It may rely, for
instance, on exploiting existing resources for simple words in order to
annotate the components of MWUs.
section 3.4).
. The formal tool is a theoretical framework used either for the description of
MWUs, or for their internal representation and treatment.
. The number of MWUs described refers to the MWUs’ base forms, and not
their inflected forms.
. The language indicated is the one concerned by the experiments described
in the bibliography.
The data presented in tables 3 and 4 confirm the importance of compositional
phenomena in natural languages. Different NLP schools have been studying
these phenomena to a varying extent, and those presented here propose lexi-
calized approaches, i.e. multi-word units are explicitly listed and their linguis-
tic behavior is described either by explicit shared paradigms (e.g. inflectional
codes in the DELA school), or by lexicalized grammars in which separate
rules may interfere (e.g. alternation rules in lexc, or rules and metarules in
FASTR). One interesting type of MWUs, duplications (cf. section 3.2), has
been treated by non-lexicalized patterns.
The results presented are quantitatively very different. Some approaches rely
only on samples of fewer than several hundred entries, others judge one or
two thousand entries to be sufficiently representative, while the remaining
ones have achieved a large-scale description of tens of thousands of MWUs. In
particular, most features appearing in table 4 for lexc suggest that this
approach is pertinent to the morphological treatment of MWUs; however, they
need experimental confirmation on real-size MWU lexicons.
The linguistic properties discussed in section 2 are only partly addressed in
the referenced papers. The appreciation of these phenomena does not
necessarily improve as the number of described entries grows.
Some particularly discriminating features are:
. Separators, whose role in MWUs is underestimated by the majority of the
approaches.
. Some idiosyncratic aspects of the inflection of MWUs (exocentricity and
agreement irregularities), which are not addressed by some approaches,
although they belong to the fundamental properties of these units.
. Defective paradigms whose importance has been identified by virtually all
approaches.
. Derivational and semantic variants of MWUs, which are explicitly treated
only by FASTR (we suppose though lexc’s and LinGO’s pertinence for
1 | Separators as constituents | × | × | × | X | X
2 | Squeezed MWUs | × | × | × | × | X
3 | Exocentric MWUs | X | X | × | X | X
4 | Irregular agreement | × | X | × | X | X
5 | Defective paradigms | X | X | X | X | X
6 | Insertions and omissions | × | × | × | X | X
7 | Order change | × | × | × | × | X
8 | Duplications | × | × | × | × | X
9 | Derivational variants | × | × | × | × | ×
10 | Semantic variants | × | × | × | × | ×
11 | Abbreviations | × | × | × | X | ×
12 | Unification | × | X | × | × | X
13 | Non-abstract lemmas | × | X | × | X | X
14 | Non-contiguous MWUs | × | × | × | × | ×
15 | Non-redundancy | X | X | X | × | X
16 | Infl. analysis | X | X | X | X | X
17 | Infl. generation | X | X | X | X | X
18 | Automated MWU lexicon creation | × | × | × | × | X
19 | Sense computation | × | × | × | × | ×
20 | Formal tool | text filters, FSTs | sublists, FSTs | restriction filters, FSTs | cut-and-paste rules, FSTs | graphs, FSTs
21 | Number of MWUs described | 126,000 | 60,000 | 27,000 | ? | 2,822
22 | Language | French | English | Greek | English | Serbian
TABLE 3 Comparative features of tools for MWU inflectional description. DELA dictionaries.
  | | lexc (1992) | IDAREX (1996) | Oflazer et al. (2004) | HABIL (2004) | LinGO (2004) | FASTR (2001)
1 | Separators as constituents | X | X | ? | × | × | X
2 | Squeezed MWUs | X | ? | ? | × | × | ×
3 | Exocentric MWUs | X | X | ? | ? | ? | X
4 | Irregular agreement | X | X | ? | X | ? | X
5 | Defective paradigms | X | X | ? | X | ? | X
6 | Insertions and omissions | X | X | ? | X | ? | X
TABLE 4 Comparative features of tools for MWU inflectional description. Other approaches.
Acknowledgments
The author is grateful to Jean-Yves Antoine, as well as to three anonymous
reviewers, for their critical remarks on a previous version of this study.
References
Alegria, Inaki, Olatz Ansa, Xabier Artola, Nerea Ezeiza, Koldo Gojenola, and Ruben
Urizar. 2004. Representation and Treatment of Multiword Expressions in Basque.
In Second ACL Workshop on Multiword Expressions, July 2004, pages 48–55.
Anscombre, Jean-Claude. 1990. Pourquoi un moulin à vent n’est pas un ventilateur.
Langue Française 86:103–125.
Baldwin, Timothy and Aline Villavicencio. 2002. Extracting the Unextractable: A
Case Study on Verb-particles. In Sixth Conference on Computational Natural
Language Learning (CoNLL)-2002, pages 99–105.
Bauer, Laurie. 1983. English Word-Formation. Cambridge University Press.
Beesley, Kenneth R. and Lauri Karttunen. 2003. Finite State Morphology. CSLI.
Benveniste, Emile. 1974. Fondements syntaxiques de la composition nominale.
Formes nouvelles de la composition nominale, pages 145–176. Gallimard, Paris.
Breidt, Elisabeth, Frédérique Segond, and Giuseppe Valetto. 1996. Formal
Description of Multi-Word Lexemes with the Finite-State Formalism IDAREX. In
Proceedings of COLING-96, Copenhagen, pages 1036–1040.
Cadiot, Pierre. 1992. A entre deux noms : vers la composition nominale. Lexique
11:193–240.
Calzolari, Nicoletta, Charles J. Fillmore, Ralph Grishman, Nancy Ide, Alessandro
Lenci, Catherine MacLeod, and Antonio Zampolli. 2002. Towards Best Practice
%+/+
cordon(cordon.N1:ms) bleu(bleu.A32:ms),N:ms/+N
cousin(cousin.N32:ms) germain(germain.A32:ms),N:ms/+N+G
mémoire(mémoire.N21:fs) vive(vif.A38:fs),N:fs/+N
%+/-/-
%p:p/-/-
%p:p/-/p
toile(toile.N21:fs) d’araignée(araignée.N21:fs),N:fs/+N
%+/+
zimne(zimny.A-ny:Mfp) nogi(noga.N4-ga:Mfp),N+AN:Mfp/+C
%+/-
%N:N/N
%C:C/C
majster(majster.N1-er:Mos) klepka(klepka.N4-ka:Mfs),N+NN:Mos/+N+C
attorney(attorney.N1:s) general(general.N1:s),NC_NXN1
bas-relief(relief.N7:s),NC_XXN
battle(battle.N1:s) royal(royal.N1:s),NC_NXN1
birth date(date.N1:s),NC_NN_NofN
gentleman(gentleman.N8:s) farmer(farmer.N1:s),NC_NXN
man eater(eater.N1:s),NC_XXN
man(man.N8:s) servant(servant.N1:s),NC_NXN
student(student.N1:s) union(union.N1:s),NC_NXN1s
cordon(cordon.N1:ms) bleu(bleu.A32:ms),NC_NXA
cousin(cousin.N32:ms) germain(germain.A32.N:ms),NC_NXAmf
mémoire(mémoire.N21:fs) vive(vif.A38:fs),NC_NXA
toile(toile.N21:fs) d’araignée(araignée.N21:fs),NC_NDN1
majster(majster.N1-er:Mos) klepka(klepka.N4-ka:Mfs),NC_NXN1
zimne(zimny.A-ny:Mfp) nogi(noga.N4-ga:Mfp),NC_AXNninv
radio-aparat(aparat.N1:ms1q),NC_2XN6
2.2 French
cordon bleu,cordon bleu.N:ms
cordons bleus,cordon bleu.N:mp
cousin germain,cousin germain.N:ms
cousins germains,cousin germain.N:mp
cousine germaine,cousin germain.N:fs
cousines germaines,cousin germain.N:fp
mémoire vive,mémoire vive.N:fs
mémoires vives,mémoire vive.N:fp
toile de araignée,toile de araignée.N:fs
2.3 Polish
majster klepka,majster klepka.N:Mos
majstra klepki,majster klepka.N:Dos
majstrowi klepce,majster klepka.N:Cos
majstra klepkę,majster klepka.N:Bos
majstrem klepką,majster klepka.N:Ios
majstrze klepce,majster klepka.N:Los
majstrze klepko,majster klepka.N:Vos
majstrzy klepki,majster klepka.N:Mop
majstrów klepek,majster klepka.N:Dop
majstrom klepkom,majster klepka.N:Cop
majstrów klepki,majster klepka.N:Bop
majstrami klepkami,majster klepka.N:Iop
majstrach klepkach,majster klepka.N:Lop
majstrzy klepki,majster klepka.N:Vop
zimne nogi,zimne nogi.N:Mfp
zimnych nóg,zimne nogi.N:Dfp
zimnym nogom,zimne nogi.N:Cfp
zimne nogi,zimne nogi.N:Bfp
zimnymi nogami,zimne nogi.N:Ifp
zimnych nogach,zimne nogi.N:Lfp
zimne nogi,zimne nogi.N:Vfp
2.4 Serbian
radio aparat,radio-aparat.N:s1qm
radio aparata,radio-aparat.N:s2qm
radio aparatu,radio-aparat.N:s3qm
radio aparat,radio-aparat.N:s4qm
radio aparate,radio-aparat.N:s5qm
radio aparatom,radio-aparat.N:s6qm
radio aparatu,radio-aparat.N:s7qm
radio aparati,radio-aparat.N:p1qm
radio aparata,radio-aparat.N:p2qm
radio aparatima,radio-aparat.N:p3qm
radio aparate,radio-aparat.N:p4qm
radio aparati,radio-aparat.N:p5qm
radio aparatima,radio-aparat.N:p6qm
radio aparatima,radio-aparat.N:p7qm
radio aparata,radio-aparat.N:w2qm
radio aparata,radio-aparat.N:w4qm
radio-aparat,radio-aparat.N:s1qm
radio-aparata,radio-aparat.N:s2qm
radio-aparatu,radio-aparat.N:s3qm
radio-aparat,radio-aparat.N:s4qm
radio-aparate,radio-aparat.N:s5qm
radio-aparatom,radio-aparat.N:s6qm
radio-aparatu,radio-aparat.N:s7qm
radio-aparati,radio-aparat.N:p1qm
radio-aparata,radio-aparat.N:p2qm
radio-aparatima,radio-aparat.N:p3qm
radio-aparate,radio-aparat.N:p4qm
radio-aparati,radio-aparat.N:p5qm
radio-aparatima,radio-aparat.N:p6qm
radio-aparatima,radio-aparat.N:p7qm
radio-aparata,radio-aparat.N:w2qm
radio-aparata,radio-aparat.N:w4qm
radioaparat,radio-aparat.N:s1qm
radioaparata,radio-aparat.N:s2qm
radioaparatu,radio-aparat.N:s3qm
radioaparat,radio-aparat.N:s4qm
radioaparate,radio-aparat.N:s5qm
radioaparatom,radio-aparat.N:s6qm
radioaparatu,radio-aparat.N:s7qm
radioaparati,radio-aparat.N:p1qm
radioaparata,radio-aparat.N:p2qm
radioaparatima,radio-aparat.N:p3qm
radioaparate,radio-aparat.N:p4qm
radioaparati,radio-aparat.N:p5qm
radioaparatima,radio-aparat.N:p6qm
radioaparatima,radio-aparat.N:p7qm
radioaparata,radio-aparat.N:w2qm
radioaparata,radio-aparat.N:w4qm
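The inflected-form listings above follow a regular line shape: an inflected
form, a comma, the lemma, a dot, the category, a colon, and a feature code. A
minimal Python sketch for reading such lines is given below. It assumes that
the form contains no comma and that the category tag contains no dot; the
function names are hypothetical, and the feature codes are kept as opaque
strings rather than decoded.

```python
from collections import defaultdict


def parse_entry(line):
    """Split 'form,lemma.Cat:feats' into its four fields."""
    form, rest = line.split(",", 1)
    lemma, tag = rest.rsplit(".", 1)      # last dot separates lemma and tag
    cat, _, feats = tag.partition(":")
    return {"form": form, "lemma": lemma, "cat": cat, "feats": feats}


def group_by_lemma(lines):
    """Collect (form, feats) pairs under each lemma, keeping input order."""
    paradigm = defaultdict(list)
    for line in lines:
        entry = parse_entry(line.strip())
        paradigm[entry["lemma"]].append((entry["form"], entry["feats"]))
    return dict(paradigm)


sample = [
    "cousin germain,cousin germain.N:ms",
    "cousines germaines,cousin germain.N:fp",
    "radio-aparata,radio-aparat.N:s2qm",
    "radioaparata,radio-aparat.N:s2qm",
]
paradigms = group_by_lemma(sample)
```

Grouping by lemma makes it visible that graphical variants such as
radio-aparata and radioaparata are attached to one and the same lemma.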
<$1> <$2> <$3> <Nb=s>
<$1> <$2> <$3:Nb=p> <Nb=p>
<$1:Nb=p> <$2> <$3> <Nb=p>
FIGURE 5 Multiflex inflection graph NC_NXN1 for attorney general and battle royal
in English
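The way such a graph yields the inflected forms of a compound can be sketched
in Python by treating each path as a pair of requirements on the first and
third constituents plus the features of the resulting form. This is only an
illustration under assumptions: the path encoding and the names `NC_NXN1_PATHS`
and `generate` are hypothetical, an unconstrained constituent is taken to keep
its singular lemma form, and `inflect_noun` is a toy stand-in for a real
simple-word inflection module.

```python
def inflect_noun(lemma, number):
    # Toy English noun inflection (assumption): regular plural in -s only.
    return lemma if number == "s" else lemma + "s"


# Each path: (number of constituent 1, number of constituent 3, number of the MWU).
NC_NXN1_PATHS = [
    ("s", "s", "s"),   # both constituents unchanged   -> singular
    ("s", "p", "p"),   # only the last noun pluralized -> plural
    ("p", "s", "p"),   # only the first noun pluralized -> plural
]


def generate(first, separator, last, paths=NC_NXN1_PATHS):
    """Return (inflected form, number) pairs for a two-noun compound."""
    forms = []
    for nb_first, nb_last, nb_whole in paths:
        form = inflect_noun(first, nb_first) + separator + inflect_noun(last, nb_last)
        forms.append((form, nb_whole))
    return forms


forms = generate("attorney", " ", "general")
```

The separator is passed as a constituent of its own, so the same path list can
be reused for compounds written with a space or with a hyphen.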
FIGURE 6 Multiflex inflection graph NC_XXN for bas-relief and man eater in English
<$1:Nb=p> <Nb=$n>
’
<$5>
<$1:Gen==$g;Nb=$n> <$2> <$3> <$4>
<Gen=$g;Nb=$n> <$5:Nb=$n>
=
<$1> <$3:Nb=$n;Case=$c;Anim==$a;Gen==$g>
<$2> <Nb=$n;Case=$c;Anim=$a;Gen=$g>