Building A Wordnet For Arabic

This document discusses building an Arabic WordNet based on the Princeton WordNet for English. Key challenges include differences between Arabic and English like Arabic being a Semitic language written right-to-left with optional vowel diacritics that impact ambiguity. The project aims to develop a lexical resource with a formal semantic foundation linking to WordNet and SUMO ontology. Tools will include a lexicographer interface to facilitate constructing the Arabic WordNet.

Building a WordNet for Arabic

Sabri Elkateb, William Black
The University of Manchester, PO Box 88, Sackville St, Manchester, M60 1QD
[email protected], [email protected]

Piek Vossen
Irion Technologies, Delftechpark 26, 2628XH, Delft, The Netherlands
[email protected]

Horacio Rodríguez
Politechnical University of Catalonia, Jordi Girona, 1-3, 08034 Barcelona, SPAIN
[email protected]

Adam Pease
Articulate Software Inc, 278 Monroe Dr. #30, Mountain View, CA 94040
[email protected]

Musa Alkhalifa
University of Barcelona, Edifici Aribau, 5 Planta, Despatx 5.19, Gran via 585, 08007 Barcelona, SPAIN
[email protected]

Christiane Fellbaum
Princeton University, Department of Psychology, Green Hall, Princeton, NJ 08544
[email protected]

Abstract
This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic WordNet is being constructed following methods developed for EuroWordNet (Vossen, 1998). In addition to the standard wordnet representation of senses, word meanings are also being defined with a machine understandable semantics in first order logic. The basis for this semantics is the Suggested Upper Merged Ontology and its associated domain ontologies (Niles and Pease, 2001). We will greatly extend the ontology and its set of mappings to provide formal terms and definitions for each synset. Tools to be developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004).

Introduction
In recent years, a number of wordnet building efforts have been initiated and carried out within a common framework for lexical representation and are becoming increasingly important resources for a wide range of Natural Language Processing applications. “They can be used in meaning-based information retrieval (searching for concepts rather than specific word forms), in logical inference (if a document mentions dogs, a wordnet allows the inference that it is about animals), in word sense disambiguation (providing the search space of alternative meanings), etc.” (Dyvik, 2002). The success of the Princeton WordNet (PWN) for English has motivated similar projects that aim at developing wordnets for other languages. In this paper, we describe our methodology for building a wordnet for Modern Standard Arabic (MSA). This Arabic WordNet (AWN) is to be based on the design and contents of the PWN and can be linked directly to PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to and from English and dozens of other languages. The Suggested Upper Merged Ontology (SUMO) is being enlarged to provide a formal semantic foundation for AWN (Black et al. 2006). The AWN database will be freely and publicly available.

Challenges
Arabic is a Semitic language which differs from Indo-European languages syntactically, morphologically and semantically. The term ‘classical Arabic’ refers to the standard form of the language used in all writing and heard on television, radio and in public speeches and religious sermons. The writing system of Arabic has twenty five consonants and three long vowels that are written from right to left and take different shapes according to their position in the word. In addition to the long vowels, Arabic has short vowels. Short vowels are not part of the alphabet but rather are written as vowel diacritics above or under a consonant to give it its desired sound and hence give a word a desired meaning. Texts without vowels are considered to be more appropriate by the Arabic-speaking community since this is the usual form of everyday written and printed materials (books, magazines, newspapers, letters, etc.). But when it comes to the text of the Holy Koran, and more generally to printed collections of classical poetry, school books and some Arabic paper dictionaries, vowel diacritics appear in full. It is very usual for well-edited books, some printed texts, and manuscripts to have vowel diacritics partially or randomly written out in cases where words will be ambiguous or difficult to read. For instance, a word in Arabic consisting of two letters like (بر), i.e., ‘b’ and ‘r’, can be very ambiguous without vowel diacritics. Consider the examples in Table 1. Especially in such cases as these, a writer may use diacritics so readers can easily resolve any ambiguity. However, although most Arabs can read texts with vowels explicitly indicated, fewer can write texts using the correct vowel diacritics.
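The ambiguity of an undiacritized form such as ‘b’ + ‘r’ can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not part of the AWN tools: the function names and the three sample entries (common diacritized readings of the same two letters, with rough English glosses) are hypothetical and are not copied from Table 1. It strips the Arabic diacritics in the Unicode range U+064B to U+0652 and indexes each entry under its bare form, so that a query typed without diacritics retrieves every diacritized reading.

```python
from collections import defaultdict

# Arabic tanwin, fatha, damma, kasra, shaddah and sukun occupy
# the Unicode code points U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word: str) -> str:
    """Return the word with all vowel diacritics removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def build_index(entries):
    """Index (diacritized form, gloss) pairs under their undiacritized form."""
    index = defaultdict(list)
    for form, gloss in entries:
        index[strip_diacritics(form)].append((form, gloss))
    return index


def lookup(index, query):
    """Find all readings of a query, whether or not it carries diacritics."""
    return index.get(strip_diacritics(query), [])


# Illustrative entries only (an assumption of this sketch).
SAMPLE_ENTRIES = [
    ("\u0628\u064E\u0631\u0651", "barr: land"),    # ba + fatha, ra + shaddah
    ("\u0628\u0650\u0631\u0651", "birr: piety"),   # ba + kasra, ra + shaddah
    ("\u0628\u064F\u0631\u0651", "burr: wheat"),   # ba + damma, ra + shaddah
]

INDEX = build_index(SAMPLE_ENTRIES)

if __name__ == "__main__":
    # A bare two-letter query returns every diacritized candidate.
    for form, gloss in lookup(INDEX, "\u0628\u0631"):
        print(form, gloss)
```

Because the query is normalized in the same way as the stored forms, a misplaced or missing diacritic in the user's input does not make the search fail; the diacritized forms are still shown in the results so that the reader can pick the intended sense.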

Table 1: vowel diacritics on ‘b’ and ‘r’

For this reason it is a mistake to rely on users, regardless of their background, to correctly enter a search word requiring vowel diacritics. Yet misuse of a single diacritic, such as the ‘suku:n’, which indicates that a consonant is not followed by any vowel, or the ‘shaddah’ (as in barr in Table 1 and darrasa in Table 2), which indicates a double consonant, will cause a query to fail. People also tend to make mistakes about the position of some diacritics in a word. This can pose a serious problem for information retrieval systems and computerized lexical resources which depend on well-formed user input and may even result in users rejecting the system. In particular, there may be an outright rejection of a robust new lexical resource such as AWN unless that new resource assumes that most Arabic-speaking users do not have expert command in writing vowel diacritics and will generally ignore them. These users are more comfortable reading texts without diacritics in dealing with everyday written materials including legal and business contracts, newspapers, books as well as both paper and computerized dictionaries. The end result is that it is preferable to allow users to enter Arabic words without diacritics while at the same time allowing the retrieval of those words with vowel diacritics for the purposes of disambiguation.

Another fact about Arabic to take into consideration is that the language has neither capital letters (for proper names: the names of people, countries, cities, geographical features, months, days of the week, etc.) nor acronyms. This creates increased ambiguity and especially complicates such tasks as Information Extraction in general and Named Entity Recognition in particular.

An additional property of Arabic that should be kept in mind is that Arabic is a highly derivational and inflectional language and its vocabulary can be easily expanded using a framework that is latent in the creative use of roots and morphological patterns. According to Al-Fedaghi and Al-Anzi (1989), cited in De Roeck and Al-Fares (2000), “85% of words derived from tri-literal roots” and there are around 10,000 independent roots. Because of this, it is possible to build any necessary semantic relation among words of different syntactic categories. That is to say, most Arabic words are created by applying distinct derivational patterns to some root, relating the two not only in form and meaning but determining their syntactic category as well. New Arabic words can always be coined from an existing root according to the standard derivational patterns. It is also possible to organize sets of Arabic words into distinct semantic fields according to the root from which they are derived. An example of such a field for the root drs, ‘to study,’ is shown in Table 2. Arabic can also adapt loan words from other languages to its system of derivational morphology in order to make them sound and behave like Arabic words as, for example, in the case of aksadah, ‘oxidation,’ which is patterned on fa’lalah (Elkateb, 2005).

Table 2: derivatives of root (d r s)

Numerous efforts have been devoted to the processing of Arabic morphology, the outcome of which is apparent in several approaches and various technical morphological analysers and generators. Among the computational approaches to Arabic morphology is that of Beesley (1998, 2001), which uses Finite State Transducer (FST) techniques and two-level morphology. His system dealt with root, stem and pattern morphology using only two layers. One layer corresponds to the root and is represented by the root lexicon and the other to the morphological measure including vowel pattern.

However, in order to produce a system on the basis of morphological analysis and generation that is linguistically and computationally efficient, the following factors have to be taken into consideration:
1. A word pattern usually combines with a vast number of roots. Roots and patterns are intersected at compile time to yield 90,000 stems. Various combinations of prefixes and suffixes, concatenated to the stems, yield over 72,000,000 abstract words.
2. The existence of one morphological form depends on the existence of other forms comprised of the same morphological unit.
3. There are cases where a single form has more than one morphological function as illustrated in Table 1 above.
4. A word is generated by the combination of a root encoded manually and a diacritized pattern each of which has to be hand coded to indicate the subset of patterns with which a root can combine.
5. A root can be extracted by removing the affixes to identify the base form of the diacritized word and to apply it to a morphological measure or a pattern. In this case both word and pattern must be entered manually.
6. Some techniques are designed not to take any Arabic text as an input directly, but to transliterate the Arabic system into ASCII to be fed to the system. The results must be transliterated back to Arabic to be understood. This technique was introduced by Buckwalter (2002) and can be said to have achieved considerable results in Arabic morphological analysis, yet it is unable to adequately deal with ambiguous forms but can only provide full listing of all the possible readings of the ambiguous form.

There seems to be no agreement on the best route to adequate morphological analysis/generation and there is as yet no proper means for generating or analyzing the Arabic roots due to the complexity of the weak vowels governing a vast amount of the vocabulary. It seems also that there is no role for morphological generation in suggesting words, because for much of the vocabulary, the rate at which these would prove to be actual words would be too low unless at least three quarters of the process are done manually (Elkateb, 2005). As far as dictionaries are concerned, a multilingual resource generally includes equivalence and translation relations and should tackle issues like language-specific and untranslatable material. Translation is not merely an act of linguistic transfer, but it also involves the interaction of cultures, and that transference of culture imposes far greater problems than linguistic transfer. Translation of words of cultural content may involve solving problems like the unavailability of equivalents or tackling untranslatable items and consequently filling the gaps that may exist among languages. Consider the Arabic words in Table 3.

zaka:t            annual compulsory alms (2.5 %) of the savings of a Muslim when any amount or property exceeds one year in possession.
suhu:r            a light meal before starting a new fasting day of Ramadan (before daybreak).
hija:b            an Islamic veil which is worn by women to cover the hair and the neck.
mu’akhar Sada:q   money/property stipulated upon in the marriage contract which is due to be paid by the husband to his wife in case he intends to divorce her.

Table 3: lexical gaps

Lexical Ambiguity
A lexical item may carry two distinct and unrelated meanings, i.e. homonymy. A homonym can be defined as a word with no relationship between its senses, as in the word bank where the first sense refers to a river side and the second to a financial institution. Ambiguity and polysemy of nominal forms represent an important concern which affects the organization of word meaning.

The basic distinction between what Pustejovsky (1995) termed contrastive ambiguity and complementary polysemy should involve different solutions for the representation of lexical knowledge. Contrastive ambiguity, as manifested by words such as bank (financial institution or river side) is handled by multiple representations for the clarity of senses. However, it is claimed that this type does not form a significant problem in the language since contrastive ambiguity between two unrelated senses of a word tends to be a historically accidental and idiosyncratic property of individual words. Hence “we don’t expect to find instances of the same contrastive ambiguity replicated by other words in the language or by words in other languages” (Dyvik, 2003). Complementary polysemy occurs in cases where a single word has multiple senses which are related to one another in some predictable way. It is claimed that ambiguity can result from senses which are manifestations of the same basic meaning of the word depending on the context it occurs in. The manner in which senses are related in complementary polysemy is the factor that distinguishes it from contrastive ambiguity where senses have no contextual relation. Accordingly, a word like ‘door’ has two related senses (physical object or aperture). So, knocking on the ‘door’ (physical object) is different from going through the same ‘door’ (aperture). Let us first examine the senses of the Arabic word ‘bab’ for ‘door’ in order to figure out how words behave in different languages and how sense extensions vary from one language to another:

bab (door/chapter)
--sense1 = physical object, e.g. I painted the front door.
--sense2 = aperture, e.g. Adam went through the door.
--sense3 = written communication (book chapter), “opening of a piece of text”, e.g. I started a new chapter of my thesis.

The first two senses are more closely related than the third. The third sense in Arabic refers to opening/entering (or going through writing/reading) a written text. This sense might be extended from the notion of ‘opening’ as in ‘open the book’ or ‘open a new chapter’ compared to ‘open the door’. Therefore, it can be said to be an instance of complementary polysemy, not contrastive ambiguity, because of the shared collocates with the verb to open.

It is claimed that complementary polysemy poses a serious problem not only in one language but also would normally be projected into other languages. The English word ‘lamb’, for example, is said to denote two different senses: a count noun animal and a mass noun meat, whereas in Arabic the word ‘hamal’ (lamb) and its synonym ‘kharu:f’ (lamb/sheep) refer only to the count noun ‘animal’. It seems that it is only accidentally, in English, that this noun is classified as polysemous because it refers to both animal and meat. This may be because it is linked with small masses like ‘chicken, eggs, snails’ where complementary polysemy is less frequent. More
interestingly, the polysemy in the case of lamb is only temporary and will disappear as the lamb gets old and becomes a sheep. The second sense for ‘lamb’ as mass noun ‘meat’ can only appear in Arabic if the word lamb occurs in a compound as in ‘lahm kharu:f’ (sheep meat/mutton) where the complementary polysemy is completely absent. However, Arabic and English interpret other masses the same way whether large or small, like ‘fish’, ‘chicken’, ‘eggs’, ‘potatoes’ etc., where complementary polysemy may occur equally in both languages:

1. I did not like the fish we had for lunch.
2. I went to see the dead fish at lunch time.

There are cases in Arabic where a word may carry multiple but related senses as in the noun ‘sawt/aswat’ where it can be classified as complementary polysemy according to its interpretation in Arabic:

sawt / aswat
--sense1 = vote: an indication of a choice or opinion that is made by voting
--sense2 = voice: sound produced by speaking or singing.

The common morphological derivation of a pair of nouns in Arabic provides evidence for their relatedness as polysemes. The Arabic word ‘sawt’ (vote) and ‘sawt’ (voice) are apparently derived from the same unaugmented triliteral root ‘s w t’ (sound). In addition, the ‘indication’ of vote in sense1 refers to verbal consent ‘speaking’ in sense2.

3. hada fariq ?add al aswat (This is a vote counting team).
4. hada fariq tasji:l al aswat (This is a voice recording team).

The two senses in 3 and 4 can be classified as complementary polysemy rather than contrastive senses, i.e., to ‘vote’ is to primarily ‘say’ who or what you are in favour of. Example 4 above also shows that the word ‘aswat’ denotes two senses: ‘votes’ and ‘voices’ as unrelated to one another when modified by ‘tasji:l’ (recording) which denotes the recording of voice as well as writing down (in a record) the names of the voters (votes). Therefore example 4 can be interpreted as having these two contrastive senses in 5:

5. hada fariq tasji:l al aswat:
a. This is a voice recording team. (audio recording)
b. This is a vote recording team. (writing)

This word gets even more ambiguous in its proper context than on its own or in a lexicon as in example 6:

6. hadihi aswat alnakhibi:n.

The word ‘aswat’ in this context refers to two different senses:
a. These are the voices of the electors.
b. These are the votes of the electors.

Ambiguity varies between two languages when one borrows a word from the other. In this case, polysemy projects into the borrowing language from the source language but not the opposite. The term ‘alqaida’ was borrowed from Arabic to refer to a group of extremists in Afghanistan known by this name and classified as a terrorist organization. The proper name of this entity is derived from the meaning of ‘the base’. Since proper names are not translated, as illustrated in example 7 below, the polysemy in this case occurs only in Arabic but not in English. In other words, the sentence ‘The Americans attacked Alqaida’ carries one sense in English whereas in Arabic it is interpreted as having two senses:

7. alamrica:n yuha:jimu:n alqaida.
a. The Americans attacked Alqaida. (terrorist group based in Afghanistan)
b. The Americans attacked the base. (a military base)

No one would argue about the importance of a semantic lexicon to handle such different and/or related senses of words and concepts. However, there should be an agreement on how to represent lexical data to be easily manipulated by computers in order to encode any semantic relations between senses and to carry out various applications of a conceptual lexicon such as word sense disambiguation (WSD), lexical chains etc.

Lexicography
Following EuroWordNet, AWN is developed in two phases by first building a core wordnet around the most important concepts, the so-called Base Concepts (Vossen 1998), and secondly extending the core wordnet downward to more specific concepts using additional criteria. The core wordnet should thus become highly compatible with wordnets in other languages that are developed according to the same approach.

For the core wordnet, the Common Base Concepts (CBCs) of the 12 languages in EWN and BalkaNet (Tufis, 2004) are being encoded as synsets in AWN; other Arabic language-specific concepts are added and translated manually to the closest synset. The same procedure is performed for all English synsets that currently have an equivalence relation in the SUMO ontology. Synset encoding proceeds bi-directionally: given an English synset, all corresponding Arabic variants (if any) will be selected; given an Arabic word, all its senses are determined and for each of them the corresponding English synset is encoded.

The Arabic synsets will be extended with hypernym relations to form a closed semantic hierarchy. SUMO will be used to maximize the semantic consistency of the hyponymy links. This will represent the core wordnet, which is a semantic basis for the further extension. The work is mostly done manually.

When a new Arabic verb is added, extensions are made from verbal entries, including verbal derivates,
nominalizations, verbal nouns, and so on. We also consider the most productive forms of deriving broken plurals. This is done by applying lexical and morphological rules iteratively.

The database is further extended downward from the CBCs. First, a layer of hyponyms is chosen based on maximal connectivity, relevance, and generality. Two major pre-processing steps are required: preparation and extension. Preparation entails compiling lexical and morphological rules and processing available bilingual resources from which we construct a homogeneous bilingual dictionary containing information on the Arabic/English word pair. This information includes the Arabic root, the POS, the relative frequencies and the sources supporting the pairing. The Arabic words in these bilingual resources must also be normalized and lemmatized while maintaining vowels and diacritics.

We next apply 17 heuristic procedures, previously used for EWN, to the bilingual dictionary in order to derive candidate Arabic word/English synset mappings. Each mapping includes the Arabic word and root, the English synset, the POS, the relative frequencies, a mapping score, the absolute depth in AWN, the number of gaps between the synset and the top of the AWN hierarchy, and attested tokens of the pair. The Arabic word/English synset pairs constitute the input to a manual validation process. We proceed by chunks of related units (sets of related WN synsets, e.g. hyponymy chains, and sets of related Arabic words, i.e., words having the same root) instead of individual units (i.e., synsets, senses, words).

Finally, AWN will be completed by filling in the gaps in its structure, covering specific domains, adding terminology and named entities, etc. Each synset construction step is followed by a validation phase, where formal consistency is checked and the coverage is evaluated in terms of frequency of occurrence and domain distribution. The total coverage of AWN will be around 10,000 synsets.

Tools
Tools to be developed for AWN include a lexicographer's interface modeled on the EWN interface with added facilities for Arabic script. Because AWN is to be aligned not just to PWN but to every wordnet aligned to PWN (either directly or indirectly through an Interlingual Index or the ontology), the database design supports multiple languages. The user interface will be explicitly multilingual and indifferent to the direction of alignment between the conceptual structures of the two languages. In addition to search and browsing facilities for the end users of the completed database, lexicographers require an editing interface. A variety of legacy components are available, each with their relative advantages. The editor's interface will communicate with the database server using Simple Object Access Protocol (SOAP), allowing multiple lexicographers at different sites to maintain a common database.

Database
The database structure comprises four principal entity types: item, word, form and link. Items are conceptual entities, including synsets, ontology classes and instances. An item has a unique identifier and descriptive information such as a gloss. Items lexicalized in different languages are distinct. A word entity is a word sense, where the word's citation form is associated with an item via its identifier. A form is an entity that contains lexical information (not merely inflectional variation). The forms are the root and/or the broken plural form, where applicable. A link relates two items, and has a type such as "equivalence," "subsuming," etc. Links interconnect sense items, e.g., a PWN synset to an AWN synset, a synset to a SUMO concept, etc. This data model has been specified in XML as an interchange format, but is also implemented in a MySQL database hosted by one of the partners.

Ontology
A large ontology providing the semantic underpinning for AWN concepts will be built on SUMO, a formal ontology that currently comprises about 1000 terms and 4000 definitional statements. It is provided in a first order logic language called Standard Upper Ontology Knowledge Interchange Format (SUO-KIF) and has also been translated into the OWL semantic web language. SUMO has natural language generation templates and a multi-lingual lexicon that allows statements in SUO-KIF and SUMO to be expressed in multiple languages. Synsets map to a general SUMO term or a term that is directly equivalent to the given synset (Figure 1).

Figure 1: SUMO mapping to wordnets

New formal terms will be defined to cover a greater number of equivalence mappings, and the definitions of the new terms will in turn depend upon existing fundamental concepts in SUMO. The process of formalizing definitions will generate feedback as to whether word senses in AWN need to be divided or combined and how glosses may be clarified. Wordnets in other languages linked by synset number will benefit, too. The Sigma ontology development environment will be updated to handle a similar presentation of Unicode-based character sets, including Arabic.

The Interlingual Index (ILI) connecting EWN wordnets is a condensed set of more or less universal concepts linking synsets across languages via multiple exhaustive equivalence relations. In EuroWordNet and BalkaNet,
English PWN has been used to express equivalence relations across the different languages. By providing many SUMO definitions and terms that correspond to Arabic synsets, we will create the opportunity to use SUMO as the ILI for all wordnets that are currently related to PWN. This is illustrated in Figure 2. If the Arabic word sense for shai is exhaustively defined by relations to SUMO terms, this definition can replace an equivalence relation (er1) that is currently encoded between the Arabic synset shai and a synset tea in PWN. Note that the relations from shai to the SUMO terms need to be exhaustive, which may require multiple relations of different types (sr1 (subsumption), r2, r3) to multiple SUMO terms.

[Diagram: columns for the Arabic wordnet, SUMO, and the English, Dutch and Spanish wordnets. The Arabic synset shai is related to the SUMO terms Beverage, Tea leaves and Hot water through the relations sr1, r2 and r3, and the synsets shai, tea, thee and té are connected by equivalence relations er1.]

Figure 2: SUMO and ILI

If there are also equivalence relations from other languages (e.g. Dutch and Spanish) to the same PWN synset, then these relations grant the linkage of the synsets in these languages to the same SUMO definition.

Besides providing a formal semantic framework, SUMO can thus also be used to map synsets across languages, in fact even when there is not an equivalent in English. By composing formal definitions for the non-English synsets, SUMO as an ILI will not only be less biased by English but will also have more expressive power.

Conclusion
Constructing AWN presents challenges not encountered by established wordnets. These include the script on the one hand and the morphological properties of Semitic languages, centered around roots, on the other hand. The foundations for meeting these challenges have been laid. An innovation with significant consequences for wordnet development is the proposal to substitute English WN as the ILI with SUMO.

Acknowledgements
This work was supported by the United States Central Intelligence Agency.

References
Beesley, K. (2001) Finite-State Morphological Analysis and Generation of Arabic at Xerox, ACL/EACL 2001, July 6th, Toulouse, France: 1-8
Black, W., Elkateb, S., Rodriguez, H., Alkhalifa, M., Vossen, P., Pease, A. and Fellbaum, C. (2006) Introducing the Arabic WordNet Project, in Proceedings of the Third International WordNet Conference, Sojka, Choi, Fellbaum and Vossen eds.
Black, W. J., and Elkateb, S. (2004) A Prototype English-Arabic Dictionary Based on WordNet, Proceedings of 2nd Global WordNet Conference, GWC2004, Czech Republic, 67-74.
Buckwalter, T. (2002) Arabic Morphological Analysis, http://www.qamus.org/morphology.htm
De Roeck, A., and Al-Fares, W. (2000) A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots, Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, 199-206
Dyvik, H. (2003) Translations as a semantic knowledge source: word alignment and wordnet, Section for Linguistic Studies scientific papers, University of Bergen
Dyvik, H. (2002) Translations as Semantic Mirrors: From Parallel Corpus to Wordnet. Section for Linguistic Studies scientific papers, University of Bergen
Elkateb, S. and Black, W. J. (2001) Towards the Design of English-Arabic Terminological Knowledge Base, Proceedings of ACL 2000, Toulouse, France: 113-118
Elkateb, S. and Black, W. J. (2004) A Bilingual Dictionary with Enriched Lexical Information, Proceedings of NEMLAR Cairo, Egypt 2004 Arabic Language Tools and Resources: 79-84
Elkateb, S. (2005) Design and implementation of an English Arabic dictionary/editor. PhD thesis, The University of Manchester, United Kingdom.
Farreres, J. (2005) Creation of wide-coverage domain-independent ontologies. PhD thesis, Universitat Politècnica de Catalunya.
Fellbaum, C. (1998, ed.) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Niles, I., and Pease, A. (2001) Towards a Standard Upper Ontology. In: Proceedings of FOIS 2001, Ogunquit, Maine, pp. 2-9.
Pease, A. (2000) Standard Upper Ontology Knowledge Interchange Format. Web document http://suo.ieee.org/suo-kif.html.
Pease, A. (2003) The Sigma Ontology Development Environment, in Working Notes of the IJCAI-2003 Workshop on Ontology and Distributed Systems. Volume 71 of CEUR Workshop Proceeding series
Pustejovsky, J. (1995) The Generative Lexicon, Massachusetts Institute of Technology.
Tufis, D. (ed.) (2004) Special Issue on the BalkaNet project. Romanian Journal of Information Science and Technology, Vol.7, nos 1-2
Vossen, P. (ed.) (1998) EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic Publishers.
Vossen, P. (2004) EuroWordNet: a multilingual database of autonomous and language-specific wordnets connected via an Inter-Lingual-Index. International Journal of Lexicography, Vol.17 No. 2, OUP, 161-173