
Automatic Thesaurus Construction

Dongqiang Yang | David M. Powers


School of Informatics and Engineering
Flinders University of South Australia
PO Box 2100, Adelaide 5001, South Australia
Dongqiang.Yang|[email protected]

Abstract¹

In this paper we introduce a novel method of automating thesauri using syntactically constrained distributional similarity. With respect to syntactically conditioned co-occurrences, most popular approaches to automatic thesaurus construction simply ignore the salience of grammatical relations and effectively merge them into one united 'context'. We distinguish the semantic differences of each syntactic dependency and propose to generate thesauri through word overlapping across major types of grammatical relations. The encouraging results show that our proposal can build automatic thesauri with significantly higher precision than the traditional methods.

Keywords: syntactic dependency, distribution, similarity.

¹ Copyright (c) 2008, Australian Computer Society, Inc. This paper appeared at the Thirty-First Australasian Computer Science Conference (ACSC2008), Wollongong, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 74. Gillian Dobbie and Bernard Mans, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

1 Introduction

The usual way of automatic thesaurus construction is to extract the top n words in the similar-word list of each seed word as its thesaurus entries, after calculating and ranking distributional similarity between the seed word and all of the other words occurring in the corpora. The attractive aspect of automatically constructing or extending lexical resources rests clearly on its time efficiency and effectiveness, in contrast to the time-consuming and outdated publication of manually compiled lexicons. Its applications mainly include constructing domain-oriented thesauri for automatic keyword indexing and document classification in Information Retrieval, Question Answering, Word Sense Disambiguation, and Word Sense Induction.

As the ground of automatic thesaurus construction, distributional similarity is often calculated in the high-dimensional vector space model (VSM). With respect to the basic elements in VSM (Lowe, 2001), the dimensionality of word space can be syntactically conditioned (i.e. grammatical relations) or unconditioned (i.e. 'a bag of words'). Under these two context settings, different similarity methods have been widely surveyed, for example for 'a bag of words' (Sahlgren, 2006) and for grammatical relations (Curran, 2003; Weeds, 2003). Moreover, the framework of Padó and Lapata (2007) compared the difference between the two settings. They observed that the syntactically constrained VSM outperformed the unconditioned one that exclusively counts word co-occurrences in a ±n window.

Given the hypothesis that similar words share similar grammatical relationships and semantic contents, the basic procedure for estimating such distributional similarity can consist of (1) pre-processing sentences in the corpora with shallow or complete parsing; (2) extracting syntactic dependencies into distinctive subsets or vector spaces (Xs) according to head-modifier relations, including adjective-noun (AN) and adverb or the nominal head in a prepositional phrase to verb (RV), and grammatical roles, including subject-verb (SV) and verb-object (VO); and (3) determining distributional similarity using similarity measures such as the Jaccard coefficient and the cosine, or probabilistic measures such as KL divergence and information radius. On the other hand, without the premise of grammatical relations in semantic regulation, calculating distributional similarity can simply work on word co-occurrences.

Instead of arguing the pros and cons of these two context representations in specific applications, we focus on how to effectively and efficiently produce automatic thesauri with syntactically conditioned co-occurrences.

Without distinguishing the latent differences of grammatical relations in dominating word meanings in context, most approaches simply chained or clumped these syntactic dependencies into one unified context representation for computing distributional similarity, such as in automatic thesaurus construction (Hirschman et al., 1975; Hindle, 1990; Grefenstette, 1992; Lin, 1998; Curran, 2003), along with Word Sense Disambiguation (Yarowsky, 1993; Lin, 1997; Resnik, 1997), word sense induction (Pantel and Lin, 2002), and finding the predominant sense (McCarthy et al., 2004). These approaches improved the distributional representation of a word through a fine-grained context that can filter out the unrelated or unnecessary words produced in the traditional way of 'a bag of words' or the unordered context, given that the parsing errors introduced are acceptable or negligible.

It is clear that these approaches, based on observed events, often scaled each grammatical relation through its frequency statistics in computing distributional similarity, for example in the weighted (Grefenstette, 1992) or mutual-information-based (Lin, 1998) Jaccard coefficient.
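Steps (2) and (3) of the procedure above can be sketched in a few lines of Python. The snippet below is a minimal illustration over invented toy tuples (the words, counts, and the helper names build_profiles and cosine are ours, not from the paper): each grammatical relation gets its own co-occurrence space, and similarity is scored with the cosine measure mentioned above.

```python
import math
from collections import defaultdict

def build_profiles(tuples):
    """Step (2): group co-occurrence counts by grammatical relation,
    so each relation (AN, SV, VO, RV) forms its own vector space."""
    spaces = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for w1, rel, w2 in tuples:
        spaces[rel][w1][w2] += 1
    return spaces

def cosine(u, v):
    """Step (3): cosine similarity between two sparse count profiles."""
    shared = set(u) & set(v)
    dot = sum(u[w] * v[w] for w in shared)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy parsed tuples <word, relation, context word> (invented examples)
tuples = [
    ("judge", "SV", "impose"), ("judge", "SV", "rule"),
    ("court", "SV", "impose"), ("court", "SV", "rule"),
    ("dog",   "SV", "bark"),
]
spaces = build_profiles(tuples)
sv = spaces["SV"]
print(round(cosine(sv["judge"], sv["court"]), 6))  # identical SV contexts
print(cosine(sv["judge"], sv["dog"]))              # no shared contexts
```

In a real pipeline the tuples would come from parsing a corpus such as the BNC, and the raw counts would be weighted (e.g. by mutual information) before comparison; this sketch only shows the shape of the per-relation spaces.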
Although they proposed to replace the unordered context with the syntactically conditioned one, they have overlooked the linguistic specificity of grammatical relations in word distribution. Except for the extraction of syntactically conditioned contexts, they in fact make no differentiation between them, which is similar to computing distributional similarity with unordered context. The advantage of using the syntactically constrained context has not yet been fully exploited when yielding statistical semantics from word distributions.

To fully harvest the advantages of computing distributional similarity in syntactically constrained contexts, we propose to first categorize contexts in terms of grammatical relations, and then overlap the top n similar words yielded in each context to generate automatic thesauri. This is in contrast to averaging distributional similarity across these contexts, which is commonly adopted in the literature.

2 Context interchangeability of similar words

Word meaning can be regarded as a function of word distribution within different contexts in the form of co-occurrence frequencies, where similar words share similar contexts (Harris, 1985). Miller and Charles (1991) propose that word similarity depends on the extent to which words are interchangeable across different context settings. The flexibility of one word or phrase substituting for another indicates the extent to which they are synonymous, providing that the alternation of meaning in discourse is acceptable. We calculated distributional similarity in different syntactic dependencies such as subject-predicate and predicate-object. Given the interchangeability of synonyms or near-synonyms in different contexts, semantically similar words derived with distributional similarity should span at least two types of syntactically constrained contexts. In other words, once we can derive the thesaurus items from each dependency set, the final thesaurus comprises the intersection of the items across any two types of dependency sets.

The heuristic of deriving automatic thesauri with the interchangeability of synonyms or near-synonyms in contexts ('any two') can be expressed as:

• Nouns: ∪_{i,j} (S_i ∩ S_j), where i and j stand for any two types of dependency sets in terms of grammatical relations: AN, SV, and VO.

• Verbs: ∪_{i,j} (S_i ∩ S_j), where i and j stand for any two of RV, SV, and VO.

where, for a given word, S_i is the set of thesaurus items produced through distributional similarity in a single dependency set. Note that we also used the heuristics of 'any three' and 'any four' to construct automatic thesauri, but found that most target words had no distributionally similar words under these conditions, which are stricter than 'any two'. We do not demonstrate these conditions here.

We similarly hypothesized the union of all grammatical relations from the co-occurrence matrices as a baseline ('all'), which computes distributional similarity with the union of all relations and can be indicated as:

• Nouns: ∪_i S_i, where i is one of AN, SV, and VO.

• Verbs: ∪_i S_i, where i is one of RV, SV, and VO.

3 Syntactically constrained distributional similarity

To automate thesauri, we first employed an English syntactic parser based on Link Grammar to construct a syntactically constrained VSM. The word space consists of four major syntactic dependency sets that are widely adopted in the current research on distributional similarity. Following the reduction of dimensionality of the dependency sets, we created the latent semantic representation of words, through which distributional similarity can be measured so that thesaurus items can be retrieved.

3.1 Syntactic dependency

The syntactically conditioned representation mainly relies on the following grounds: (1) the meaning of a noun depends on its modifiers such as adjectives, nouns, and the nominal head in a prepositional phrase, as well as the grammatical role of the noun in a sentence as a subject or object (Hirschman et al., 1975; Hindle, 1990); and (2) the meaning of a verb depends on its direct object, subject, or modifier such as the head of a prepositional phrase (Hirschman et al., 1975). These results are partly consistent with the findings in studying word association and the psychological reality of the paradigmatic relationships of WordNet (Fellbaum, 1998).

With the hypothesis of 'one sense per collocation' in WSD, Yarowsky (1993) observed that the direct object of a verb played a more dominant role than its subject, whereas a noun acquired more credit for disambiguation from its nominal or adjective modifiers. As an application of the distributional features of words, Resnik (1997) and Lin (1997) employed the selectional restraints in subject-verb, verb-object, head-modifier and the like to conduct sense disambiguation.

The syntactic dependencies can provide a clue for tracking down the meaning of a word in context. Cruse (1986) points out that the semantic requirements run in two directions in head-modifier and head-complement constructions, namely determination (selector and selectee) and dependency (dependee and depender). The determination requirement emphasizes the dominant role of the selector in the semantic traits of a construction, while the dependency supplements some additional traits to formulate the integrity of the construction.

3.2 Categorizing syntactic dependencies

Suppose that a tuple <wi, r, wj> describes the words wi and wj and their bi-directional dependency relation r. For example, if wi modifies wj through r, all such wj with r to wi form a context profile for wi, and likewise wi for wj. In the hierarchy of syntactic dependencies (Carroll et al., 1998), the major types of grammatical relationships (r) can be generally clustered into:
• RV: verbs with all verb-modifying adverbs and the head nouns in prepositional phrases;

• AN: nouns with noun-modifiers including adjective use and pre/post-modification;

• SV: grammatical subjects and their predicates;

• VO: predicates and their objects.

To capture these dependencies we employ a widely used and freely available parser² based on Link Grammar (Sleator and Temperley, 1991). In Link Grammar each word is equipped with 'left-pointing' and/or 'right-pointing' connectors. Based on the crafted rules of the connectors in validating word usages, a link between two words can be formed, reflecting a dependency relation. Apart from these word rules, 'crossing-links' and 'connectivity' are the two global rules working on interlinks, which respectively restrict a link from starting or ending in the middle of pre-existing links and force all the words of a sentence to be traced along links. There are in total 107 major link types in the Link Grammar parser (ver. 4.1), and there are also various sub-link types that specify special cases of dependencies. Using this parser, we extracted and classified the following link types into the four main types of dependencies:

² http://www.link.cs.cmu.edu/link/

• RV
1. E: verbs and their adverb pre-modifiers
2. EE: adverbs and their adverb pre-modifiers
3. MV: verbs and their post-modifiers such as adverbs and prepositional phrases

• AN
1. A: nouns and their adjective pre-modifiers
2. AN: nouns and their noun pre-modifiers
3. GN: proper nouns and their common nouns
4. M: nouns and their various post-modifiers such as prepositional phrases, adjectives, and participles

• SV
1. S: subject nouns/gerunds and their finite verbs. There are also some sub-link types under S; for example, Ss*g stands for gerunds and their predicates, and Sp for plural nouns and their plural verbs
2. SI: the inversion of subjects and their verbs in questions

• VO
1. O: verbs and their direct or indirect objects
2. OD: verbs and their distance complements
3. OT: verbs and their time objects
4. P: verbs and their complements such as adjectives and passive participles

Note that, except for RV, we define the AN, SV, and VO dependencies almost identically to shallow parsers (Grefenstette, 1992; Curran, 2003) or the full parser MINIPAR (Lin, 1998), but we retrieve them instead through the Link Grammar parser.

Consider, for example, a short sentence from the British National Corpus (BNC):

'Home care Coordinator, Margaret Gillies, currently has a team of 20 volunteers from a variety of churches providing practical help to a number of clients already referred.'

The parse of this sentence with the lowest cost in the Link Grammar parser is shown in Figure 1, where LEFT-WALL indicates the start of the sentence.

[Figure 1: A complete linkage of parsing a sentence using Link Grammar]

We can classify four types of grammatical relations from this parse, namely:

• RV: <currently, E, has>, <already, E, referred>

• AN: <home, AN, care>, <care, GN, coordinator>, <volunteer, Mp, team>, <church, Mp, variety>, <practical, A, help>, <client, Mp, number>, <referred, Mv, clients>

• SV: <coordinator, Ss, has>

• VO: <has, Os, team>, <providing, Os, help>

After parsing the 100-million-word BNC, filtering out non-content words, and applying morphological analysis, we separately extracted the relationships to construct four parallel matrices or co-occurrence sets, denoted as RX: RVX, ANX, SVX, and VOX, in terms of the four types of syntactic dependencies above. The row vectors of RX are denoted respectively RvX, AnX, SvX, and VoX for the four dependencies. Similarly, the column vectors of RX are denoted as rVX, aNX, sVX, and vOX respectively.

Consider SVX, an m-by-n matrix representing subject-verb dependencies between m subjects and n verbs. We illustrate the SV relation using the rows (SvX or {Xi,*}) of SVX corresponding to nouns conditioned as subjects of
verbs in sentences, and the columns (sVX or {X*,j}) to verbs conditioned by nouns as subjects. The cell Xi,j shows the frequency of the ith subject with the jth verb. The ith row Xi,* of SVX is a profile of the ith subject in terms of all its verbs, and the jth column X*,j of SVX profiles the jth verb versus its subjects.

The parsing results are shown in Table 1, where Dim refers to the size of each matrix in the form of rows by columns, the Freq segments classify the frequency distribution, and Token/Type stands for the statistical frequencies of specific relationships within their corresponding dependency category R.

      Dim            Freq:   1        2-10     11-20    21-30   >31
ANX   48.5 by 37.6   Token   1,813.7  6,243.4  1,483.1  799.8   3,617.8
                     Type    1,813.7  2,040.0  103.6    32.2    44.9
RVX   37.4 by 14.2   Token   863.1    2,276.4  481.4    234.9   692.2
                     Type    863.1    751.9    33.8     9.5     10.9
SVX   32.7 by 11.3   Token   511.8    1,699.4  297.8    133.3   380.7
                     Type    511.8    587.4    21.0     5.4     6.0
VOX   6.1 by 33.3    Token   488.5    1,811.5  475.4    266.2   1,286.9
                     Type    488.5    575.1    33.1     10.7    15.6

Table 1: The statistics of the syntactically conditioned matrices derived from parsing BNC (thousands)

Given the different methodologies for implementing parsing, it is hardly fair to appraise a syntactic parser. Molla and Hutchinson (2003) compared the Link Grammar parser and the Conexor Functional Dependency Grammar (CFDG) parser with respect to intrinsic and extrinsic evaluations. In the intrinsic evaluation the performance of the two parsers was measured in terms of the precision and recall of extracting four types of dependencies: subject-verb, verb-object, head-modifier, and head-complement. In the extrinsic evaluation a question-answering application was used to contrast the two parsers. Although the Link Grammar parser is inferior to the CFDG parser in locating the four types of dependencies, the two are not significantly different when applied in question answering. Given that our main task is to investigate the function of the syntactic dependencies RV, AN, SV, and VO, acquired with the same Link Grammar parser, in automatic thesaurus construction, it is appropriate to use the Link Grammar parser to extract these dependencies.

3.3 Dimensionality reduction in VSM

The four syntactically conditioned matrices, as shown in Table 1, are extremely sparse, with nulls in over 95% of the cells. Instead of eliminating the cells with lower frequencies, we kept all co-occurrences unchanged to avoid worsening data sparseness.

Our matrices record the context with both syntactic dependencies and semantic content. These dual constraints yield rarer events than word co-occurrences in 'a bag of words'. However, they impose more accurate or meaningful grammatical relationships between words, providing the parser is reasonably accurate.

We initially substituted each cell frequency freq(Xi,j) with its information form log(freq(Xi,j)+1) to retain sparsity (0 → 0) (Landauer and Dumais, 1997). This can produce 'a kind of space effect' that lessens the gradient of the frequency-rank curve of Zipf's Law (1965), reducing the gap between rarer events and frequent ones.

Singular Value Decomposition (SVD) often acts as an effective way of reducing the dimensionality of word space in natural language processing. A reduced SVD representation can diminish both 'noise' and redundancy whilst retaining the useful information that has the maximum variance. This approach has been dubbed Latent Semantic Analysis (LSA) (Deerwester et al., 1990; Landauer and Dumais, 1997) and maps the word-by-document space into word-by-concept and document-by-concept spaces. Note that the 'noisy' data in the raw co-occurrence matrices mainly comes from wrong parses, and redundancy also exists as a common problem of expressing similar concepts in synonyms.

Typically at least 200 principal components are employed in Information Retrieval to describe the SVD-compressed word space. Instead of optimising the semantic space against other algorithms (through tuning the number of principal components in applications or evaluations), we specified a fixed dimension size for the compressed semantic space, which is thus not expected to be optimal for our experiment. We established 250 as the fixed size of the compressed semantic space. Among the singular values, the first 20 components account for around 50% of the variance, and the first 250 components for over 75%.

As is usual with SVD/LSA applications, we assume that the semantic representation of words is a linear combination of eigenvectors representing their distinct subcategorizations and senses, and that relating the uncorrelated eigenvector feature sets of different words can thus score their proximity in the semantic space.

3.4 Distributional similarity

We consistently employed the cosine similarity of word vectors as used in LSA and commonly adopted in assessing distributional similarity (Salton and McGill, 1986; Schütze, 1992). The cosine of the angle θ between vectors x and y in the n-dimensional space is defined as:

cos θ = (x · y) / (‖x‖ ‖y‖) = Σ_{i=1..n} x_i y_i / (√(Σ_{i=1..n} x_i²) √(Σ_{i=1..n} y_i²))

where ‖x‖ and ‖y‖ are the lengths of x and y.

Note that the accuracy and coverage of automatic term clustering inevitably depend on the size and domains of the corpora employed, as well as on the similarity measures. Consistently using one similarity method, the cosine, our main task in this paper is to explore the context
interchangeability in automatic thesaurus construction, rather than to compare different similarity measures with one united syntactic structure that combines all the dependencies together. Although taking into account more similarity measures in the evaluations might solidify the conclusions, this would take us beyond the scope of the work.

4 Evaluation

4.1 The 'gold standard' thesaurus

It is not a trivial task to evaluate automatic thesauri in the absence of a benchmark set. Subjective assessment of distributionally similar words seems a plausible approach to assessing the quality of term clusters, but it is practically unfeasible to implement given the size of the term clusters. A low agreement on word relatedness also exists between human subjects.

The alternative way of measuring term clusters is to contrast them with existing lexical resources. For example, Grefenstette (1993) evaluated his automatic thesaurus with a 'gold standard' dataset consisting of Roget's Thesaurus ver. 1911, Macquarie Thesaurus, and Webster's 7th dictionary. If two words were located under the same topic in Roget or Macquarie, or shared two or more terms in their definitions in the dictionary, they were counted as a successful hit for synonymy or semantic relatedness. To improve the coverage of the 'gold standard' dataset, Curran (2003) incorporated more thesauri: Roget's Thesaurus (supplementing the free version of 1911 provided by Project Gutenberg with the modern version of Roget's Thesaurus II), Moby Thesaurus, The New Oxford Thesaurus of English, and The Macquarie Encyclopaedic Thesaurus.

The 'gold standard' datasets are not without problems due to their domain and coverage, because they are at best a snapshot of general or specific English vocabulary knowledge (Kilgarriff, 1997; Kilgarriff and Yallop, 2000). Moreover, the organization of thesauri forces different notions of being synonymous or similar, given the etymologic trend of words and the different purposes of lexicographers. For example, under 1 of the 1,000 topics in Roget's Thesaurus ver. 1911, there are two groups of synonyms, {teacher, trainer, instructor, institutor, master, tutor, director, etc.} and {professor, lecturer, reader, etc.}, under the topic of teacher. They express an academic concept of being in the position of supervision over somebody. In the noun taxonomy of WordNet, the synonyms of teacher only consist of instructor, affiliated with the coordinate terms (sharing one common superordinate) such as lecturer and reader, the hyponyms such as coach and tutor, and the hypernyms such as educator and pedagogue. As for professor and master, both are three links distant from teacher through their hypernym educator.

Subject to the availability of these thesauri and dictionaries, we incorporated both WordNet and Roget's Thesaurus, freely acquired, into the 'gold standard' thesaurus. WordNet only consists of paradigmatic relations and organizes a fine-grained semantic taxonomy mainly with the relationships of syn/antonymy, IS-A, and HAS-A, whereas Roget's Thesaurus covers both syntagmatic and paradigmatic relations and hierarchically clusters related words or phrases into each topic without explicitly annotating their relationships.

Kilgarriff and Yallop (2000) claimed that WordNet, along with the automatic thesauri generated under the hypothesis of similar words sharing similar syntactic structures, is tighter rather than looser in defining whether words are 'synonyms' or related words. This contrasts with Roget and the automatic thesauri derived through unordered word co-occurrences. Since we accounted for distributional similarity in the syntactically conditioned VSM, the reasonable way of evaluating it is to compare our automatic thesauri to WordNet. Apart from that, to perform a systematic evaluation of the relationships among distributionally similar words, we also included Roget as a supplement to the 'gold standard', as it covers words with both paradigmatic and syntagmatic relationships.

4.2 Similarity comparison

We defined two distinctive measures to compare automatic thesauri with the 'gold standard': SimWN for WordNet and SimRT for Roget.

4.2.1 Similarity in WordNet

SimWN is based on the taxonomic similarity method proposed by Yang and Powers (2005; 2006). Since Yang and Powers's method outperformed most popular similarity methods in terms of correlation with human similarity judgements, we employed it in the evaluation. Given two nominal or verbal concepts c1 and c2, SimWN scores their similarity with:

Sim(c1, c2) = α_str × α_t × β_t^(dist−1), dist ≤ γ

where:

• α_str: 1 for nouns, but for verbs it successively falls back to α_stm, the verb stem polysemy ignoring sense and form; or α_der, the cognate noun hierarchy of the verb; or α_gls, the definition of the verb.

• α_t: the path type factor specifying the weights of different link types, i.e. syn/antonym, hyper/hyponym and holo/meronym in WordNet.

• β_t: the probability associated with a direct link between concepts (of type t).

• dist: the distance between two concept nodes.

• γ: the path length dist is limited to the depth factor γ; otherwise the similarity is 0.

As for multiple senses of a word, word similarity maximizes their sense or concept similarity in WordNet.

Yang and Powers (2005) compared their taxonomic similarity metric with human judgements on the 65 noun pairs, where the cut-off point of 2.36 in the human similarity scores for nouns, on a Likert scale from 0 to 4, divides each dataset into similar (≥ 2.36) and dissimilar (< 2.36) subsets. We found that the cut-off of 2.36 for nouns corresponds to the searching depth limit γ = 4 in SimWN,
and likewise the cut-off of 2 on the 130 verb pairs (Yang nouns and verbs that account for the major part of
and Powers, 2006) corresponds to γ = 2. Thus for the published thesauri and are more informative than other
noun candidates in automatic thesauri, we set up γ = 4, to PoS tags. The word distribution within different distances
identify similar words within the distance of less than to the 100 nouns and 100 verbs in the ‘gold-standard’ are
four links. If two nodes are syn/antonyms or related to listed in Table 2, where ∑X indicates the overall nouns
each other in the taxonomy with the shortest path length from AnX, aNX, SvX, and vOX and verbs from rVX, VoX,
of less than 4, we counted them as a successful hit. So too and sVX in the ‘gold-standard’. For the ‘gold-standard’
is the shorter distance limit γ = 2 for verb candidates. words from WordNet, SA denotes syn/antonyms of the
targets, and DI the words with exactly I link distance to
4.2.2 Similarity in Roget’s Thesaurus targets (for nouns I ≤ γ = 4; for verbs I ≤ γ = 2); ∑
denotes the total number of ‘gold-standard’ words in each
Roget’s Thesaurus divides its hierarchy into seven levels
matrix; and Total means the overall number of ‘gold-
from the top class to the bottom topic, and stores topic- standard’ words from both WordNet and Roget. In Table
related words under 1 of 1,000 topics. SimRT counted it a 2 the average number of ‘gold-standard’ words across
hit if two words are situated under the same topic.
each matrix is evenly distributed.
Note that the relationships among the ‘gold standard’
The agreement between the WordNet-style and Roget-
words retrieved by SimRT are anonymous. Although
style words in the ‘gold-standard’ across these matrices,
WordNet only organizes paradigmatic relationships,
that is, the ratio of the number of words retrieved by
SimWN does not distinguish in what way two words are SimWN and SimRT in both WordNet and Roget against the
similar, for example, IS-A, HAS-A, or a mixture of them, total number of ‘gold-standard’ words, is on average
and only collects words within a distance from zero
7.3% on nouns and less than 15.2% on verbs. We
(syn/antonyms) to four links in WordNet.
aggregated all the ‘gold-standard’ words across AnX,
aNX, SvX, and vOX for nouns, as well as rVX, VoX, and
4.3 Candidate words in the ‘gold standard’ sVX for verbs, which results in 244,245 nouns and
148,455 verbs overall in the ‘gold standard’. The
agreement between WordNet and Roget candidates on
WordNet Roget Total
nouns and verbs is respectively about 6.9% and 14.9%,
SA D1 D2 D3 D4 ∑ that is to say, about 14.8% and 11.6% nouns in WordNet
Noun aNX 462 2,825 14,244 41,483 48,625 107,639 141,102 232,181 and Roget are of same, so are 25.4% and 26.5% for verbs.
AnX 458 2,887 14,278 41,940 49,267 108,830 142,218 234,424
Each target noun on average owns about 1,148 WordNet,
1,464 Roget, and 2442 Total words in the ‘gold standard’,
vOX 439 2,619 13,027 37,433 43,620 97,138 133,733 214,727
and each target verb 872, 834, and 1485 words
SvX 434 2,607 12,938 37,355 43,274 96,608 131,527 212,156 respectively.
∑X 469 2,979 14,967 44,185 52,054 114,779 146,435 244,245

Verb rVX 1,282 24,702 58,617 84,601 81,713 144,545 4.4 A walk-through example
VoX 1,260 24,265 57,225 82,750 79,771 141,039 For each seed word, after computing the cosine similarity
sVX 1,269 24,354 57,642 83,265 80,681 142,256 of the seed with all other words in each dependency
∑X 1,297 25,283 60,483 87,165 83,415 148,455 matrix, we produced and ranked the top n words as
candidates. We then applied the two heuristics: ‘any two’
and ‘all’ on these candidates to forming automatic
Table 2: The word relatedness distribution in the thesauri.
‘gold-standard’ across each matrix
We selected 100 seed nouns and 100 seed verbs with term frequencies of around 10,000 times in BNC. The average frequency of these nouns is about 8,988.9, and 10,364.4 for these verbs. High frequency words are likely to be generic or general terms, and the less frequent words may not appear in the semantic sets. The average frequency of the nouns in AnX, aNX, SvX, and vOX is in fact decreased to 3,361.1, 5,629.1, 1,156.7, and 1,692.1, and of the verbs in rVX, VoX, and sVX to 3,014.3, 3,328.9, and 1,971.8, as we only extracted syntactic dependencies from BNC. Overall, the average frequency of the nouns is about 2,959.7 across AnX, aNX, SvX, and vOX, and 3,960.9 for the verbs across rVX, VoX, and sVX.

We first used SimWN and SimRT to compare each seed word to all other words from the dependency sets, namely AnX, aNX, SvX, and vOX for nouns and rVX, VoX, and sVX for verbs, to retrieve its candidate words in the 'gold standard'. Instead of a normal thesaurus with a full coverage of PoS tags, we only compiled the synonyms of nouns and verbs.

In Table 3 we exemplify the top 20 similar words of sentence and attack yielded in each dependency set and under the two heuristics. Consider the distributionally similar words of sentence and attack in aNX and rVX for example. The words related to the linguistic sense of sentence consist of syllable, words, adjective, etc. in aNX, while the words with the judicial sense make up around half of the 20 words, including imprisonment, penalty, and the like. Words such as rape and slaughter from rVX come from the literal sense of attack, together with its metaphorical sense among other words like badmouth, flame, and so on.

The heuristic of 'any two' collected the intersection of thesaurus items across these dependency sets. For example, punishment and words are the similar words to sentence, which respectively occurred in aNX and vOX as well as in aNX and AnX; criticise and bomb are the similar words to attack, which respectively occurred in VoX and rVX as well as in VoX and sVX.
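The 'any two' heuristic described above amounts to keeping candidates that co-occur in at least two of the per-dependency candidate lists; a minimal sketch (the candidate sets below are hypothetical, and the paper's 'all' baseline is not reproduced here since its exact construction is described elsewhere):

```python
from itertools import combinations

def any_two(candidate_sets):
    """Keep a word if it appears in at least two dependency candidate sets."""
    kept = set()
    for a, b in combinations(candidate_sets, 2):
        kept |= a & b          # pairwise intersection
    return kept

# Hypothetical top-n candidates for 'sentence' from three noun matrices
aNX = {"imprisonment", "punishment", "words"}
AnX = {"words", "syllable", "phrase"}
vOX = {"punishment", "ban"}
print(sorted(any_two([aNX, AnX, vOX])))  # ['punishment', 'words']
```

Words supported by only a single grammatical relation (e.g. "ban" above) are filtered out, which is exactly how the heuristic suppresses 'noisy' candidates.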
      Similar words
aNX   imprisonment term utterance penalty excommunication syllable words punishment prison prisoner phrase detention hospitalisation fisticuffs banishment verdict Minnesota meaning adjective warder
AnX   words syllable utterance clause nictation word swarthiness paragraph text homograph discourse imprisonment nonce phrase hexagram adjective verb niacin savarin micheas
vOX   soubise cybele sextet cristal raper stint concatenation kohlrabi tostada apprenticeship ban contrivance Guadalcanal necropolis misanthropy roulade gasworks curacy jejunum punishment
SvX   ratel occurrence cragsman jingoism shiism Oklahoma genuineness unimportance language gathering letting grimm chaucer accent taxation ultimatum arrogance test verticality habituation
any two   imprisonment words utterance word term punishment paragraph text phrase jail verb meaning noun poem language passage sequence syllable lexicon fine
all   imprisonment utterance penalty excommunication punishment prison prisoner detention hospitalisation banishment Minnesota meaning contrariety phoneme consonant counterintelligence starvation fine cathedra lifespan

(a) The similar words to sentence (as a noun)

      Similar words
rVX   assault rape criticize arm slaughter abduct mortar accuse defend fire avow lash badmouth blaspheme slit singe flame kidnap persecute
VoX   raid criticise bomb realign outwit beleaguer guard raze bombard criticize resemble spy pulse misspend reformulate alkalinise metastasise placard ruck glory
sVX   ambush invade fraternize palpitate patrol wound pillage bomb billet shell fire liberate kidnap raid garrison accuse assault arrest slaughter outnumber
any two   assault criticize bomb ambush accuse raid fire rape bombard kidnap infiltrate patrol defend storm invade arrest garrison torture stab shoot
all   raid bomb assault criticize ambush accuse fire guard bombard patrol rape storm infiltrate wound kidnap criticise garrison alkalinize torture spy

(b) The similar words to attack (as a verb)

Table 3: A sample of thesaurus items

4.5 Performance evaluation

Instead of simply matching with the 'gold standard' thesauri, Lin (1998) proposed to compare his automatic thesaurus with WordNet and Roget on their structures, taking into account the similarity scores and orders of similar words respectively produced from distributional similarity and taxonomic similarity. This approach can account for thesaurus resemblance under the hierarchy of WordNet or Roget, which is an apparent advantage over straight word matching.

Instead of calculating the varied cosine similarity between each target vector yielded from the automatic thesaurus and from WordNet or Roget (Lin, 1998), we adapted the concepts of Precision (Pn) and Recall-precision (Rp) from information retrieval to provide more sensible values of precision and recall for a ranked list. Given the top n similar words S for a target T in an automatic thesaurus, Pn is defined as |S|/n, where |S| refers to the number of S that can be retrieved in the top n similar words of T in WordNet or Roget. Rp is conditioned on precision and is correspondingly defined as |S|/∑d(S), where, in terms of words, d(S) denotes the minimum distance between T and S if S can be located within the top n similar words of T in WordNet or Roget.

Analogously for the ranked word list from an automatic thesaurus, the top n similar words with respect to each sense of T in WordNet are produced in the order of hyper/hyponyms and holo/meronyms, after first exhausting synonyms and then antonyms, whereas the top n words in Roget can be subsequently acquired within +/-n (preceding/succeeding) words from T in each of its categories. Through these redefined precision and recall measures, Pn can stand for the coverage of the automatic thesaurus on potentially arbitrary senses or categories of T, and Rp can describe the relatedness of the thesaurus on the actual sense or category of T.

5 Results

We took the top n similar words derived from each co-occurrence matrix for 'any two' or 'all', with n varying from 1 to 1000 in ten steps, roughly doubling each time. The results are shown in Table 4. We individually list Pn and Rp values with respect to WordNet, Roget, and the union of WordNet and Roget (Total).

               'all'                                    'any two'
          WordNet      Roget       Total          WordNet      Roget       Total
n         Pn    Rp    Pn    Rp    Pn    Rp        Pn    Rp    Pn    Rp    Pn    Rp
1    noun 22.0  22.0  15.0  15.0  27.0  27.0      24.0  24.0  12.0  12.0  28.0  28.0
     verb 13.0  13.0   7.0   7.0  16.0  16.0      15.0  15.0   8.0   8.0  20.0  20.0
2    noun 31.0  35.2  19.0  23.7  36.0  41.2      34.0  34.0  20.0  20.0  42.0  37.5
     verb 39.0  31.7   9.5  12.0  40.0  34.2      48.5  34.4  11.0  13.3  49.5  38.2
5    noun 42.4  21.1  22.2  29.5  46.8  27.1      56.6  17.1  28.4  24.0  63.2  20.0
     verb 54.2  25.6  20.2  17.1  55.8  26.9      62.6  27.4  23.8  15.0  64.0  28.7
10   noun 43.4  11.8  19.4  18.5  47.5  15.5      56.6  10.4  26.9  17.1  62.3  11.0
     verb 53.3  19.5  18.0  17.5  54.7  19.6      62.3  21.7  20.9  15.9  63.7  21.2
20   noun 37.7   9.5  16.1  13.8  41.6   9.8      50.2   8.7  22.7  16.5  56.0   8.4
     verb 49.3  15.0  13.9  15.0  50.9  14.7      57.5  15.6  16.1  13.8  59.0  15.4
50   noun 29.0   8.0  11.2  11.2  32.3   7.4      41.4   7.2  16.7   9.5  46.4   6.8
     verb 43.8  11.9  10.0  10.9  45.4  11.3      49.5  12.2  11.4   9.9  51.3  11.5
100  noun 22.9   8.4   8.2   9.5  25.7   7.4      33.8   6.6  12.8   6.6  38.4   5.9
     verb 39.7  10.0   7.7   8.4  41.2   9.2      44.1  10.4   8.4   7.5  45.6   9.8
200  noun 18.6   6.9   5.9   7.8  20.9   5.9      26.6   6.2   8.9   6.2  30.2   5.5
     verb 36.0   9.3   5.9   6.5  37.4   8.6      39.6   9.3   6.4   6.2  41.0   8.5
500  noun 13.6   6.4   3.9   6.1  15.4   5.5      18.6   6.0   5.4   5.8  21.0   5.3
     verb 32.6   8.5   4.2   5.7  33.8   7.7      35.1   8.5   4.6   5.3  36.4   7.7
1000 noun 11.0   6.3   2.8   5.5  12.4   5.4      14.1   6.1   3.6   5.5  16.0   5.2
     verb 30.5   8.2   3.4   4.9  31.6   7.3      32.7   8.2   3.6   4.9  33.8   7.3

Table 4: The precision and recall in automatic thesauri under the heuristics of 'any two' and 'all' (percentage)
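The Pn and Rp measures defined in Section 4.5 can be sketched as follows; this is one illustrative reading of the |S|/n and |S|/∑d(S) definitions, with d taken as the 1-based rank of a match in the gold-standard list, and the word lists below are hypothetical:

```python
def pn_rp(candidates, gold_ranked, n):
    """Pn = |S|/n; Rp = |S| / sum of gold-standard ranks d(S) of the matches.

    candidates: top-n similar words from the automatic thesaurus.
    gold_ranked: ranked similar-word list for the target in WordNet/Roget.
    """
    gold_top = gold_ranked[:n]
    # d(s): 1-based position of s in the gold ranking (distance from the target)
    matches = {s: gold_top.index(s) + 1 for s in candidates[:n] if s in gold_top}
    pn = len(matches) / n
    rp = len(matches) / sum(matches.values()) if matches else 0.0
    return pn, rp

auto = ["imprisonment", "words", "utterance", "jail"]          # hypothetical
gold = ["punishment", "imprisonment", "verdict", "utterance"]  # hypothetical
print(pn_rp(auto, gold, 4))  # Pn = 2/4, Rp = 2/(2+4)
```

Matches found deeper in the gold-standard ranking contribute larger d values, so Rp rewards thesauri whose hits sit close to the target's actual sense.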
6 Discussion

6.1 'any two' vs 'all'

It is clear that in terms of the Pn measurement 'any two' consistently outperformed 'all' for both nouns and verbs in thesaurus construction. The improvement in the precision of the 'any two' clusters over the 'all' heuristic was significant (p < 0.05, paired t test). This is achieved under the condition of comparable Rp. Before reaching the threshold 200, the overall Rp for verbs under 'any two' almost always stays higher than under 'all', while the contrary holds in the case of nouns; beyond that point no noticeable difference can be observed. The reason behind this could be that some 'gold-standard' words derived from one matrix may never occur in the thesaurus entries from another matrix, and these are neglected by 'any two'.

We also extended this work to words with intermediate (around 4,000) and low (around 1,000) term frequencies in BNC. For the 100 nouns and 100 verbs with intermediate frequencies, 3,753.9 and 3,675.2 respectively, the average frequency of the nouns across AnX, aNX, SvX, and vOX is 1,274.7, and of the verbs across rVX, VoX, and sVX is 1,422.0. For the 100 nouns and 100 verbs with low frequencies, 824.1 and 864.6, the average frequency of the nouns across AnX, aNX, SvX, and vOX is 297.0, and of the verbs across rVX, VoX, and sVX is 342.2. For the intermediate and low frequency words, the heuristic of 'any two' still significantly outperformed 'all' in yielding automatic thesauri (p < 0.05) with higher precision.

As the threshold increases from 1 to 1000 in Table 4, both the nominal and verbal parts of the thesaurus under the heuristics of 'any two' and 'all' corroborate a preference for relationships from WordNet rather than from Roget, since Pn in WordNet contributed the majority of the overall Pn in contrast to that in Roget. Note that from the figures shown in Table 2, we can observe that the overlap between WordNet and Roget is rather small, where only 14.8% of WordNet or 11.6% of Roget for nouns co-occur, as do 25.4% of WordNet or 26.5% of Roget for verbs. This could be caused by filtering out more Roget words present in the 'all' or 'any two' thesaurus. This trend keeps unchanged even when more unrelated words are introduced as the threshold approaches 1000.

We can compare the entries of sentence and attack with the threshold of 20 in the 'any two' thesaurus to their respective entries in the 'all' thesaurus, listed in Table 3. The entry of sentence in the 'any two' thesaurus constituted the top 20 similar words in Table 3 (a); they were all akin to sentence, without any 'noisy' words such as Minnesota and counterintelligence in the 'all' thesaurus. So did attack in Table 3 (b), which comprised near-synonyms after filtering out unrelated words such as alkalinise in the 'all' thesaurus. However, some truly related words were also missed out in the 'any two' thesauri, for example the similar words penalty and banishment to sentence in the 'all' thesaurus, as well as guard and wound to attack. This can be partly compensated for by increasing the threshold. Even with the threshold 50, the overall thesaurus entries were still acceptable, with approximately 50% of total precision.

6.2 The predominant sense

Word senses in WordNet are ranked by their frequencies, where the first sense often serves as the predominant sense of a word. The predominant sense often serves as a back-off in sense disambiguation. To study the sense distribution of the words in the automatic thesaurus, we also calculated Pn on the condition of extracting the 'gold-standard' words exclusively related to the first sense of a target (First), in contrast to all the senses.

Overall the precision on the First sense is not less than 50% of the precision on all senses for both nouns and verbs under the 'any two' heuristic. This implies that distributionally similar words derived using the 'any two' heuristic are more semantically related to the first sense of a target, around 50% or more, than to its other senses. Even under the 'all' heuristic, around 50% of the words that match a 'gold standard' for any sense hold semantic relatedness with the first senses of targets.

The unbalanced sense distribution among the thesaurus items shows the uneven usage of words with respect to Zipf's Law (1965). Kilgarriff (2004) also noted the Zipfian distribution of both word senses and words when analysing the Brown corpus and BNC. The predominant sense of a word can thus be identified through its distributionally similar words instead of laborious sense annotation work, and serves as an important resource in sense disambiguation.

6.3 Distributional similarity and semantic relatedness

Semantic similarity is often regarded as a special case of semantic relatedness, while the latter also contains word association. Distributional similarity consists of both semantic similarity and word association between a seed word and the candidate words in its thesaurus items, except for the 'noisy' words (due to parsing or statistical errors) that hold no plausible relationships with the seed. Consider the distributionally similar words of sentence produced in aNX in Table 3 (a) for example. Only three words, namely term, phrase, and verdict, were connected with sentence through the similarity measurement of SimWN in WordNet, whereas 14 words such as phrase and penalty shared the same topics with sentence in Roget. The noun sentence has three senses in WordNet:

- sentence#n#1: a string of words satisfying the grammatical rules of a language
- sentence#n#2: (criminal law) a final judgment of guilty in a criminal case and the punishment that is imposed
- sentence#n#3: the period of time a prisoner is imprisoned

The word sentence is also located in Section 480 (Judgement), 496 (Maxim), 535 (Affirmation), 566 (Phrase), and 971 (Condemnation) in Roget. For example, the nominal part of Section 480 is,

480. Judgment. [Conclusion.]
N. result, conclusion, upshot; deduction, inference, ergotism[Med]; illation; corollary, porism[obs3]; moral. estimation, valuation, appreciation, judication[obs3]; dijudication[obs3], adjudication; arbitrament, arbitrement[obs3], arbitration; assessment, ponderation[obs3]; valorization.
award, estimate; review, criticism, critique, notice, report. decision, determination, judgment, finding, verdict, sentence, decree; findings of fact; findings of law; res judicata[Lat]. plebiscite, voice, casting vote; vote &c. (choice) 609; opinion &c. (belief) 484; good judgment &c. (wisdom) 498. judge, umpire; arbiter, arbitrator; assessor, referee. censor, reviewer, critic; connoisseur; commentator &c. 524; inspector, inspecting officer. twenty-twenty hindsight [judgment after the fact]; armchair general, Monday morning quarterback.

Generally sentence#n#1 in WordNet can be projected into Sections 496 and 566, sentence#n#2 into Sections 480 and 971, and sentence#n#3 into Section 535. With respect to the evaluation of SimWN in WordNet, term in Table 3 (a) is the hypernym of sentence#n#3; phrase and sentence#n#1 are three links apart, say, sentence#n#1 has a meronym of clause that is a coordinate of phrase; and sentence#n#2 bears the same hypernym as verdict within four links. Apart from these paradigmatic relationships in WordNet, the three words also connect with sentence through SimRT in Roget, where words such as verdict and sentence are located under the same section, Judgement (480). However, sentence holds more relations of being in the same domain with its similar words in the thesaurus from aNX. For example, penalty and sentence both appear in Section 971, which expresses the notion of criminality deserving a penalty by way of a judicial sentence, and prisoner and sentence are situated in Section 971, which illustrates being in prison as the result of judgements in a court in the context of criminal law.

As we compute distributional similarity on the assumption of similar words sharing similar contexts conditioned by grammatical relations, in general more paradigmatic relations can be found than syntagmatic ones. In Table 4, the higher precision for WordNet than for Roget's Thesaurus shows that distributionally similar words are more semantically similar than merely associated. This is consistent with the conclusion of Kilgarriff and Yallop (2000) on computing distributional similarity: the hypothesis of similar words sharing similar contexts constrained by grammatical relations can yield tighter or WordNet-style thesauri, whereas the hypothesis of similar words sharing unconditioned co-occurrences can yield looser or Roget-style thesauri. Note that distributionally similar words can be semantically opposite to each other, given the common grammatical relations they often share. For example, in the automatic thesaurus produced with 'any two', the nouns failure and success, or strength and weakness, are antonymous, as are the verbs cry and laugh, or deny and admit.

It is clear that the 'gold standard' is subject to the vocabulary size of WordNet and Roget's Thesaurus. The worst case comes from the 1911 version of Roget's Thesaurus we adopted, where words coined in modern times are not contained. For example, words such as software and its distributionally similar words, including emulator, unix, NT, Cobol, Oracle (as the database system), processor, and PC, are not included in the 1911 version of Roget. We selected the target words with relatively higher frequencies in BNC and did a simple morphological analysis in the construction of the matrices using the word-mapping table in WordNet, so that all nouns and verbs from automatic term clustering can be covered (at least in WordNet). However, not all word relationships in automatic thesauri could be contained in WordNet, even though we have included Roget to supply richer relationships. For example, take the words sentence and detention. In Table 3 (a) detention is listed in the top 20 similar words to sentence on aNX, but they have no direct or indirect links in WordNet, nor are they situated under any topic or section in Roget, yet their intense association has become commonly used. Likewise kidnap, as one of the top 20 similar words to attack on rVX in Table 3 (b), is distributionally similar to attack, but there are no existing connections between them in WordNet or Roget.

7 Conclusion

With the introduction of grammatical relations in computing distributional similarity, automatic thesaurus construction can be improved through the interchangeability of similar words in diverse syntactically conditioned contexts. Most methods still combine these contexts into one united representation for similarity computation, which works analogously to those based on the premise of 'a bag of words'. After the categorization of the syntactically conditioned contexts, through which similar words can be formed under the assumption of context interchangeability, automatic thesauri were yielded with significantly higher precision than with the traditional methods. Future research will focus on clustering dependencies and extracting word senses from the thesaurus entries. Learning or enriching ontologies from automatic thesauri is another future task.

8 References

Carroll, John, Ted Briscoe and Antonio Sanfilippo (1998). Parser Evaluation: a Survey and a New Proposal. In the First International Conference on Language Resources and Evaluation, 447-454. Granada, Spain.

Cruse, D. A. (1986). Lexical Semantics, Cambridge University Press.

Curran, James R. (2003). From Distributional to Semantic Similarity. Ph.D thesis.

Deerwester, Scott C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas and Richard A. Harshman (1990). Indexing by Latent Semantic Analysis. Journal of the American Society of Information Science 41(6): 391-407.
Fellbaum, Christiane (1998). WordNet: An Electronic Lexical Database. Cambridge, MA, The MIT Press.

Grefenstette, Gregory (1992). Sextant: Exploring Unexplored Contexts for Semantic Extraction from Syntactic Analysis. In the 30th Annual Meeting of the Association for Computational Linguistics, 324-326. Newark, Delaware.

Grefenstette, Gregory (1993). Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In the Workshop on Acquisition of Lexical Knowledge from Text, 143-153.

Harris, Zellig (1985). Distributional Structure. In The Philosophy of Linguistics, J. J. Katz (ed). New York, Oxford University Press: 26-47.

Hindle, Donald (1990). Noun Classification from Predicate-argument Structures. In the 28th Annual Meeting of the Association for Computational Linguistics, 268-275. Pittsburgh, Pennsylvania.

Hirschman, Lynette, Ralph Grishman and Naomi Sager (1975). Grammatically-based Automatic Word Class Formation. Information Processing and Management 11: 39-57.

Kilgarriff, Adam (1997). I don't Believe in Word Senses. Computers and the Humanities 31(2): 91-113.

Kilgarriff, Adam (2004). How Dominant is the Commonest Sense of a Word? In the 7th International Conference (TSD 2004, Text, Speech and Dialogue), 103-112. Brno, Czech Republic.

Kilgarriff, Adam and Colin Yallop (2000). What's in a Thesaurus? In the Second International Conference on Language Resources and Evaluation, LREC-2000, 1371-1379. Athens, Greece.

Landauer, Thomas K. and Susan T. Dumais (1997). A Solution to Plato's Problem: the Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychological Review 104: 211-240.

Lin, Dekang (1997). Using Syntactic Dependency as a Local Context to Resolve Word Sense Ambiguity. In the 35th Annual Meeting of the Association for Computational Linguistics, 64-71. Madrid, Spain.

Lin, Dekang (1998). Automatic Retrieval and Clustering of Similar Words. In the 17th International Conference on Computational Linguistics, 768-774. Montreal, Quebec, Canada.

Lowe, Will (2001). Towards a Theory of Semantic Space. In the 23rd Annual Conference of the Cognitive Science Society, 576-581. Edinburgh, UK.

McCarthy, Diana, Rob Koeling, Julie Weeds and John Carroll (2004). Finding Predominant Senses in Untagged Text. In the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 267-287. Barcelona, Spain.

Miller, George A. and Walter G. Charles (1991). Contextual Correlates of Semantic Similarity. Language and Cognitive Processes 6(1): 1-28.

Molla, Diego and Ben Hutchinson (2003). Intrinsic versus Extrinsic Evaluations of Parsing Systems. In European Association for Computational Linguistics (EACL), workshop on Evaluation Initiatives in Natural Language Processing, 43-50. Budapest, Hungary.

Padó, Sebastian and Mirella Lapata (2007). Dependency-based Construction of Semantic Space Models. To appear in Computational Linguistics 33(2).

Pantel, Patrick and Dekang Lin (2002). Discovering Word Senses from Text. In the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 613-619. New York, NY, USA.

Resnik, Philip (1997). Selectional Preference and Sense Disambiguation. In ACL Siglex Workshop on Tagging Text with Lexical Semantics, Why, What and How?, 52-57. Washington, USA.

Sahlgren, Magnus (2006). The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces. Ph.D thesis.

Salton, Gerard and Michael J. McGill (1986). Introduction to Modern Information Retrieval. New York, NY, USA, McGraw-Hill.

Schütze, Hinrich (1992). Dimensions of Meaning. In the 1992 ACM/IEEE Conference on Supercomputing, 787-796. Minneapolis, Minnesota, USA.

Sleator, Daniel and Davy Temperley (1991). Parsing English with a Link Grammar, Carnegie Mellon University.

Weeds, Julie Elizabeth (2003). Measures and Applications of Lexical Distributional Similarity. Ph.D thesis.

Yang, Dongqiang and David M.W. Powers (2005). Measuring Semantic Similarity in the Taxonomy of WordNet. In the Twenty-Eighth Australasian Computer Science Conference (ACSC2005), 315-322. Newcastle, Australia, ACS.

Yang, Dongqiang and David M.W. Powers (2006). Verb Similarity on the Taxonomy of WordNet. In the 3rd International WordNet Conference (GWC-06), 121-128. Jeju Island, Korea.

Yarowsky, David (1993). One Sense per Collocation. In ARPA Human Language Technology Workshop, 266-271. Princeton, New Jersey.

Zipf, George Kingsley (1965). Human Behavior and the Principle of Least Effort: an Introduction to Human Ecology. N.Y., Hafner Pub. Co.