Development and Application of a Metric on Semantic Nets
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 1, JANUARY/FEBRUARY 1989
Abstract - Motivated by the properties of spreading activation and conceptual distance, the authors propose a metric, called Distance, on the power set of nodes in a semantic net. Distance is the average minimum path length over all pairwise combinations of nodes between two subsets of nodes. Distance can be successfully used to assess the conceptual distance between sets of concepts when used on a semantic net of hierarchical relations. When other kinds of relationships, like “cause,” are used, Distance must be amended but then can again be effective. The judgments of Distance significantly correlate with the distance judgments that people make and help us determine whether semantic net $S_1$ is better or worse than semantic net $S_2$. First a “conceptual distance” task is set, and people are asked to perform it. Then the same task is performed by Distance on $S_1$ and $S_2$. If Distance on $S_1$ performs more like people than Distance on $S_2$, the conclusion is that $S_1$ is better than $S_2$. Distance embedded in the methodology facilitates repeatable quantitative experiments.

Manuscript received May 10, 1987; revised February 24, 1988 and August 29, 1988. This work was supported in part by the National Science Foundation under Grant ECS-84-06683.
R. Rada is with the Department of Computer Science, University of Liverpool, Liverpool L69 3BX, England.
H. Mili is with the Department of Electrical Engineering and Computer Science, George Washington University, Washington, DC 20052.
E. Bicknell is with the Mesh Section, National Library of Medicine, Bethesda, MD 20894.
M. Blettner is with the Department of Statistics and Computational Mathematics, University of Liverpool, Liverpool L69 3BX, England.
IEEE Log Number 8824462.

I. INTRODUCTION

Our research group focuses on developing better hierarchical knowledge bases [1] and, in the course of this work, requires a method for assessing the value of a knowledge base. Our application area is the retrieval of biomedical literature, and a natural problem which the knowledge base should help solve is the ranking of documents listed in response to a query. Our method for ranking documents assumes that both documents and queries are represented as sets of nodes in a hierarchical knowledge base. The method has reliably helped us judge the success of our merging algorithms and might similarly help other people. The method uses a metric called Distance that is easy to manipulate mathematically and to interpret.

Some examples of the psychological and information science significance of Distance are given elsewhere [2], [3]. This paper focuses on the mathematical characteristics of Distance and presents new cases and interpretations. Experiments in which Distance is applied to pairs of concepts and to sets of concepts in a hierarchical knowledge base show the power of hierarchical relations in representing information about the conceptual distance between concepts.

In this paper a knowledge base will be viewed as a graph. In many problems dealing with discrete objects and binary relations, a graphical representation of the objects and the binary relations on them is a very convenient form of representation [4], [5] that often leads to a solution using algorithms from graph theory. Depending on the nature of the problem, the edges could represent physical links (as for communication networks), time duration (as for task planning), or abstract relationships (as for association [6] structures).

Many graph problems, such as the minimum cost spanning tree and traveling salesman problems, require minimizing the sum of weights of some edges given a set of constraints [7]. Distances between entire graphs have been investigated in the pattern recognition literature, where a graph represents a picture [8], [9]. Such distances are based on the number of transformations that would make one graph similar to another one. However, to the best of our knowledge, a distance as defined in this paper has not been explored in the graph theory literature or elsewhere.

Human information processing often involves comparing concepts. There are various ways of assessing the similarity of concepts depending on the representation adopted for knowledge. In featural representations, concepts are represented by sets of features. The similarity between two concepts a and b with feature sets A and B can be expressed as a weighted sum of functions of different set operations on A and B [10]. The theory of spreading activation applies to comparing concepts in a semantic net [11]. Another method consists of constructing network fragments for sought-for objects and then matching these fragments against the network data base [12]. In yet another strategy the in-degree and out-degree of two nodes in a semantic net are compared in the course of deciding how similar the two nodes are [13]. Faceted thesauri are a kind of semantic net and are being used in indexing and retrieving software. To decide whether two nodes in such a faceted thesaurus are similar, software scientists have devised measures based on the number and type of overlapping facets that the two nodes have [14].
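The featural view mentioned above, in which similarity is a weighted sum of functions of set operations on the feature sets A and B, can be made concrete with a small sketch. Everything specific here is an assumption for illustration rather than something taken from [10] or from this paper: the function name `contrast_similarity`, the use of set sizes as the "functions", the weights `theta`, `alpha`, and `beta`, and the toy feature sets.

```python
def contrast_similarity(features_a, features_b, theta=1.0, alpha=0.5, beta=0.5):
    """Weighted sum over three set operations on two feature sets.

    Shared features add to similarity; features unique to either
    concept subtract from it. Set size is used as the measure of each
    set, which is an assumption made for this sketch.
    """
    common = len(features_a & features_b)   # A intersect B
    only_a = len(features_a - features_b)   # A minus B
    only_b = len(features_b - features_a)   # B minus A
    return theta * common - alpha * only_a - beta * only_b


# Hypothetical feature sets, invented for illustration.
ROBIN = {"flies", "has_feathers", "sings", "small"}
BIRD = {"flies", "has_feathers", "lays_eggs"}

symmetric = contrast_similarity(ROBIN, BIRD)
robin_to_bird = contrast_similarity(ROBIN, BIRD, alpha=0.8, beta=0.2)
bird_to_robin = contrast_similarity(BIRD, ROBIN, alpha=0.8, beta=0.2)
```

With equal weights on the two difference terms the measure is symmetric in its arguments; with unequal weights it is not, which is the kind of asymmetry that a path-length metric rules out by construction.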
Distance is based on a simplified version of spreading activation. One of the assumptions of the theory of spreading activation is that the semantic network is organized along the lines of semantic similarity. The more properties two concepts share in common, the more links there are between the concepts and the more closely related they are. In these terms, semantic relatedness is based on an aggregate of the interconnections between the concepts. This is different from semantic distance, which is equal to the minimal path length between two concepts. Links may be assigned criteriality tags to indicate the importance (strength) of the link between the connected nodes [11]. Links between a concept and its defining features (versus characteristic features [15]) can be expected to have higher “criterialities.” Because “is-a” relations [16] are based on similarity between defining features, we hypothesize that when only is-a relations are used in semantic nets, semantic relatedness and semantic distance are equivalent (we could use the latter as a measure of the former).

Distance is principally designed to work with hierarchical knowledge bases. Hierarchies, both of abstractions and of concrete entities, are commonplace in the world and important in intelligent behavior [17]. They can be useful in controlling search [18] and in learning about the world [19]. The types of hierarchies most explored in this work are the type embedded in thesauri as used in the information retrieval field. These thesauri have long histories of being maintained for the indexing and retrieving of documents, and since they are now typically parts of computer information systems, they lend themselves to experiments in which the computer uses the thesaurus to help searchers [20], [21].

In the next section, we will present the mathematical properties of Distance. After that we will describe 1) what the mathematical properties mean in terms of knowledge engineering, and 2) several experiments using Distance on a semantic net with hierarchical relations. The major points will be that

1) Distance is one tool to use in comparing one semantic net against another,
2) Distance is a metric on sets of nodes of a graph, and
3) applications of Distance to nonhierarchical semantic nets reveal the need for amendments to Distance.

II. METHODOLOGY

In this section, we discuss the methodology followed in the design of Distance. The design of Distance was guided by two observations:

1) the behavior of conceptual distance resembles that of a metric, and
2) the conceptual distance between two nodes is often proportional to the number of edges separating the two nodes in the hierarchy.

In this section, we first discuss the extent to which conceptual distance satisfies the properties of a metric. A function $f(x, y)$ is a metric if the following properties are satisfied:

1) $f(x, x) = 0$, zero property,
2) $f(x, y) = f(y, x)$, symmetric property,
3) $f(x, y) \geq 0$, positive property, and
4) $f(x, y) + f(y, z) \geq f(x, z)$, triangular inequality.

Second, we explore the extent to which the shortest path length between two nodes can be used as a measure of conceptual distance, using spreading activation as a model. In particular, we show that under some conditions, the shortest path length between two nodes indicates the conceptual distance between the nodes; for two nodes, Distance is simply the shortest path length between them. Finally, we extend the definition of Distance to handle concepts represented by sets of nodes, rather than single nodes. This extension is significant in the context of information retrieval systems where a document or a query is represented by more than one concept from a semantic net.

A. Conceptual Distance is a Metric

Much human information processing involves concept matching. Category membership and similarity are two important aspects of concept matching. The more similar two concepts are, the smaller the conceptual distance between them. Conceptual distance is a decreasing function of similarity. When concepts are represented by points in a multidimensional space, conceptual distance can conveniently be measured by the geometric distance between the points representing the concepts at hand and, as such, satisfies the properties of a metric.

Some information retrieval systems use vector descriptions for documents and queries [22]. Each dimension corresponds to an elementary concept known to the system. The coordinate of a vector along a dimension attests to the relative importance of the corresponding elementary concept for the document (or query) at hand. In such systems, conceptual distance is measured by the geometric distance between the corresponding concepts [23]. In such systems, a query retrieves the documents whose vectors are “closest” to the vector representing the query. Such a retrieval strategy is prohibitively costly for large document collections. Accordingly, researchers have explored the extent to which the search for close documents can be reduced by organizing documents into a hierarchy of classes, where the elements of each class are within a prescribed conceptual distance from each other [22]. An incoming query is compared to “exemplar” documents, one from each top-level class. If the conceptual distance is higher than a prescribed value (depending on the user’s input and the “radius” of the class), it is concluded that no document from that class is close enough to the query, and the whole class is disregarded. For this reasoning to hold, it is essential that conceptual distance satisfy the properties of a metric, especially the triangle inequality. In their work on memory-based reasoning, Stanfill and Waltz have
advocated a similar strategy and stressed the importance of metric properties [24].

Some cognitive scientists have challenged the applicability of metric properties to conceptual similarity measures. On the symmetry property, Tversky noted that there are instances where similarity appears to be asymmetric [10]. We believe that the asymmetry in those instances does not derive from the asymmetry of similarity (as a feature comparison process) but from the existence of another asymmetric relationship between two concepts. An instance of such a relationship is the instance-class relationship. For example, if people say that robins are more similar to birds than birds are similar to robins, we suspect that the people are making a fuzzy category-membership decision, rather than a similarity assessment. In a related argument, Ortony noted an extreme asymmetry in metaphorical statements [25]. Most students may consider the statement, “Lectures are like sleeping pills,” to be more true than the statement, “Sleeping pills are like lectures.” The asymmetry is due to the different roles played by the concepts in a metaphor. We build metaphors such as “A is like B,” if the characteristic features of A match the defining features of B [15]. Therefore, although “is like” may sound like “is similar to,” it is implicit in metaphors that the feature comparison process is selective. The authors believe that if similarity is limited to an (unconstrained) feature comparison process, it is symmetric.

Tversky [10] argued that the triangle inequality translates to what we will refer to as the “reverse triangle inequality.” Given three concepts A, B, and C, the reverse triangle inequality says that the similarity of A to C is greater than the sum of the similarity of A to B and the similarity of B to C. Tversky showed that the reverse triangle inequality can be violated. We have two objections to Tversky’s argument. First, it is not always the case that the triangle inequality for conceptual distance translates into the reverse triangle inequality for similarity. If the similarity between two concepts x and y is given by $S(x, y) = [1 + D(x, y)]^{-1}$, where $D(x, y)$ is a metric of conceptual distance between x and y,¹ then $S(x, y)$ does not satisfy the reverse triangle inequality.² Second, one of the examples used by Tversky to illustrate the violation of the reverse triangle inequality also illustrates an inconsistent use of similarity:

... although Jamaica is very similar to Cuba (due to its geographical characteristics) and Cuba is very similar to Russia (politically), Jamaica is not at all similar to Russia. ... [15].

The problem with this example is that the similarity between Jamaica and Cuba is based on the geographical characteristics and ignores the political differences, and conversely, for the Cuba-Russia similarity. If we account for both defining properties, Jamaica will not be very similar to Cuba, nor will Cuba be very similar to Russia.

In summary, treating conceptual distance as a metric is consistent with the practical view of concepts as points in a multidimensional space. Exceptions to the symmetry and triangle inequality seem to result from a broad, if not inconsistent, use of similarity. Our work shows the viability of treating conceptual distance as a metric.

¹Prior to our work on Distance, we developed a similarity (Relevance) measure based on the same formula, where $D(x, y)$ was the shortest path length between x and y in an is-a hierarchy. Relevance simulated well people’s assessments of similarity, but its results were not as easily interpretable as Distance’s.
²Let a, b, and c be three concepts such that $D(a, b) = D(b, c) = D(a, c) = d$. While $D(a, c)$ is less than $D(a, b) + D(b, c)$, i.e., the triangle inequality is true, it is not true that $S(a, c)$ $(= [1 + d]^{-1})$ is bigger than $S(a, b) + S(b, c)$ $(= 2 \times [1 + d]^{-1})$, i.e., the reverse triangle inequality is not true.

B. Shortest Path Lengths in is-a Hierarchies

In this section, we discuss the extent to which shortest path lengths in is-a hierarchies can be used to measure conceptual distance. In particular, we show that, in the context of Quillian’s model of semantic memory [26], shortest path lengths are not sufficient to measure the conceptual distance between concepts. However, when the paths are restricted to is-a links, the shortest path length does measure conceptual distance. We also discuss how well the metric properties are supported by spreading activation [11].

In Quillian’s model of semantic memory, concepts are represented by nodes and relationships by links. Links are labeled by the name of the relationship and are assigned “criteriality tags” that attest to the importance of the link. In computer implementations, criteriality tags are numerical values that represent the degree of association of the two concepts (such as how often that link is traversed) and the nature of the association. The association is positive if the existence of that link indicates some sort of similarity between the end nodes and negative otherwise. For example, superordinate links (the term used for is-a) have a positive association, while “is-not-a” links have a negative association.

Roughly speaking, spreading activation [11] prescribes that to compare two concepts, the paths that separate the two nodes and that satisfy the constraints defined by the semantics of the relations and the context are considered for evaluation. These paths are traced by propagating two “activation tags” from the nodes corresponding to the concepts, one tag originating from each node. When two activation tags “meet” at one node, the paths from the originating nodes to that node are concatenated to form a path between the originating nodes. For each path, positive criteriality tags contribute to “positive evidence” (for similarity), and negative criteriality tags contribute to “negative evidence.” When positive evidence exceeds some predetermined threshold, the comparison is concluded successfully. Conversely, if negative evidence falls below some negative threshold, it is concluded that the concepts are not similar.

In Quillian’s model superordinate (is-a) links are assigned high criteriality tags. If spreading activation is used across is-a links only, short paths will significantly contribute to positive evidence of similarity, and the correspondence between semantic distance (shortest path length) and semantic relatedness (conceptual distance) will be strong. We hypothesize that such correspondence is strong enough for the length of is-a paths to be used as a measure of semantic relatedness.

It is not necessary that a link and its inverse have the same criteriality tag. Consider the is-a relation between “robin” and “bird.” Such a relation may be more important for the concept robin than it is for bird. This asymmetry is considered as a fundamental property of semantic constructs in KL-ONE [27]. However, for the purposes of spreading activation, it does not really matter whether a path is made of is-a links or inverse is-a links, or a mix of both; at the end, its contribution to positive evidence will be the same.

Based on the above observations, we define the conceptual distance between two concepts represented by nodes in an is-a semantic net as follows.

Definition 1: Let A and B be two concepts represented by the nodes a and b, respectively, in an is-a semantic net. A measure of the conceptual distance between A and B is given by

Distance(A, B) = minimum number of edges separating a and b.

Henceforth, in expressions of Distance we will use concepts and the nodes representing those concepts interchangeably. Clearly, Distance satisfies 1) the zero property, 2) the positive property, and 3) the symmetry property. The triangular inequality is based on the fact that by concatenating a shortest path between A and B to a shortest path between B and C, we get a path between A and C whose length is greater than or equal to the minimum path length between A and C. Thus we have the following result.

Theorem 1: Distance is a metric.

We later show that Distance, as defined above, simulates people’s assessments of conceptual distance well.

C. Distance Between Sets of Nodes

Traditionally, documents and queries are represented by a combination of concepts from a predetermined set of concepts, called the indexing vocabulary. When the indexing vocabulary is a hierarchical semantic net, documents and queries are represented by sets of nodes from the hierarchy. To extend Distance to handle sets of nodes, we relate concepts as sets of nodes to semantic constructs in Quillian’s semantic net model [26], and then we use spreading activation to guide the design of the Distance algorithm. First, we study a special kind of concept in Quillian’s model, call it a compound concept, and study how spreading activation would operate on such concepts. Then we map documents and queries to compound concepts and translate spreading activation properties into desirable properties for Distance. Finally, we give a complete definition of Distance with a number of important mathematical properties.

1) Compound Concepts in Quillian’s Model: In Quillian’s model, concepts can be combined to define more complex concepts. In the sequel, concepts that are expressed by an English word will be called elementary concepts. A concept such as “the old red house” is represented as a combination of the elementary concepts “old,” “red,” and “house.” Roughly speaking, Quillian’s model prescribes that the old red house be represented by a node that has conjunctive links to instances of the three concepts “old,” “red,” and “house,”³ to say that “the old red house” is at the same time a house and a red object and an old object. Conjunctive links can be thought of as links labeled “and” from the node “the old red house” to each one of the three nodes “house,” “old,” and “red.”

Similarly, concepts that have several meanings, such as “plant,” are represented by a node that has disjunctive links to nodes, each of which is an instance of one of the alternate meanings. To take an example used by Quillian [26], the word “plant” may mean 1) a physical plant, that is a building used for manufacturing processes, 2) plant as a living organism, and 3) the verb “to plant.” We will refer to these three meanings by plant₁, plant₂, and plant₃, respectively. Accordingly, the concept plant is represented by a node labeled “plant” that has three links labeled “or” to plant₁, plant₂, and plant₃. This says that plant is plant₁ or plant₂ or plant₃.

Now assume that the concept “plant” is compared to the concept “flower.” The two concepts are definitely similar, because in one interpretation (plant₂), “flower” is-a “plant.” In general, a disjunctive compound concept matches all concepts that match one of its alternate interpretations. In terms of conceptual distance, the conceptual distance between a disjunctive concept and another concept is the minimum conceptual distance between the disjunctive concept’s alternatives and that other concept. We shall later refer to this property as the disjunctive minimum.

For the case of a conjunctive concept, it is important that all the elementary concepts be considered. For example, “the old red car” is not close to “the old red house,” although both concepts share the elementary concepts old and red. Conversely, “the old pink mansion” is conceptually close to the “old red house” because “old” equals “old,” “pink” is close to “red,” and “mansion” is close to “house.”

³This description is not exact, as it does not handle all the subtleties involved. However, it will suffice for our purposes, and it does not violate any of the basic assumptions of the model.

2) Documents and Queries: In information retrieval systems, documents (articles, books, records, etc.) are often characterized by a set of index terms chosen from a hierarchical semantic net. When maximum specificity is sought in the indexing procedure, the index terms representing a document often represent significantly distinct concepts. In this case, removing an index term would
adversely affect the precision of indexing. The concept reflected by a document is best described by ANDing the concepts represented by its index terms. As such, documents are similar to Quillian’s conjunctive concepts.

For queries, the way index terms are combined is explicit. For instance, in many operational information retrieval systems natural language queries are coded into Boolean queries by a trained librarian. A Boolean query is a parenthesized logical expression composed of index terms (atoms) and the logical operators $\vee$, $\wedge$, and $\neg$. A query consisting of a single term retrieves the documents that have that term in their index. When many terms are used, the operators stand for the corresponding set operations between the sets of documents that would be retrieved by the corresponding single term queries [22].

Using the Quine-McCluskey algorithm [28], a query (or any Boolean expression for that matter) can be converted into minimal disjunctive normal form. As such, a query can be seen as a disjunction of conjunctive compound concepts, except for the fact that conjunctions may contain NOTed terms. Negated terms are difficult to interpret in the context of the semantic net representation. If X is a node in a semantic net, what is $\neg X$? One way to address negations of concepts in semantic nets is through exceptions. Using Touretzky’s [29] formalism, if X is-not-a Y and Z is-a X, then Z is-not-a Y; thus $\neg Y$ includes anything that is under X. We regard $\neg Y$ as the set of nodes that are farthest in the semantic net from Y. Then, the conceptual distance between X and $\neg Y$ is the conceptual distance between X and that set. Admittedly, there may be contexts in which the interpretation of negation is inappropriate.

3) Distance on Sets of Nodes: In this section, we use the behavior of spreading activation on compound concepts to guide the extension of Distance to handle sets of nodes. The disjunctive minimum rule translates into the identity:

$$\mathrm{Distance}(C_1 \vee \cdots \vee C_k,\; C) = \min_{i = 1, \ldots, k} \mathrm{Distance}(C_i, C) \qquad (1)$$

[...] concept Y, Distance need only be applied to pairs of corresponding elementary concepts. (In fact, we later see an experiment where such an approach provided a good measure of conceptual distance.) However, our experience with documents and queries shows that such a mapping is not readily obtainable. Accordingly, we define Distance between conjunctive concepts as

$$\mathrm{Distance}(X_1 \wedge \cdots \wedge X_k,\; Y_1 \wedge \cdots \wedge Y_m) = \frac{1}{km} \sum_{i=1}^{k} \sum_{j=1}^{m} \mathrm{Distance}(X_i, Y_j) \qquad (2)$$

where the $X_i$ and $Y_j$ are elementary concepts. Notice that we choose to divide the double sum by the product $km$. This normalization has been used to reduce the bias of number of elementary concepts; without it, concepts with more elementary concepts tend to be further apart. It is also consistent with some of the processing assumptions of spreading activation [11]. Roughly speaking, when a node A is activated and B is adjacent to A, B is subsequently activated by an “amount” inversely related to the number of nodes adjacent to A, and proportional to the strength of the link between A and B.⁴ Thus the more elementary concepts a compound concept has, the less (relatively) a path through an elementary concept will account for similarity.

Finally, for some of our applications we need to define Distance between a concept and the “null” concept, an empty set of conjunctive concepts. In our efforts to evaluate semantic nets we have also developed an algorithm called Indexer for automatic indexing of document titles into terms of a semantic net [31]. The performance of Indexer is compared to that of expert human indexers by checking the distance between the human-produced and the computer-produced sets of index terms. Our automatic indexer would at times fail to produce any terms to index a document. However, we had not defined Distance over the empty set initially. Attempts to analyze the experimental data in which these documents [...]
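Definition 1 and identities (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is taken to be undirected and unweighted (is-a links and their inverses count equally, as discussed for spreading activation), and the function names and the toy hierarchy are hypothetical, not drawn from the paper's Mesh data.

```python
from collections import deque
from itertools import product


def build_graph(is_a_edges):
    """Build an undirected adjacency map from (child, parent) is-a pairs."""
    graph = {}
    for child, parent in is_a_edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def distance(graph, a, b):
    """Definition 1: minimum number of edges separating a and b (BFS)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, depth = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == b:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")


def disjunctive_distance(graph, alternatives, c):
    """Identity (1): minimum over the alternatives of a disjunctive concept."""
    return min(distance(graph, alt, c) for alt in alternatives)


def conjunctive_distance(graph, xs, ys):
    """Identity (2): double sum of pairwise distances divided by k * m."""
    total = sum(distance(graph, x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


# Hypothetical toy hierarchy, invented for illustration.
EDGES = [
    ("mansion", "house"), ("house", "building"), ("factory", "building"),
    ("flower", "organism"), ("tree", "organism"),
    ("building", "thing"), ("organism", "thing"),
]
GRAPH = build_graph(EDGES)
```

On this toy hierarchy, `disjunctive_distance(GRAPH, ["factory", "organism"], "flower")` evaluates to 1, mirroring the plant/flower example, and `conjunctive_distance(GRAPH, ["mansion", "flower"], ["house", "tree"])` evaluates to 3.0. Breadth-first search is used because all edges count equally; a weighted variant would need a different shortest-path routine.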
Notice that when $V_1 = \{u_1\}$ and $V_2 = \{u_2\}$ ($V_1$ and $V_2$ are singletons), $\mathrm{Distance}(V_1, V_2)$ reduces to $\mathrm{Distance}(u_1, u_2)$, as in Definition 1. Using (1) above, Distance readily generalizes to arbitrary compound concepts (any combinations of ANDs and ORs), provided that these concepts are expressed in disjunctive minimal form.

Notice that (2) does not yield zero for identical compound concepts, and the zero property has to be imposed in Definition 2. This is another problem related to the fact that Distance is computed indiscriminately between all pairs of concepts. What is needed is a reference value of Distance that attests to the minimum conceptual distance between concepts, that is the conceptual distance between identical concepts. We choose the value zero because it is the smallest value attainable by Distance, and because it happens to correspond to a mathematical property of metrics (zero property). However, we later see that a zero value (or any other fixed value for that matter) leads to some undesirable behavior of Distance (see Section IV-C).

The computation of $\mathrm{Distance}(U, \emptyset)$ according to the above definition would be prohibitively costly. Were we to generate all the subsets of $V$ ($2^{|V|}$ of them), the time to compute $\mathrm{Distance}(U, \emptyset)$ would be exponential. Theorem 2 allows us to compute $\mathrm{Distance}(U, \emptyset)$ in less than the $O(n^3)$ required for the common “all-pairs shortest path” [...]

[...]
1) the information retrieval context in our experiments,
2) the reliability of human observers,
3) the results of applying Distance to pairs of nodes, and
4) the results of using Distance in document retrieval experiments.

A. Experimental Data

The National Library of Medicine maintains one of the world’s largest bibliographic retrieval systems, called Medline [33]. Medline contains bibliographic information for over five million articles from over 3000 biomedical periodicals. In addition to the usual bibliographic information (such as author, title, journal, and date of publication), each article is also represented by a set of terms from a semantic net called Mesh (see Fig. 1). Over 2000 queries are addressed to Medline each day from sites around the world. These queries are often encoded as Boolean expressions over Mesh terms.

Mesh is a hierarchical semantic net of over 15000 terms [34]. The 15000 terms are placed into a nine-level hierarchy that includes high-level nodes such as “anatomy,” “organism,” and “disease” (see Fig. 2). The hierarchy is based on “broader-than” relationships, where the broader terms are higher in the tree. The broader-than relationship is very similar to the is-a relationship [35], but also in- [...]
from term,), it suggested that were there to be a difference, it would probably be a small one.

The 1986 edition of Mesh had inadequate coverage of information science related topics. Accordingly, we initiated a study of how to make the information science part of Mesh better [38]. One resource was the Association for Computing Machinery’s hierarchical semantic net for computer science, called the computing reviews classification structure (CRCS) [39]. The information science section of Mesh had about 200 terms, while CRCS had about 1000 terms in a four-level hierarchy.

Our merging algorithm first determined the similarities between Mesh and CRCS and then exploited the differences. In the simple case where a term $t_1$ existed in CRCS and not in Mesh but $t_1$ had a parent $t_2$ in CRCS which equaled a term $t_3$ in Mesh, we added $t_1$ to Mesh as a child of $t_3$. This algorithm had several other capabilities so that terms in CRCS could also accurately become parents of terms in Mesh. To test whether the merger had created a better semantic net, a sequence of experiments was performed in which human evaluations of distances between terms were compared to those of Distance on Mesh versus Distance on Mesh + CRCS.

Twelve pairs of terms that were both in Mesh and Mesh + CRCS were given to ten computer science students at George Washington University. The students were asked to assign a number between one and five to each pair of terms to indicate what they thought was the conceptual distance between the components of the pair. The 12 pairs of terms were then ranked in increasing order of their distance. Similarly, shortest path lengths in Mesh and Mesh + CRCS were computed for each pair of terms, and ranks were computed from these distances for the two semantic nets.

The average Spearman’s correlation coefficient of ten students shows that their rankings significantly agree at the 0.01 level of confidence ($\alpha = 0.01$). Now comparing the average of the students’ rankings against Mesh and against Mesh + CRCS, we get

$\rho_{\text{avg. Stud, Mesh}} = \ldots$

$\rho_{\text{avg. Stud, (Mesh+CRCS)}} = 0.52$.

As a descriptive statistic we can accept that these two correlation coefficients are significantly different. The augmentations provide a better correlation between people and the semantic net. We have done a similar experiment with four physicians who ranked the terms; there again the results clearly show the augmented semantic net as more accurately representing the cognitive distances that people hold true.

In our own subjective evaluations the merged Mesh + CRCS was better than Mesh alone. Distance has helped us systematically document the difference in functionality between Mesh + CRCS and Mesh alone. In general, we have found that Distance is a useful tool for the evaluation of term-term distances in hierarchical semantic nets, and that with a good hierarchical semantic net the rankings determined by Distance roughly correspond to those which people perceive. We have done a host of other experiments with term-term distances in hierarchical semantic nets and with variants on Distance and always concluded the same thing, namely, that this approach to validating semantic net merging strategies has merit.

D. Distance Applied to Documents and Queries

Given a document D characterized by a set of index terms $D = \{t_{D,1}, t_{D,2}, \ldots, t_{D,n}\}$ and a query Q coded into an ANDed set of index terms $Q = \{t_{Q,1}, t_{Q,2}, \ldots, t_{Q,m}\}$, we hypothesized that the distance between D and Q gives a measure of the conceptual distance of the document to the query. In the experiments reported below, we computed Distance between a query and a number of documents. We then ranked the documents with Distance, assuming that the greater the distance between the query and a document, the less relevant the document was to the query. Similarly, we asked people to rank a set of documents with respect to a given query and then compared their ranks to those produced by Distance.

For ten different queries and six articles the averages of two physicians’ evaluations were compared to those produced by Distance on Mesh. The agreement between the computer and the people was significant at the 0.05 level. To show that this ranking by the computer depended on more than the exact matches among terms of the query and document, the experiments were repeated but now with path lengths constrained. If only exact matches between terms in the query and document descriptions were used, then there was a negative correlation between the people’s and computer’s rankings, proving that Distance was sensitive to the structure of Mesh.

Two scientists compared each of 52 documents against the query “lipids and encephalitogenic basic proteins.” The 52 documents were retrieved from Medline by a search with the term lipids in it. Each document was represented by all the Mesh terms stored in Medline for that document (typically, ten terms per document). The ranking of each scientist and the ranking of Distance were statistically significantly correlated. The correlation between the rankings of the two scientists was also significant at the 0.05 level. The same methodology was applied to the queries “suicide and substance dependence,” “liver diseases and peritoneoscopy,” “shock and endorphins,” and “biocompatible materials and dental implantation,” and the same results were attained. That is, the human judges agreed with each other and with Distance in the ranking of documents to query. These and other experiments support the claim that Distance on Mesh sets a baseline for performance that is not disconnected from the decisions of people regarding the conceptual similarity between sets of terms.
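The ranking procedure described above can be illustrated with a short Python sketch. It computes Distance as the average shortest-path length over all pairs of nodes drawn from two sets, ranks a few documents against a query, and compares that ranking with a human ranking by Spearman's coefficient. The toy hierarchy, the document names, and the human ranking are hypothetical and are not taken from Mesh or from the experiments reported here; the function names are likewise assumptions of the sketch. Links are treated as undirected, the graph is assumed connected, both sets are assumed nonempty, and the Spearman formula used assumes no tied ranks.

```python
from collections import deque

# Hypothetical toy hierarchy as (child, parent) pairs; not an excerpt of Mesh.
EDGES = [
    ("disease", "health"), ("procedure", "health"),
    ("arthritis", "disease"), ("rheumatoid arthritis", "arthritis"),
    ("infection", "disease"), ("viral infection", "infection"),
    ("prosthesis", "procedure"), ("joint prosthesis", "prosthesis"),
    ("knee prosthesis", "joint prosthesis"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (child, parent) pairs."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance(graph, set_a, set_b):
    """Average path length over all pairs; zero by definition for equal sets."""
    set_a, set_b = set(set_a), set(set_b)
    if set_a == set_b:
        return 0.0
    total = 0
    for a in set_a:
        lengths = shortest_path_lengths(graph, a)
        total += sum(lengths[b] for b in set_b)
    return total / (len(set_a) * len(set_b))


def spearman(ranks_x, ranks_y):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(ranks_x)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


GRAPH = build_graph(EDGES)
QUERY = {"knee prosthesis", "rheumatoid arthritis"}
DOCS = {
    "doc_a": {"joint prosthesis", "rheumatoid arthritis"},
    "doc_b": {"infection"},
    "doc_c": {"viral infection"},
    "doc_d": {"joint prosthesis", "infection"},
}

scores = {name: distance(GRAPH, QUERY, terms) for name, terms in DOCS.items()}
machine_order = sorted(scores, key=scores.get)       # smallest Distance first
human_order = ["doc_a", "doc_b", "doc_d", "doc_c"]   # hypothetical judgment

machine_rank = [machine_order.index(name) + 1 for name in DOCS]
human_rank = [human_order.index(name) + 1 for name in DOCS]
rho = spearman(machine_rank, human_rank)
print(machine_order, rho)
```

Running one breadth-first search per query term keeps the sketch short; for a large net the per-node path lengths would normally be cached rather than recomputed for every document.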
[Figure residue: graph fragment with nodes Procedure, Disease, Prosthesis, collagen, Rheumatoid Arthritis, Juvenile Rheumatoid]

NA: Keratoconus
AT: Cornea, conical.
ET: Heredity; associated with Down syndrome, atopic dermatitis, Marfan syndrome, retinitis pigmentosa, aniridia, vernal catarrh, Alpen syndrome, Ehlers-Danlos syndrome.
SX: Blurred vision uncorrected by glasses.
SG: More frequent in females; onset at puberty; myopia; astigmatism; possibly more advanced in one eye, eventually bilateral.
LB: Ophthalmoscopy: progressive bulging of cornea; apex of cone usually slightly below center of cornea; corneal protrusion recognized by viewing eye from side; sometimes pulsation of corneal conus synchronous with arterial pulse; increased intraocular tension; clefts in Descemet membrane. Retinoscopy: distortion of light reflex; distortion of appearance of nerve head, vessels of fundus. Keratoscopy: distortion of corneal light reflex.
CR: Prognosis: astigmatism progressing for years, then becoming stationary; possible corneal perforation.
PA: Opacity at apex of cone; line of gray, yellow, or olive-green pigment forming incomplete ring

Fig. 4. Example of CMIT disease description. Disease is keratoconus. Fields in CMIT mean: NA is name, AT is alternate terms, ET is etiology, SX is symptoms, SG is signs, LB is laboratory findings, CR is course, PA is pathology.

document made clear that they were “about” such relationships, then this enhanced Distance again correlated well with the ranking decisions of people. Each nonhierarchical relation, such as “cause” or “treat,” had to be handled in a distinct way [40]. The ramifications of such enhancements to metric aspects of Distance have only been partially explored.

Ten eye diseases that had the same names in CMIT and Mesh were first used. Distance was applied to all pairs of the disease names and then to all pairs of descriptions of
Examination of the Mesh + CMIT descriptions and the path lengths between terms suggests that too many distances were being calculated that were not meaningful. Each attribute of a disease should be treated separately. For instance, the etiology feature for a disease is meaningfully compared to the etiology feature of other diseases but not to the laboratory findings feature. Taking advantage of the breakdowns in CMIT requires treating the different attributes of a disease differently. The distance between the etiology and laboratory findings features is less important than the distance between two etiologies or the distance between two laboratory findings. This is similar to our earlier example about “the old red house” and “the old pink mansion,” where Distance should be applied selectively, i.e., as a feature comparison process, rather than applying it to all pairs of elementary concepts. Unlike the case for documents where a mapping of index terms along semantically distinct dimensions does not exist (and for which Distance performed well), a mapping not only exists in this case but may well be essential to the success of Distance, i.e., a breakdown or decomposition of the Mesh + CMIT descriptions into their natural parts, like etiologies and laboratory findings, might allow for distances that are more closely related to the distances among Mesh disease names.

The etiology components of the diseases were next isolated. The hypothesis was that on the etiologies the diseases would have distances from one another that more closely corresponded to those distances that existed between the names alone. The same steps as used for assessing the correlation between Mesh disease names and Mesh + CMIT descriptions were now used to assess the correlation between the rankings on etiologies and the rankings on disease names. The degree of correlation was significant. Incidentally, these experiments proved that some of the Mesh diseases are hierarchically organized according to etiologies.

Fig. 5. Sample graph $G_5$ used to illustrate discontinuity in Distance.

C. Distance Near Zero

Distance fails in some ways to capture the intuitive notions of what it means for one concept to be close to another. Consider the query (knee prosthesis AND rheumatoid arthritis). According to Distance on Mesh a document indexed (joint prosthesis AND rheumatoid arthritis) is as close to the query as a document indexed under disease⁶ (see Fig. 3). Distance would also say that a document indexed under disease is closer to the query than a document indexed (juvenile rheumatoid arthritis AND knee prosthesis)!

Consider the case of graph $G_5$ with a root and two linear branches of five nodes each (see Fig. 5). Distance$(\{a_5, a_{-5}, a_0\}, \{a_i, a_{-i}, a_0\})$ gets larger as i goes from one to four, but when i = 5, there is a drastic drop in the value of Distance to zero. $G_5$ can be generalized to $G_n$ in which $a_k$ and $a_{-k}$ are k edges from $a_0$, and the leaves are $a_n$ and $a_{-n}$. The pathology true for $G_5$ is also true for $G_n$. The Distance between $\{a_n, a_{-n}, a_0\}$ and $\{a_i, a_{-i}, a_0\}$ steadily grows as i grows but drops to zero when i reaches n.

The problem with the above cases is not so much the fact that there is a discontinuity near zero; after all, we imposed the zero property by definition. More damaging is that Distance values may increase as the conceptual distance seems to decrease. The basic problem seems to be that Distance between sets of nodes treats equally all pairs of nodes. This becomes an issue for extreme cases such as the ones presented above. This situation is not unrelated to the

⁶ Let the reader be assured that Medline indexing manuals do not allow a document to be simply indexed under “Disease.”
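The discontinuity in $G_n$ can be checked numerically. The sketch below is a minimal illustration, assuming the two sets are $\{a_n, a_{-n}, a_0\}$ and $\{a_i, a_{-i}, a_0\}$ and representing node $a_k$ by the integer k. Because the two branches joined at the root form a single path, the path length between $a_i$ and $a_j$ is then $|i - j|$. The function names are hypothetical.

```python
def path_length(i, j):
    """Edges between a_i and a_j on the path a_{-n}, ..., a_0, ..., a_n."""
    return abs(i - j)


def distance(set_a, set_b):
    """Average path length over all pairs; zero by definition for equal sets."""
    set_a, set_b = set(set_a), set(set_b)
    if set_a == set_b:
        return 0.0
    total = sum(path_length(i, j) for i in set_a for j in set_b)
    return total / (len(set_a) * len(set_b))


def distance_profile(n):
    """Distance between {a_n, a_-n, a_0} and {a_i, a_-i, a_0} for i = 1..n."""
    fixed = {n, -n, 0}
    return [distance(fixed, {i, -i, 0}) for i in range(1, n + 1)]


profile = distance_profile(5)
print(profile)
```

Under these assumptions the nine pairwise path lengths sum to $6n + 2i$, so Distance equals $(6n + 2i)/9$ for $i < n$ and rises with i, while at $i = n$ the two sets coincide and the value is zero by definition.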
problem we had originally had with the featural model. We have considered and found weaknesses in several metric alternatives to Distance, such as the distance between centroids for each set of nodes, and several nonmetric alternatives, such as the path length between the closest nodes in two sets.

V. CONCLUSION

Our research group has been for the past two years developing methods of merging semantic nets, such as Mesh and CMIT [1]. As is typical for such machine learning experiments, issues of representation and reasoning are as critical as those of learning. Our hypothesis is that better semantic nets result from the mergers, but to evaluate “betterness” we need a way of reasoning with a semantic net. This leads to the development of a measure of conceptual distance between sets of nodes in a semantic net. By transforming documents and queries into sets of nodes, we can do information retrieval experiments that test the value of our semantic net.

In our search for an evaluation tool, we realize that certain cognitively meaningful and mathematically convenient properties are desirable. Although the relationships in our semantic nets are directed, e.g., broader-than and narrower-than, our early experiments suggest that for the purposes of conceptual distance, these relationships can be treated as undirected. Accordingly, the measure of distance over sets of nodes has the property of symmetry. Furthermore, we are interested in knowing under what conditions one document is far from or close to another document. For this purpose a property like the triangle inequality is useful. The work in memory-based reasoning [24] is one example where metric properties are important. Our measure of conceptual distance, called Distance, satisfies the properties of a metric. On the surface it is surprisingly simple: just the average of the path lengths between pairs of nodes. However, it has proven remarkably powerful and flexible.

In our efforts to evaluate semantic nets, we have also developed an algorithm called Indexer for automatic indexing of document titles into Mesh [31]. In one set of experiments we added thousands of synonyms to the main terms of Mesh and tested the main-terms-plus-synonyms semantic net with Indexer. We compared the performance of Indexer against human indexers by counting the number of hits and misses. To our surprise the synonyms did not increase the hits any more than they increased the misses. Then we refined the measure of performance by applying Distance. When Distance measured the distance between the human and machine results, the synonyms proved to be helpful. In other words, the synonyms usually led Indexer to be closer to the human indexing. The measure of absolute hits and misses was unrealistically demanding of Indexer. In artificial intelligence experiments it might be expected that the machine gives answers close, but not identical, to those that humans would, and a way to measure this closeness is important.

In looking at the distance between sets of nodes, we have had to deal with different kinds of relationships. It seems that the broader-than and narrower-than relations can be treated in basically the same way, but other relationships, like cause, merit different handling. For queries and documents we argue that nonhierarchical relationships should only be traversed when both query and document specify that they are about that relationship. The integrity of the node-to-node path lengths in hierarchical semantic nets is the key to the success or failure of Distance as a measure of conceptual distance. In our evaluations, these path lengths have shown themselves to be cognitively meaningful. Some have argued that spreading activation does not spread across more than one link [42], [43], but they ignore link labels, while we have shown the importance of distinguishing hierarchical from nonhierarchical links.

Distance might be implemented in an information retrieval system which is based on the indexing of documents and queries into terms from a semantic net (see Fig. 1). Often in such systems a query retrieves more documents than the user wants, and the documents are arbitrarily ordered as they appear on the computer screen. A measure like Distance might be applied to help rank the documents to the query and allow the querist to pay most attention to those documents which are most likely to be conceptually close to the query.

There are a host of specific questions about the cognitive realism of Distance that we have not addressed. For instance, to the extent that the semantic net is a tangled hierarchy and has levels, should terms at different levels be treated differently [17]? Relative to Distance, “part-of” links seem to obey the same properties as is-a links, but how would a metric on causal links look? We do not claim that the brain is making Distance-like calculations in the course of determining cognitive similarity. Nor do we argue that the tangled hierarchy and Distance are adequate for other cognitive tasks [44]. Cognition probably does not rely on measurements that satisfy the properties of a metric. A metric has, however, many attractive features because of its mathematical and semantic tractability. We claim that the better the semantic net on which Distance operates, the more the conceptual similarity decisions of Distance match the conceptual similarity decisions of people. We have been surprised at how powerful a simple algorithm like Distance can be in evaluating hierarchical semantic nets.

ACKNOWLEDGMENT

Donald Bamber of the Navy Personnel Research and Development Center provided the important examples of the weaknesses of Distance as two sets approach one
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 1, JANUARY/FEBRUARY 1989
another in the graph. Referees for this TRANSACTIONS provided guidance on substantial revisions of this paper.

APPENDIX

Theorem 2: Let U be a nonempty subset of V, then

$$\operatorname{Distance}(U, \phi) = \max_{u \in V} \operatorname{Distance}(\{u\}, U).$$

$$\operatorname{Distance}(V_1 \cup V_2, V_3) = \operatorname{Distance}(V_1, V_3). \tag{4}$$

In general,

$$\operatorname{Distance}(V_1 \cup V_2, V_3) \ge \min_{i=1,2} \{\operatorname{Distance}(V_i, V_3)\} \tag{5a}$$

$$\operatorname{Distance}(V_1 \cup V_2, V_3) \le \max_{i=1,2} \{\operatorname{Distance}(V_i, V_3)\}. \tag{5b}$$

Proof of Lemma 1: Let $V_1 = \{u_1^1, \ldots\}$, $V_2 = \{u_1^2, \ldots\}$, and $V_3 = \{u_1^3, \ldots\}$,

$$\operatorname{Distance}(V_1 \cup V_2, V_3) \;\ldots$$

b) Let us prove that it is always the case. Assume that

$$\operatorname{Distance}(\{u\}, U) \ne \operatorname{Distance}(U_m - \{u\}, U),$$

then, according to Lemma 1,

$$\operatorname{Distance}(U_m, U) \;\ldots\; \min \{\operatorname{Distance}(\{u\}, U), \operatorname{Distance}(U_m - \{u\}, U)\}$$

Going back to the proof of Theorem 2, let $u_m$ be an element of V such that

$$\operatorname{Distance}(\{u_m\}, U) = \max_{u \in V} \operatorname{Distance}(\{u\}, U)$$

By definition of Distance$(U, \phi)$, we have

$$\operatorname{Distance}(U, \phi) \ge \operatorname{Distance}(\{u_m\}, U)$$

or, for some set $U_m$,

$$\operatorname{Distance}(U, U_m) \ge \operatorname{Distance}(\{u_m\}, U).$$

Let us assume that

$$\operatorname{Distance}(U, U_m) > \operatorname{Distance}(\{u_m\}, U).$$

According to Lemma 2, for all $u \in U_m$, we have

$$\operatorname{Distance}(\{u\}, U) = \operatorname{Distance}(U_m, U) = \operatorname{Distance}(U, \phi).$$
$$= \operatorname{Distance}(U, \phi).$$

$$\operatorname{Distance}(V_1, V_3) \le \operatorname{Distance}(V_1, V_2) + \operatorname{Distance}(V_2, V_3).$$

a) Let $V_1 = \{u_1^1, \ldots, u_k^1\}$, $V_2 = \{u_1^2, \ldots, u_q^2\}$, and $V_3 = \{u_1^3, \ldots, u_m^3\}$ be three nonempty subsets of V. We have

$$\operatorname{Distance}(V_1, V_2) = \frac{1}{kq} \sum_{i=1}^{k} \sum_{j=1}^{q} d(u_i^1, u_j^2) = \frac{1}{kqm} \sum_{i=1}^{k} \sum_{j=1}^{q} \sum_{p=1}^{m} d(u_i^1, u_j^2) \tag{6}$$

$$\operatorname{Distance}(V_2, V_3) = \frac{1}{qm} \sum_{j=1}^{q} \sum_{p=1}^{m} d(u_j^2, u_p^3) = \frac{1}{kqm} \sum_{i=1}^{k} \sum_{j=1}^{q} \sum_{p=1}^{m} d(u_j^2, u_p^3). \tag{7}$$

Adding (6) and (7) yields

$$\operatorname{Distance}(V_1, V_2) + \operatorname{Distance}(V_2, V_3) = \frac{1}{kqm} \sum_{i=1}^{k} \sum_{j=1}^{q} \sum_{p=1}^{m} \left[ d(u_i^1, u_j^2) + d(u_j^2, u_p^3) \right]. \tag{8}$$

Because d satisfies the triangle inequality, we have

$$d(u_i^1, u_j^2) + d(u_j^2, u_p^3) \ge d(u_i^1, u_p^3). \tag{9}$$

From (8) and (9) it follows that

$$\operatorname{Distance}(V_1, V_2) + \operatorname{Distance}(V_2, V_3) \ge \frac{1}{kqm} \sum_{i=1}^{k} \sum_{j=1}^{q} \sum_{p=1}^{m} d(u_i^1, u_p^3).$$

Because the summand does not depend on the index j, the summation over j is equivalent to multiplying the summand by q. Implementing this change and eliminating q yields:

$$\operatorname{Distance}(V_1, V_2) + \operatorname{Distance}(V_2, V_3) \ge \frac{1}{km} \sum_{i=1}^{k} \sum_{p=1}^{m} d(u_i^1, u_p^3)$$

$$\ge \operatorname{Distance}(V_1, V_3). \tag{12}$$

This establishes the triangle inequality in the case where the three sets are nonempty.

b) When $V_2$ is empty, then there exists at least one subset $V_m$ of V such that

$$\operatorname{Distance}(V_1, \phi) = \operatorname{Distance}(V_1, V_m).$$

In the previous steps we proved that

$$\operatorname{Distance}(V_1, V_3) \le \operatorname{Distance}(V_1, V_m) + \operatorname{Distance}(V_m, V_3).$$

From the definition of Distance$(\phi, V_3)$, we know that

$$\operatorname{Distance}(V_m, V_3) \le \operatorname{Distance}(\phi, V_3).$$

Therefore,

$$\operatorname{Distance}(V_1, V_3) \le \operatorname{Distance}(V_1, \phi) + \operatorname{Distance}(\phi, V_3).$$

c) When $V_3$ is empty, we write Distance$(V_1, \phi)$ as Distance$(V_1, V_m)$ and reduce this case to the previous one for which the triangle inequality was proven to hold.

REFERENCES

pp. 52-60, Jan./Feb. 1987.
[5] G. Loberg, G. M. Powell, A. Orefice, and J. D. Roberts, “Representing operational planning knowledge,” IEEE Trans. Syst. Man Cybern., vol. SMC-16, pp. 774-787, Nov./Dec. 1986.
[6] S. Miyamoto, K. Oi, O. Abe, A. Katsuya, and K. Nakayama, “Directed graph representations of association structures: A systematic approach,” IEEE Trans. Syst. Man Cybern., vol. SMC-16, no. 1, pp. 53-61, 1986.
[7] E. Reingold, J. Nievergelt, and N. Deo, Combinatorial Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1977.
[8] S. Y. Lu, “A tree-matching algorithm based on node splitting and merging,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 249-256, Mar. 1984.
[9] M. A. Eshera and K. S. Fu, “A graph distance measure for image analysis,” IEEE Trans. Syst. Man Cybern., vol. SMC-14, pp. 398-407, May/June 1984.
[10] A. Tversky, “Features of similarity,” Psych. Rev., vol. 84, pp. 327-352, 1977.
[11] A. M. Collins and E. F. Loftus, “A spreading activation theory of semantic processing,” Psych. Rev., vol. 82, pp. 407-428, 1975.
[12] R. Fikes and T. Kehler, “The role of frame-based representation in reasoning,” Commun. Assoc. Comput. Mach., vol. 28, no. 9, pp. 904-920, Sept. 1985.
[13] C. Hoede, “Similarity in knowledge graphs,” Dep. Appl. Math., Twente Univ. of Technology, 7500 AE Enschede, The Netherlands, Memor. 550, Jan. 1986.
[14] R. Prieto-Diaz and P. Freeman, “Classifying software for reusability,” IEEE Software, vol. 4, pp. 6-16, Jan. 1987.
[15] D. Rumelhart and D. Norman, Representation in Memory. La Jolla, CA: Center for Human Information Processing, June 1983.
[16] R. Brachman, “What IS-A is and isn’t: An analysis of taxonomic links in semantic networks,” Computer, vol. 16, no. 10, pp. 30-36, 1983.
[17] B. Adelson, “Comparing natural and abstract categories: A case study from computer science,” Cogn. Sci., vol. 9, no. 4, pp. 417-430, 1985.
[18] D. Nau and T. C. Chang, “Problem solving knowledge in a frame-based process planning system,” Inter. J. Intell. Syst., vol. 1, no. 1, pp. 29-44, Spring 1986.
[19] B. Buchanan and L. M. Fu, “Learning intermediate concepts in constructing a hierarchical knowledge base,” in Proc. 9th Int. Joint Conf. Artificial Intell., 1985, pp. 659-666.
[20] A. S. Pollitt, “A rule-based system as an intermediary for searching cancer therapy literature on MEDLINE,” in Intelligent Information Systems: Progress and Prospects, R. Davis, Ed. London: Horwood, 1986, pp. 82-126.
[21] P. Shoval, “Principles, procedures and rules in an expert system for information retrieval,” Inform. Processing Management, vol. 21, no. 6, pp. 475-487, 1985.
[22] G. Salton and M. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[23] E. Fox, “Extending the Boolean and vector space models of information retrieval with P-norm queries and multiple concept types,” Ph.D. dissertation, Dep. Comput. Sci., Cornell Univ., Ithaca, NY, 1983.
[24] C. Stanfill and D. Waltz, “Toward memory-based reasoning,” Commun. Assoc. Comput. Mach., vol. 29, no. 12, pp. 1213-1228, 1986.
[25] A. Ortony, “Beyond literal similarity,” Psych. Rev., vol. 86, pp. 161-180, 1979.
[26] M. R. Quillian, “Semantic memory,” in Semantic Information Processing, M. Minsky, Ed. Cambridge, MA: MIT Press, 1968.
[27] R. J. Brachman and J. G. Schmolze, “An overview of the KL-ONE knowledge representation system,” Cogn. Sci., vol. 9, pp. 171-216, 1986.
[28] E. J. McCluskey, “Minimization of Boolean functions,” Bell Syst. Tech. J., vol. 35, no. 6, pp. 1417-1444, 1956.
[29] D. Touretzky, “The mathematics of inheritance systems,” Ph.D. dissertation, Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, May 1984.
[30] E. E. Smith, E. J. Shoben, and L. J. Rips, “Comparison processes in semantic memory,” Psych. Rev., pp. 214-241, 1974.
[31] R. Rada, L. Darden, and J. Eng, “Relating two knowledge bases: The role of identity and part-whole,” in The Role of Language in Problem Solving, vol. 2, R. Jernigan, Ed. Amsterdam, The Netherlands: Elsevier, 1987, pp. 71-91.
[32] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1974.
[33] D. R. McCarn, “Medline: An introduction to on-line searching,” J. Amer. Soc. Inform. Sci., vol. 31, no. 3, pp. 181-192, May 1980.
[34] J. Backus, S. Davidson, and R. Rada, “Searching for patterns in the Mesh vocabulary,” Bull. Med. Lib. Assoc., vol. 75, no. 3, pp. 221-227, July 1987.
[35] N. Library and Inform. Assoc. Council, Guidelines for Thesaurus Structure, Construction, and Use. New York: Amer. Nat. Standards Inst., 1980.
[36] J. P. Schwartz, J. H. Kullback, and S. Shrier, “A framework for task cooperation within systems containing intelligent components,” IEEE Trans. Syst. Man Cybern., vol. 16, no. 6, pp. 788-791, Nov./Dec. 1986.
[37] S. Siegel, Nonparametric Statistics. New York: McGraw-Hill, 1956.
[38] R. Rada et al., “A vocabulary for medical informatics,” Comput. Biomed. Res., vol. 20, pp. 244-263, 1987.
[39] J. Sammet and A. Ralston, “The new (1982) computing reviews classification system-Final version,” Commun. Assoc. Comput. Mach., vol. 25, no. 1, pp. 13-25, Jan. 1982.
[40] R. Rada, “Gradualness facilitates knowledge refinement,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, no. 5, pp. 523-530, Sept. 1985.
[41] S. Later and R. Rada, “A method of medical knowledge base augmentation,” Methods of Inform. Med., vol. 26, no. 1, pp. 31-39, 1987.
[42] J. Holland, K. Holyoak, R. Nisbett, and P. Thagard, Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press, 1986.
[43] A. M. B. de Groot, “The range of automatic spreading activation in word priming,” J. Verbal Learning Verbal Behavior, vol. 22, pp. 417-436, 1983.
[44] W. Walker and W. Kintsch, “Automatic and strategic aspects of knowledge retrieval,” Cogn. Sci., vol. 9, pp. 261-283, 1985.

Roy Rada received the B.A. degree in psychology from Yale University, New Haven, CT, the M.D. degree from Baylor College of Medicine, Houston, TX, the M.S. degree in computer science from the University of Houston, Houston, TX, and the Ph.D. degree in computer science from the University of Illinois at Urbana.
He was an Assistant Professor of Computer Science at Wayne State University from 1981 to 1984. He worked from 1985 to 1988 as Editor of Index Medicus at the National Library of Medicine in Bethesda, MD and currently holds a Chair in Computer Science at the University of Liverpool. His research interests focus on intelligent information systems.

Hafedh Mili received the B.S. degree in mathematics and physics from Lycée Mixte de Jemmal, Tunisia, a Diploma in applied mathematics from Ecole Centrale de Paris, France, and the Ph.D. degree in computer science from George Washington University, Washington, DC.
He has been employed on an NSF Research Associateship and an IBM Fellowship. His main interests are in knowledge representation and intelligent information systems.

Ellen Bicknell received the B.S. degree from Rice University, Houston, TX, and the Ph.D. degree from Brown University, Providence, RI, both in chemistry.
She held Post-Doctoral Fellowships in Florida and Oregon and was an Associate Professor of Computer Science at Wayne State University in Detroit, MI, from 1977 to 1986. From 1986 to 1988 she was a Special Expert at the National Library of Medicine. She is interested in both computers and chemistry.

Maria Blettner received the Diplom and Ph.D. degrees in statistics from the University of Dortmund, Germany.
She worked as a Biostatistician at the International Agency for Research on Cancer in Lyon, France from 1983 to 1985 and the National Cancer Institute in Bethesda, MD, from 1985 to 1988. Currently, she is working as a Lecturer in the Department of Statistics and Computational Mathematics at the University of Liverpool, Liverpool, England. Her main interest is in the application and development of statistical methods in biomedicine.