IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 19, NO. 1, JANUARY/FEBRUARY 1989

Development and Application of a Metric on Semantic Nets
ROY RADA, HAFEDH MILI, ELLEN BICKNELL, AND MARIA BLETTNER

Abstract: Motivated by the properties of spreading activation and conceptual distance, the authors propose a metric, called Distance, on the power set of nodes in a semantic net. Distance is the average minimum path length over all pairwise combinations of nodes between two subsets of nodes. Distance can be successfully used to assess the conceptual distance between sets of concepts when used on a semantic net of hierarchical relations. When other kinds of relationships, like “cause,” are used, Distance must be amended but then can again be effective. The judgments of Distance significantly correlate with the distance judgments that people make and help us determine whether semantic net $S_1$ is better or worse than semantic net $S_2$. First a “conceptual distance” task is set, and people are asked to perform it. Then the same task is performed by Distance on $S_1$ and $S_2$. If Distance on $S_1$ performs more like people than Distance on $S_2$, the conclusion is that $S_1$ is better than $S_2$. Distance embedded in the methodology facilitates repeatable quantitative experiments.

Manuscript received May 10, 1987; revised February 24, 1988 and August 29, 1988. This work was supported in part by the National Science Foundation under Grant ECS-84-06683.
R. Rada is with the Department of Computer Science, University of Liverpool, Liverpool L69 3BX, England.
H. Mili is with the Department of Electrical Engineering and Computer Science, George Washington University, Washington, DC 20052.
E. Bicknell is with the MeSH Section, National Library of Medicine, Bethesda, MD 20894.
M. Blettner is with the Department of Statistics and Computational Mathematics, University of Liverpool, Liverpool L69 3BX, England.
IEEE Log Number 8824462.
0018-9472/89/0100-0017$01.00 ©1989 IEEE

I. INTRODUCTION

OUR RESEARCH group focuses on developing better hierarchical knowledge bases [1] and, in the course of this work, requires a method for assessing the value of a knowledge base. Our application area is the retrieval of biomedical literature, and a natural problem which the knowledge base should help solve is the ranking of documents listed in response to a query. Our method for ranking documents assumes that both documents and queries are represented as sets of nodes in a hierarchical knowledge base. The method has reliably helped us judge the success of our merging algorithms and might similarly help other people. The method uses a metric called Distance that is easy to manipulate mathematically and to interpret.

Some examples of the psychological and information science significance of Distance are given elsewhere [2], [3]. This paper focuses on the mathematical characteristics of Distance and presents new cases and interpretations. Experiments in which Distance is applied to pairs of concepts and to sets of concepts in a hierarchical knowledge base show the power of hierarchical relations in representing information about the conceptual distance between concepts.

In this paper a knowledge base will be viewed as a graph. In many problems dealing with discrete objects and binary relations, a graphical representation of the objects and the binary relations on them is a very convenient form of representation [4], [5] that often leads to a solution using algorithms from graph theory. Depending on the nature of the problem, the edges could represent physical links (as for communication networks), time duration (as for task planning), or abstract relationships (as for association [6] structures).

Many graph problems, such as the minimum cost spanning tree and traveling salesman problems, require minimizing the sum of weights of some edges given a set of constraints [7]. Distances between entire graphs have been investigated in the pattern recognition literature, where a graph represents a picture [8], [9]. Such distances are based on the number of transformations that would make one graph similar to another one. However, to the best of our knowledge, a distance as defined in this paper has not been explored in the graph theory literature or elsewhere.

Human information processing often involves comparing concepts. There are various ways of assessing the similarity of concepts depending on the representation adopted for knowledge. In featural representations, concepts are represented by sets of features. The similarity between two concepts a and b with feature sets A and B can be expressed as a weighted sum of functions of different set operations on A and B [10]. The theory of spreading activation applies to comparing concepts in a semantic net [11]. Another method consists of constructing network fragments for sought-for objects and then matching these fragments against the network data base [12]. In yet another strategy the in-degree and out-degree of two nodes in a semantic net are compared in the course of deciding how similar the two nodes are [13]. Faceted thesauri are a kind of semantic net and are being used in indexing and retrieving software. To decide whether two nodes in such a faceted thesaurus are similar, software scientists have devised measures based on the number and type of overlapping facets that the two nodes have [14].
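As an illustration of the featural view mentioned in the introduction, the following minimal Python sketch computes a similarity as a weighted sum of functions of set operations on two feature sets. It assumes the contrast-model form usually associated with Tversky, with set cardinality as the function applied to each set; the function name `contrast_similarity`, the weights `theta`, `alpha`, `beta`, and the toy feature sets are all hypothetical choices made for this example and are not taken from the paper.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Weighted sum of f applied to the common and distinctive features.

    Assumed form (not spelled out in the paper):
        theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)
    """
    a, b = set(a), set(b)
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


# Hypothetical feature sets, for illustration only.
features_a = {"f1", "f2", "f3", "f4"}
features_b = {"f1", "f2", "f5"}

# Equal weights on the two difference sets give a symmetric measure.
sym_ab = contrast_similarity(features_a, features_b)  # 0.5
sym_ba = contrast_similarity(features_b, features_a)  # 0.5

# Unequal weights make the measure depend on the order of the arguments.
asym_ab = contrast_similarity(features_a, features_b, alpha=0.75, beta=0.25)  # 0.25
asym_ba = contrast_similarity(features_b, features_a, alpha=0.75, beta=0.25)  # 0.75
```

With unequal weights the value changes when the arguments are swapped, which is the kind of asymmetry that the discussion of metric properties in Section II-A addresses.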


Distance is based on a simplified version of spreading activation. One of the assumptions of the theory of spreading activation is that the semantic network is organized along the lines of semantic similarity. The more properties two concepts share in common, the more links there are between the concepts and the more closely related they are. In these terms, semantic relatedness is based on an aggregate of the interconnections between the concepts. This is different from semantic distance which is equal to the minimal path length between two concepts. Links may be assigned criteriality tags to indicate the importance (strength) of the link between the connected nodes [11]. Links between a concept and its defining features (versus characteristic features [15]) expectably have higher “criterialities.” Because “is-a” relations [16] are based on similarity between defining features, we hypothesize that when only is-a relations are used in semantic nets, semantic relatedness and semantic distance are equivalent (we could use the latter as a measure of the former).

Distance is principally designed to work with hierarchical knowledge bases. Hierarchies, both of abstractions and of concrete entities, are commonplace in the world and important in intelligent behavior [17]. They can be useful in controlling search [18] and in learning about the world [19]. The types of hierarchies most explored in this work are the type embedded in thesauri as used in the information retrieval field. These thesauri have long histories of being maintained for the indexing and retrieving of documents, and since they are now typically parts of computer information systems, they lend themselves to experiments in which the computer uses the thesaurus to help searchers [20], [21].

In the next section, we will present the mathematical properties of Distance. After that we will describe 1) what the mathematical properties mean in terms of knowledge engineering, and 2) several experiments using Distance on a semantic net with hierarchical relations. The major points will be that

• Distance is one tool to use in comparing one semantic net against another,
• Distance is a metric on sets of nodes of a graph, and
• applications of Distance to nonhierarchical semantic nets reveal the need for amendments to Distance.

II. METHODOLOGY

In this section, we discuss the methodology followed in the design of Distance. The design of Distance was guided by two observations:

1) the behavior of conceptual distance resembles that of a metric, and
2) the conceptual distance between two nodes is often proportional to the number of edges separating the two nodes in the hierarchy.

In this section, we first discuss the extent to which conceptual distance satisfies the properties of a metric. A function $f(x, y)$ is a metric if the following properties are satisfied:

1) $f(x, x) = 0$, zero property,
2) $f(x, y) = f(y, x)$, symmetric property,
3) $f(x, y) \geq 0$, positive property, and
4) $f(x, y) + f(y, z) \geq f(x, z)$, triangular inequality.

Second, we explore the extent to which the shortest path length between two nodes can be used as a measure of conceptual distance, using spreading activation as a model. In particular, we show that under some conditions, the shortest path length between two nodes indicates the conceptual distance between the nodes; for two nodes, Distance is simply the shortest path length between them. Finally, we extend the definition of Distance to handle concepts represented by sets of nodes, rather than single nodes. This extension is significant in the context of information retrieval systems where a document or a query is represented by more than one concept from a semantic net.

A. Conceptual Distance is a Metric

Much human information processing involves concept matching. Category membership and similarity are two important aspects of concept matching. The more similar two concepts are, the smaller the conceptual distance between them. Conceptual distance is a decreasing function of similarity. When concepts are represented by points in a multidimensional space, conceptual distance can conveniently be measured by the geometric distance between the points representing the concepts at hand and, as such, satisfies the properties of a metric.

Some information retrieval systems use vector descriptions for documents and queries [22]. Each dimension corresponds to an elementary concept known to the system. The coordinate of a vector along a dimension attests to the relative importance of the corresponding elementary concept for the document (or query) at hand. In such systems, conceptual distance is measured by the geometric distance between the corresponding concepts [23]. In such systems, a query retrieves the documents whose vectors are “closest” to the vector representing the query. Such a retrieval strategy is prohibitively costly for large document collections. Accordingly, researchers have explored the extent to which the search for close documents can be reduced by organizing documents into a hierarchy of classes, where the elements of each class are within a prescribed conceptual distance from each other [22]. An incoming query is compared to “exemplar” documents, one from each top-level class. If the conceptual distance is higher than a prescribed value (depending on the user’s input and the “radius” of the class), it is concluded that no document from that class is close enough to the query, and the whole class is disregarded. For this reasoning to hold, it is essential that conceptual distance satisfy the properties of a metric, especially the triangle inequality. In their work on memory-based reasoning, Stanfill and Waltz have

advocated a similar strategy and stressed the importance of metric properties [24].

Some cognitive scientists have challenged the applicability of metric properties to conceptual similarity measures. On the symmetry property, Tversky noted that there are instances where similarity appears to be asymmetric [10]. We believe that the asymmetry in those instances does not derive from the asymmetry of similarity (as a feature comparison process) but from the existence of another asymmetric relationship between two concepts. An instance of such a relationship is the instance-class relationship. For example, if people say that robins are more similar to birds than birds are similar to robins, we suspect that the people are making a fuzzy category-membership decision, rather than a similarity assessment. In a related argument, Ortony noted an extreme asymmetry in metaphorical statements [25]. Most students may consider the statement, “Lectures are like sleeping pills,” to be more true than the statement, “Sleeping pills are like lectures.” The asymmetry is due to the different roles played by the concepts in a metaphor. We build metaphors such as “A is like B,” if the characteristic features of A match the defining features of B [15]. Therefore, although “is like” may sound like “is similar to,” it is implicit in metaphors that the feature comparison process is selective. The authors believe that if similarity is limited to an (unconstrained) feature comparison process, it is symmetric.

Tversky [10] argued that the triangle inequality translates to what we will refer to as the “reverse triangle inequality.” Given three concepts A, B, and C, the reverse triangle inequality says that the similarity of A to C is greater than the sum of the similarity of A to B and the similarity of B to C. Tversky showed that the reverse triangle inequality can be violated. We have two objections to Tversky’s argument. First, it is not always the case that the triangle inequality for conceptual distance translates into the reverse triangle inequality for similarity. If the similarity between two concepts x and y is given by $S(x, y) = [1 + D(x, y)]^{-1}$, where $D(x, y)$ is a metric of conceptual distance between x and y,¹ then $S(x, y)$ does not satisfy the reverse triangle inequality.² Second, one of the examples used by Tversky to illustrate the violation of the reverse triangle inequality also illustrates an inconsistent use of similarity:

... although Jamaica is very similar to Cuba (due to its geographical characteristics) and Cuba is very similar to Russia (politically), Jamaica is not at all similar to Russia. ... [15].

¹Prior to our work on Distance, we developed a similarity (Relevance) measure based on the same formula, where $D(x, y)$ was the shortest path length between x and y in an is-a hierarchy. Relevance simulated well people’s assessments of similarity, but its results were not as easily interpretable as Distance’s.

²Let a, b, and c be three concepts such that $D(a, b) = D(b, c) = D(a, c) = d$. While $D(a, c)$ is less than $D(a, b) + D(b, c)$ (i.e., the triangle inequality is true), it is not true that $S(a, c)$ $(= [1 + d]^{-1})$ is bigger than $S(a, b) + S(b, c)$ $(= 2 \times [1 + d]^{-1})$ (i.e., the reverse triangle inequality is not true).

The problem with this example is that the similarity between Jamaica and Cuba is based on the geographical characteristics and ignores the political differences, and conversely, for the Cuba-Russia similarity. If we account for both defining properties, Jamaica will not be very similar to Cuba, nor will Cuba be very similar to Russia.

In summary, treating conceptual distance as a metric is consistent with the practical view of concepts as points in a multidimensional space. Exceptions to the symmetry and triangle inequality seem to result from a broad, if not inconsistent, use of similarity. Our work shows the viability of treating conceptual distance as a metric.

B. Shortest Path Lengths in is-a Hierarchies

In this section, we discuss the extent to which shortest path lengths in is-a hierarchies can be used to measure conceptual distance. In particular, we show that, in the context of Quillian’s model of semantic memory [26], shortest path lengths are not sufficient to measure the conceptual distance between concepts. However, when the paths are restricted to is-a links, the shortest path length does measure conceptual distance. We also discuss how well the metric properties are supported by spreading activation [11].

In Quillian’s model of semantic memory, concepts are represented by nodes and relationships by links. Links are labeled by the name of the relationship and are assigned “criteriality tags” that attest to the importance of the link. In computer implementations, criteriality tags are numerical values that represent the degree of association of the two concepts (such as how often that link is traversed) and the nature of the association. The association is positive if the existence of that link indicates some sort of similarity between the end nodes and negative otherwise. For example, superordinate links (the term used for is-a) have a positive association, while “is-not-a” links have a negative association.

Roughly speaking, spreading activation [11] prescribes that to compare two concepts, the paths that separate the two nodes and that satisfy the constraints defined by the semantics of the relations and the context are considered for evaluation. These paths are traced by propagating two “activation tags” from the nodes corresponding to the concepts, one tag originating from each node. When two activation tags “meet” at one node, the paths from the originating nodes to that node are concatenated to form a path between the originating nodes. For each path, positive criteriality tags contribute to “positive evidence” (for similarity), and negative criteriality tags contribute to “negative evidence.” When positive evidence exceeds some predetermined threshold, the comparison is concluded successfully. On the contrary, if negative evidence falls below some negative threshold, it is concluded that the concepts are not similar.

In Quillian’s model superordinate (is-a) links are assigned high criteriality tags. If spreading activation is used across is-a links only, short paths will significantly contribute to positive evidence of similarity, and the correspondence between semantic distance (shortest path length) and semantic relatedness (conceptual distance) will be strong. We hypothesize that such correspondence is strong enough for the length of is-a paths to be used as a measure of semantic relatedness.

It is not necessary that a link and its inverse have the same criteriality tag. Consider the is-a relation between “robin” and “bird.” Such a relation may be more important for the concept robin than it is for bird. This asymmetry is considered as a fundamental property of semantic constructs in KL-ONE [27]. However, for the purposes of spreading activation, it does not really matter whether a path is made of is-a links or inverse is-a links, or a mix of both; at the end, its contribution to positive evidence will be the same.

Based on the above observations, we define the conceptual distance between two concepts represented by nodes in an is-a semantic net as follows.

Definition 1: Let A and B be two concepts represented by the nodes a and b, respectively, in an is-a semantic net. A measure of the conceptual distance between A and B is given by

Distance(A, B) = minimum number of edges separating a and b.

Henceforth, we will use interchangeably in the expression of Distance concepts or nodes representing those concepts. Clearly, Distance satisfies 1) the zero property, 2) the positive property, and 3) the symmetry property. The triangular inequality is based on the fact that by concatenating a shortest path between A and B to a shortest path between B and C, we get a path between A and C whose length is bigger or equal to the minimum path length between A and C. Thus we have the following result.

Theorem 1: Distance is a metric.

We later show that Distance, as defined above, well simulates people’s assessments of conceptual distance.

C. Distance Between Sets of Nodes

Traditionally, documents and queries are represented by a combination of concepts from a predetermined set of concepts, called the indexing vocabulary. When the indexing vocabulary is a hierarchical semantic net, documents and queries are represented by sets of nodes from the hierarchy. To extend Distance to handle sets of nodes, we relate concepts as sets of nodes to semantic constructs in Quillian’s semantic net model [26], and then we use spreading activation to guide the design of the Distance algorithm. First, we study a special kind of concept in Quillian’s model, call it a compound concept, and study how spreading activation would operate on such concepts. Then we map documents and queries to compound concepts and translate spreading activation properties into desirable properties for Distance. Finally, we give a complete definition of Distance with a number of important mathematical properties.

1) Compound Concepts in Quillian’s Model: In Quillian’s model, concepts can be combined to define more complex concepts. In the sequel, concepts that are expressed by an English word will be called elementary concepts. A concept such as “the old red house” is represented as a combination of the elementary concepts “old,” “red,” and “house.” Roughly speaking, Quillian’s model prescribes that the old red house be represented by a node that has conjunctive links to instances of the three concepts “old,” “red,” and “house,”³ to say that “the old red house” is at the same time a house and a red object and an old object. Conjunctive links can be thought of as links labeled “and” from the node “the old red house” to each one of the three nodes “house,” “old,” and “red.”

³This description is not exact, as it does not handle all the subtleties involved. However, it will suffice for our purposes, and it does not violate any of the basic assumptions of the model.

Similarly, concepts that have several meanings, such as “plant,” are represented by a node that has disjunctive links to nodes, each of which is an instance of one of the alternate meanings. To take an example used by Quillian [26], the word “plant” may mean 1) a physical plant, that is a building used for manufacturing processes, 2) plant as a living organism, and 3) the verb “to plant.” We will refer to these three meanings by plant₁, plant₂, and plant₃ respectively. Accordingly, the concept plant is represented by a node labeled “plant” that has three links labeled “or” to plant₁, plant₂, and plant₃. This says that plant is plant₁ or plant₂ or plant₃.

Now assume that the concept “plant” is compared to the concept “flower.” The two concepts are definitely similar, because in one interpretation (plant₂), “flower” is-a “plant.” In general, a disjunctive compound concept matches all concepts that match one of its alternate interpretations. In terms of conceptual distance, the conceptual distance between a disjunctive concept and another concept is the minimum conceptual distance between the disjunctive concept’s alternatives and that other concept. We shall later refer to this property as the disjunctive minimum.

For the case of a conjunctive concept, it is important that all the elementary concepts be considered. For example, “the old red car” is not close to “the old red house,” although both concepts share the elementary concepts old and red. Conversely, “the old pink mansion” is conceptually close to the “old red house” because “old” equals “old,” “pink” is close to “red,” and “mansion” is close to “house.”

2) Documents and Queries: In information retrieval systems, documents (articles, books, records, etc.) are often characterized by a set of index terms chosen from a hierarchical semantic net. When maximum specificity is sought in the indexing procedure, the index terms representing a document often represent significantly distinct concepts. In this case, removing an index term would

adversely affect the precision of indexing. The concept reflected by a document is best described by ANDing the concepts represented by its index terms. As such, documents are similar to Quillian’s conjunctive concepts.

For queries, the way index terms are combined is explicit. For instance, in many operational information retrieval systems natural language queries are coded into Boolean queries by a trained librarian. A Boolean query is a parenthesized logical expression composed of index terms (atoms) and the logical operators $\vee$, $\wedge$, and $\neg$. A query consisting of a single term retrieves the documents that have that term in their index. When many terms are used, the operators stand for the corresponding set operations between the sets of documents that would be retrieved by the corresponding single term queries [22].

Using the Quine-McCluskey algorithm [28], a query (or any Boolean expression for that matter) can be converted into minimal disjunctive normal form. As such, a query can be seen as a disjunction of conjunctive compound concepts, except for the fact that conjunctions may contain NOTed terms. Negated terms are difficult to interpret in the context of the semantic net representation. If X is a node in a semantic net, what is $\neg X$? One way to address negations of concepts in semantic nets is through exceptions. Using Touretzky’s [29] formalism, if X is-not-a Y and Z is-a X, then Z is-not-a Y; thus $\neg Y$ includes anything that is under X. We regard $\neg Y$ as the set of nodes that are farthest in the semantic net from Y. Then, the conceptual distance between X and $\neg Y$ is the conceptual distance between X and that set. Admittedly, there may be contexts in which the interpretation of negation is inappropriate.

3) Distance on Sets of Nodes: In this section, we use the behavior of spreading activation on compound concepts to guide the extension of Distance to handle sets of nodes. The disjunctive minimum rule translates into the identity:

$$\mathrm{Distance}(C_1 \vee \cdots \vee C_k,\ C) = \min_{i = 1, \ldots, k} \mathrm{Distance}(C_i, C) \tag{1}$$

where $C_i$, for $i = 1, \ldots, k$, and $C$ represent concepts (compound or elementary). When $C$ itself is a disjunctive concept, $\mathrm{Distance}(C_i, C)$ above is in turn computed as a minimum over the component concepts of $C$.

When conjunctive concepts are compared, we must take into account the conceptual distances among elementary concepts. In the previous example of “the old red house” and “the old pink mansion,” the two concepts are similar because pink is similar to red and mansion is similar to house: we compare values of comparable features, and we do not pay attention to the fact that “pink” is far from “house” and “mansion” is far from “red.” This corresponds to a feature comparison process as advocated by Tversky [10] and Smith et al. [30]. It is not, however, clear how spreading activation handles feature comparisons; the method has been criticized for not handling features properly [30]. If a clear mapping exists between the elementary concepts of concept X and the elementary concepts of concept Y, Distance need only be applied to pairs of corresponding elementary concepts. (In fact, we later see an experiment where such an approach provided a good measure of conceptual distance.) However, our experience with documents and queries shows that such a mapping is not readily obtainable. Accordingly, we define Distance between conjunctive concepts as

$$\mathrm{Distance}(X_1 \wedge \cdots \wedge X_k,\ Y_1 \wedge \cdots \wedge Y_m) = \frac{1}{km} \sum_{i=1}^{k} \sum_{j=1}^{m} \mathrm{Distance}(X_i, Y_j) \tag{2}$$

where the $X_i$ and $Y_j$ are elementary concepts. Notice that we choose to divide the double sum by the product $km$. This normalization has been used to reduce the bias of number of elementary concepts; without it, concepts with more elementary concepts tend to be further apart. It is also consistent with some of the processing assumptions of spreading activation [11]. Roughly speaking, when a node A is activated and B is adjacent to A, B is subsequently activated by an “amount” inversely related to the number of nodes adjacent to A, and proportional to the strength of the link between A and B.⁴ Thus the more elementary concepts a compound concept has, the less (relatively) a path through an elementary concept will account for similarity.

⁴Activation spreads like electric current in a network of resistors, where the smaller the resistance, the higher the criteriality tag. This last rule is equivalent to Kirchhoff’s law for nodes in electrical networks.

Finally, for some of our applications we need to define Distance between a concept and the “null” concept, an empty set of conjunctive concepts. In our efforts to evaluate semantic nets we have also developed an algorithm called Indexer for automatic indexing of document titles into terms of a semantic net [31]. The performance of Indexer is compared to that of expert human indexers by checking the distance between the human-produced and the computer-produced sets of index terms. Our automatic indexer would at times fail to produce any terms to index a document. However, we had not defined Distance over the empty set initially. Attempts to analyze the experimental data in which these documents were treated as missing values were unsatisfactory. Accordingly, we decided to extend Distance to the case where one of the concepts is the empty set. For indexing purposes, returning the empty set constitutes the worst possible answer. For instance, a document that is not indexed cannot be retrieved at all. Thus we define Distance between a concept X and the “empty set” as the maximum Distance(X, Y) where Y is any conjunctive compound concept built over all subsets of the set of nodes in the semantic net. This extension of Distance both well captures the importance of the empty set and, as will be seen later, is simple to compute. The final interpretation of the experiments was radically different when we were able to handle the empty set versus when we were not able to handle the empty set. Based on the above considerations,

and on the definition of Distance between concepts represented by single nodes, we define Distance between conjunctive concepts as follows.

Definition 2: Let $V$ be the set of nodes of an is-a semantic net, and let $V_1$, $V_2$, and $V_3$ be three subsets of $V$, each representing the compound concept consisting of a conjunction of its elements. We have

$$\mathrm{Distance}(V_1, V_2) = \begin{cases} 0, & \text{if } V_1 = V_2 \\ \dfrac{1}{|V_1|\,|V_2|} \displaystyle\sum_{u \in V_1,\ v \in V_2} \mathrm{Distance}(u, v), & \text{if } V_1 \neq V_2,\ V_1 \neq \varnothing,\ V_2 \neq \varnothing \end{cases}$$

and

$$\mathrm{Distance}(V_1, V_2) = \begin{cases} \max_{U \subseteq V} \{\mathrm{Distance}(V_1, U)\}, & \text{if } V_1 \neq \varnothing \text{ and } V_2 = \varnothing \\ \max_{U \subseteq V} \{\mathrm{Distance}(U, V_2)\}, & \text{if } V_1 = \varnothing \text{ and } V_2 \neq \varnothing \end{cases}$$

where $\mathrm{Distance}(u, v)$ is the shortest path length between nodes $u$ and $v$ (as in Definition 1). Let $v^{-1}$ be the set $\{w \in V \mid \mathrm{Distance}(v, w) = \max_{x \in V} \mathrm{Distance}(v, x)\}$; then

$$\mathrm{Distance}(u, \neg v) = \mathrm{Distance}(\{u\}, v^{-1}). \tag{3}$$

Notice that when $V_1 = \{u_1\}$ and $V_2 = \{u_2\}$ ($V_1$ and $V_2$ are singletons), $\mathrm{Distance}(V_1, V_2)$ reduces to $\mathrm{Distance}(u_1, u_2)$, as in Definition 1. Using (1) above, Distance readily generalizes to arbitrary compound concepts (any combinations of AND’s and OR’s), provided that these concepts are expressed in disjunctive minimal form.

algorithm [32]:

Theorem 2: Let $U$ be a nonempty subset of $V$, then

$$\mathrm{Distance}(U, \varnothing) = \max_{v \in V} \mathrm{Distance}(\{v\}, U)$$

The proof of this theorem is given in the Appendix.

Theorem 3: Distance is a metric on the sets of concepts (single and compound) defined on a semantic net.

The proof of this theorem is given in the Appendix.

The mathematical properties of Distance allow us to answer certain questions straightforwardly. For instance, under what conditions are two sets ($V_1$ and $V_2$) of nodes closer to or further from a third set $V_3$? If $V_2 = V_1 \cup \{u\}$, then we can determine whether $D(V_2, V_3)$ is greater than, equal to, or less than $D(V_1, V_3)$ by determining whether $D(\{u\}, V_3)$ is greater than, equal to, or less than $D(V_1, V_3)$, respectively.

III. EXPERIMENTAL RESULTS

In this section, we study a series of experiments where we use Distance to measure the conceptual distance between concepts. We compute Distance on an is-a hierarchy between concepts represented by single nodes, and concepts represented by sets of nodes. In either case, Distance proves to be a valuable tool to

1) simulate human assessments of conceptual distance and
2) evaluate some cognitive aspects of our semantic nets.

All of our experiments use human subjects as standard references against whom Distance performance is measured. The next section describes in order:

1) the information retrieval context in our experiments,
2) the reliability of human observers,
3) the results of applying Distance to pairs of nodes, and
4) the results of using Distance in document retrieval experiments.
A . Experimental Data
Notice that (2) does not yield zero for identical com-
pound concepts, and the zero property has to be imposed The National Library of Medicine maintains one of the
in Definition 2. This is another problem related to the fact world’s largest bibliographc retrieval systems, called Med-
that Distance is computed indiscriminantly between all line [33]. Medline contains bibliographic information for
pairs of concepts. What is needed is a reference value of over five million articles from over 3000 biomedical peri-
Distance that attests to the minimum conceptual distance odicals. In addition to the usual bibliographic information
between concepts, that is the conceptual distance between (such as author, title, journal, and date of publication),
identical concepts. We choose the value zero because it is each article is also represented by a set of terms from a
the smallest value attainable by Distance, and because it semantic net called Mesh (see Fig. 1). Over 2000 queries
happens to correspond to a mathematical property of are addressed to Medline each day from sites around the
metrics (zero property). However, we later see that a zero world. These queries are often encoded as Boolean expres-
value (or any other fixed value for that matter) leads to sions over Mesh terms.
some undesirable behavior of Distance (see Section IV-C). Mesh is a hierarchical semantic net of over 15000 terms
The computation of Distance(U, 0)according to the [34]. The 15000 terms are placed into a nine-level hierar-
above definition would be prohibitively costly. Were we to chy that includes high-level nodes such as “anatomy,”
generate all the subsets of V (21‘1 of them), the time to “organism,” and “disease” (see Fig. 2). The hierarchy is
compute Distance( U, @) would be exponential. Theorem 2 based on broader-than’’ relationships, where the broader
“

allows us to compute Distance(U,@) in less than the terms are higher in the tree. The broader-than relationship
O( n 3 ) required for the common “all-pairs shortest path” is very similar to the is-a relationship [35], but also in-
~

KAUA er ui.: U ~ V ~ L U Y M ~ANN UI APYLILAIIUN vk A miuc ON SEMANTIC NETS 23

separated by 24 h. For each student, the Spearman p

correlation coefficient’ between the two different rankings
was computed. The (arithmetic) average of the 22 coeffi-
Manual Analyscs cients was 0.63. This suggests high intraobserver reliability.
of Docurnenls
A Kendall’s concordance [37] performed on the 22 first-
Translauan into
MeSH Terms
time rankings of the students gave a value of 0.19. At the
0.05 confidence level, we can reject the null hypothesis that
the student ranks were independent.
For a population of six titles and ten queries about
1
“rheumatoid arthritis and knee prosthesis,” rankings by
two physicians were compared. To analyze the data, we
grouped articles with queries in a way that allowed the 60
article-query pairs to be treated as one population for
ranking. Then Spearman’s correlation coefficient was used
to compare two different rankings. The null hypothesis
that the two rankings are independent was rejected.

of Quev C. Shortest Path Lengths in Mesh

Tnnslauon in10
MeSH Terns In this section, we test the extent to whch the minimum
path length between two nodes provides a good measure of
1
conceptual distance between the corresponding concepts.
(1 Query ‘) First, we challenge one of the basic assumptions of our
model, that is the symmetry of conceptual distance. We
Fig. 1. Medline with emphasis on manual Mesh encoding mentioned earlier (Section 11-A) that people’s assessments
of conceptual similarity (or distance) may not always cor-
(...... ......
MeSH
...........
.-..- ~

respond to a feature comparison process as prescribed by

Tversky [lo]. When the concepts to be compared are in an
.... .........
............ is-a relationshp, for example, people might use solely that
relation as a basis for their assessments. We also saw that
............... ....... ...... ................ the asymmetry of is-a links does not affect the similarity
between concepts as determined by spreading activation
_, .,. -- - ....... .......... ... [ll]because a path of is-a links would be transversed in
both directions equally.
We performed a set of experiments to determine whether
............ ......-,../’ .... ....
the conceptual distance between two terms in a broader-
Fig. 2. Small portion of Mesh tree. Overall tree is nine levels deep and than relationship is symmetric. We took a set of pairs of
includes over 15000 terms.
terms from Mesh. The terms constituting a pair in our
experiments were such that one was broader-than the
cludes other broader-than relations, such as “ part-of.’’ One other. We asked students to assign numbers to each pair
of our assumptions is that these other types of broader-than (term,,term,), in answer to the question, “How close is
relationship exhibit the same effect on conceptual distance term, to term,?” The numbers assigned correspond to the
measurements as the is-a relationshp does. conceptual distance between the two. There seemed to be
no correlation between the numbers assigned, and whch
B. Observer Reliability of the terms was broader-than the other. Although the
Ranking documents by order of relevance to a query is experiment was not conclusive because of the underlying
based on an assessment of the strength of the relationships assumptions we made about the matching process (for
example, that search in memory would always proceed
between each document and the query. Psychological stud-
ies have shown that human judges are prone to biases
when confronted with a similar task [36]. Therefore, before
comparing the performance of Distance to people, we ’Given k entities e,, .... e , , the Spearman correlation coefficient be-
tween two rankings r,,. . ’ ,rk and r;, ’ . ., r l is given by
studied the performance of people.
Twenty-two students from a computer science course
at George Washington University were given a query
about computers and medicine and seven titles from the
Medline response to “computers and medicine.” Each The coefficient is 1 for identical rankings, 0 for unrelated rankings, and
student was asked to rank the titles at two different times - 1 for inversely related rankings.

from term$_1$), it suggested that were there to be a difference, it would probably be a small one.

The 1986 edition of Mesh had an inadequate coverage of information science related topics. Accordingly, we initiated a study of how to make the information science part of Mesh better [38]. One resource was the Association of Computing Machinery's hierarchical semantic net for computer science, called the computing reviews classification structure (CRCS) [39]. The information science section of Mesh had about 200 terms, while CRCS had about 1000 terms in a four-level hierarchy.

Our merging algorithm first determined the similarities between Mesh and CRCS and then exploited the differences. In the simple case where a term $t_1$ existed in CRCS and not in Mesh but $t_1$ had a parent $t_2$ in CRCS which equaled a term $t_3$ in Mesh, we added $t_1$ to Mesh as a child of $t_3$. This algorithm had several other capabilities so that terms in CRCS could also accurately become parents of terms in Mesh. To test whether the merger had created a better semantic net, a sequence of experiments was performed in which human evaluations of distances between terms were compared to those of Distance on Mesh versus Distance on Mesh + CRCS.

Twelve pairs of terms that were both in Mesh and Mesh + CRCS were given to ten computer science students at George Washington University. The students were asked to assign a number between one and five to each pair of terms to indicate what they thought was the conceptual distance between the components of the pair. The 12 pairs of terms were then ranked in increasing order of their distance. Similarly, shortest path lengths in Mesh and Mesh + CRCS were computed for each pair of terms, and ranks were computed from these distances for the two semantic nets.

The average Spearman's correlation coefficient of ten students shows that their rankings significantly agree at the 0.01 level of confidence ($\alpha = 0.01$). Now comparing the average of the students' rankings against Mesh and against Mesh + CRCS, we get

$$\rho_{\text{avg. stud},\ \text{Mesh}} = \ldots$$

$$\rho_{\text{avg. stud},\ (\text{Mesh}+\text{CRCS})} = 0.52.$$

As a descriptive statistic we can accept that these two correlation coefficients are significantly different. The augmentations provide a better correlation between people and the semantic net. We have done a similar experiment with four physicians who ranked the terms; there again the results clearly show the augmented semantic net as more accurately representing the cognitive distances that people hold true.

In our own subjective evaluations the merged Mesh + CRCS was better than Mesh alone. Distance has helped us systematically document the difference in functionality between Mesh + CRCS and Mesh alone. In general, we have found that Distance is a useful tool for the evaluation of term-term distances in hierarchical semantic nets, and that with a good hierarchical semantic net the rankings determined by Distance roughly correspond to those which people perceive. We have done a host of other experiments with term-term distances in hierarchical semantic nets and with variants on Distance and always concluded the same thing, namely, that this approach to validating semantic net merging strategies has merit.

D. Distance Applied to Documents and Queries

Given a document $D$ characterized by a set of index terms $D = \{t_{D,1}, t_{D,2}, \ldots\}$ and a query $Q$ coded into an ANDed set of index terms $Q = \{t_{Q,1}, t_{Q,2}, \ldots\}$, we hypothesized that the distance between $D$ and $Q$ gives a measure of the conceptual distance of the document to the query. In the experiments reported below, we computed Distance between a query and a number of documents. We then ranked the documents with Distance, assuming that the greater the distance between the query and a document, the less relevant the document was to the query. Similarly, we asked people to rank a set of documents with respect to a given query and then compared their ranks to those produced by Distance.

For ten different queries and six articles the averages of two physicians' evaluations were compared to those produced by Distance on Mesh. The agreement between the computer and the people was significant at the 0.05 level. To show that this ranking by the computer depended on more than the exact matches among terms of the query and document, the experiments were repeated but now with path lengths constrained. If only exact matches between terms in the query and document descriptions were used, then there was a negative correlation between the people's and computer's rankings, showing that Distance was sensitive to the structure of Mesh.

Two scientists compared each of 52 documents against the query “lipids and encephalitogenic basic proteins.” The 52 documents were retrieved from Medline by a search with the term lipids in it. Each document was represented by all the Mesh terms stored in Medline for that document (typically, ten terms per document). The ranking of each scientist and the ranking of Distance were statistically significantly correlated. The correlation between the rankings of the two scientists was also significant at the 0.05 level. The same methodology was applied to the queries “suicide and substance dependence,” “liver diseases and peritoneoscopy,” “shock and endorphins,” and “biocompatible materials and dental implantation,” and the same results were obtained. That is, the human judges agreed with each other and with Distance in the ranking of documents to query. These and other experiments support the claim that Distance on Mesh sets a baseline for performance that is not disconnected from the decisions of people regarding the conceptual similarity between sets of terms.
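The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`shortest_paths`, `distance`, `rank_documents`) are hypothetical, the hierarchy in `EDGES` is a toy tree invented for the example (it is not the actual Mesh tree), edges are treated as undirected, and the empty-set case is computed with Theorem 2 rather than by enumerating subsets.

```python
from collections import deque


def shortest_paths(edges):
    """All-pairs shortest path lengths in an undirected, unweighted graph (BFS per node)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist


def distance(v1, v2, dist):
    """Distance between two sets of nodes, following Definition 2."""
    v1, v2 = frozenset(v1), frozenset(v2)
    if v1 == v2:
        return 0.0  # zero property, imposed by definition
    if not v1 or not v2:
        # Empty-set case via Theorem 2: the single node farthest from the nonempty set.
        nonempty = v1 or v2
        return max(distance({node}, nonempty, dist) for node in dist)
    total = sum(dist[a][b] for a in v1 for b in v2)
    return total / (len(v1) * len(v2))


def rank_documents(query, documents, dist):
    """Document ids ordered from smallest to largest Distance to the query."""
    return sorted(documents, key=lambda doc_id: distance(documents[doc_id], query, dist))


# Toy hierarchy (assumption: invented for illustration only).
EDGES = [
    ("root", "disease"), ("disease", "arthritis"),
    ("arthritis", "rheumatoid arthritis"),
    ("rheumatoid arthritis", "juvenile rheumatoid arthritis"),
    ("root", "procedure"), ("procedure", "prosthesis"),
    ("prosthesis", "joint prosthesis"), ("joint prosthesis", "knee prosthesis"),
]
DIST = shortest_paths(EDGES)
QUERY = {"knee prosthesis", "rheumatoid arthritis"}
DOCS = {
    "d1": {"joint prosthesis", "rheumatoid arthritis"},
    "d2": {"disease"},
    "d3": {"juvenile rheumatoid arthritis", "knee prosthesis"},
}

for doc_id in rank_documents(QUERY, DOCS, DIST):
    print(doc_id, distance(QUERY, DOCS[doc_id], DIST))
```

Because Distance averages over all pairs of terms, a document that shares one term with the query can still be ranked below a document indexed only under a very general term, as `d3` and `d2` show in this toy tree.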

IV. DIRECTIONS

A. Distance Applied to Other Relations

In studying a document set about juvenile rheumatoid arthritis (JRA), we applied Distance to Mesh alone and to Mesh augmented with other relationships [40]. The augmented Mesh contained additional edges such as the one connecting JRA to knee prosthesis (see Fig. 3) and to granuloma. A knee prosthesis can be used in the surgical treatment of a knee destroyed by JRA, and a granuloma is a pathological finding in JRA. We hypothesized that the addition of edges or relationships from another knowledge base to Mesh would augment the ability of Distance to reflect the decisions of people about cognitive distance. Distance applied to this augmented Mesh failed to simulate people.

Fig. 3. Deep hierarchy of Mesh is evident here. Added edge between “juvenile rheumatoid arthritis” and “knee prosthesis” misled Distance. [Tree diagram; legible node labels: Procedure, Disease, Prosthesis, Collagen, Rheumatoid Arthritis, Juvenile Rheumatoid Arthritis.]

These results may appear to be predictable because Distance was designed to work for is-a links. However, the property of is-a links that seemed to matter most was that they acted like highly critical links [11], and thus, links of is-a paths provided a good measure of conceptual distance. In this experiment, we hypothesized that because relations such as “cause” or “treat” are important and should have high criteriality, shortest paths along those relations should also indicate conceptual distance. Instead, it became clear that broader-than relationships had a peculiar significance in adjudicating distance.

When Distance was modified so as to cross only such nonhierarchical relationships when both the query and the document made clear that they were “about” such relationships, then this enhanced Distance again correlated well with the ranking decisions of people. Each nonhierarchical relation, such as “cause” or “treat,” had to be handled in a distinct way [40]. The ramifications of such enhancements to metric aspects of Distance have only been partially explored.

B. Distance Applied to Featural Models

In one set of experiments, we tried to merge current medical information and terminology (CMIT) with Mesh [41]. CMIT is a system for naming and describing diseases that is produced by the American Medical Association. CMIT describes approximately 3600 diseases in a structured format with eight attributes. The eight attributes are alternate terms, etiology, symptoms, signs, laboratory findings, radiologic findings, course, and pathology. Within each attribute, the description of a disease consists of a series of noun phrases, usually separated from one another by a semicolon. These noun phrases are a terse form of natural language (see Fig. 4). First CMIT diseases were “parsed” by lexical matching into Mesh. Only those CMIT terms or phrases that occurred specifically as Mesh terms were retained. This method had the advantage that all the resulting terms for characterizing diseases from CMIT were embedded in the hierarchy of Mesh.

NA: Keratoconus
AT: Cornea, conical.
ET: Hereditary; associated with Down syndrome, atopic dermatitis, Marfan syndrome, retinitis pigmentosa, aniridia, vernal catarrh, Alpen syndrome, Ehlers-Danlos syndrome.
SX: Blurred vision uncorrected by glasses.
SG: More frequent in females; onset at puberty; myopia; astigmatism; possibly more advanced in one eye, eventually bilateral.
LB: Ophthalmoscopy: progressive bulging of cornea; apex of cone usually slightly below center of cornea; corneal protrusion recognized by viewing eye from side; sometimes pulsation of corneal conus synchronous with arterial pulse; increased intraocular tension; clefts in Descemet membrane. Retinoscopy: distortion of light reflex; distortion of appearance of nerve head, vessels of fundus. Keratoscopy: distortion of corneal light reflex.
CR: Prognosis: astigmatism progressing for years, then becoming stationary; possible corneal perforation.
PA: Opacity at apex of cone; line of gray, yellow, or olive-green pigment forming incomplete ring

Fig. 4. Example of CMIT disease description. Disease is keratoconus. Fields in CMIT mean: NA is name, AT is alternate terms, ET is etiology, SX is symptoms, SG is signs, LB is laboratory findings, CR is course, PA is pathology.

CMIT has many diseases that Mesh does not. One of our goals was to develop an algorithm that would insert these CMIT diseases into the appropriate place in Mesh and thus improve Mesh. The strategy was first to locate all the diseases that existed in both Mesh and CMIT and to use those as references in Mesh against which conceptual distance of CMIT diseases could be assessed. We identified a number of such diseases and for each disease we added to Mesh the edges that connected the disease to its attributes, thus producing Mesh + CMIT.

The first attempt at computing conceptual distance was to treat each disease as a compound concept where the attributes were ANDed. To compute the conceptual distance between two such diseases, Distance was applied between the sets of attributes, the same way that Distance was applied in our earlier document retrieval experiments. To test the validity of this approach, we determined the extent to which the shortest path length between two disease names was correlated to the Distance between their sets of attributes. For example, “myopia” and “hyperopia” existed in both Mesh and CMIT. Their distance in Mesh is two. We applied Distance to their sets of attributes, and although we did not expect this Distance to yield exactly two, we expected that, by taking a number of such diseases, there would be a correlation between the ranking on disease names and the ranking on disease attributes.
Ten eye diseases that had the same names in CMIT and Mesh were first used. Distance was applied to all pairs of the disease names and then to all pairs of descriptions of these eye diseases. From the two sets of scores we derived rankings and then checked the degree of correlation between the rankings based on Mesh disease names alone versus on disease descriptions. The correlation coefficient was 0.06; there was a lack of agreement between the rankings.
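The comparison of two rankings used in this experiment can be sketched as follows. This is only an illustration: the helper names `ranks` and `spearman` are hypothetical, the two score lists are made-up numbers rather than data from the experiments, and the closed-form Spearman formula used here assumes that no two scores in a list are tied (path lengths in a real hierarchy often tie, in which case averaged ranks and the general Pearson-on-ranks form would be needed).

```python
def ranks(scores):
    """Rank positions (1 = smallest score); assumes no ties among the scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(scores_a, scores_b):
    """Spearman coefficient 1 - 6*sum(d^2) / (k*(k^2 - 1)) for untied rankings."""
    k = len(scores_a)
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (k * (k * k - 1))


# Made-up scores for five disease pairs (assumption: illustrative values only).
NAME_DISTANCES = [2, 4, 6, 3, 5]                   # shortest path between disease names
DESCRIPTION_DISTANCES = [3.1, 2.2, 4.0, 1.5, 3.6]  # Distance between attribute sets

print(spearman(NAME_DISTANCES, DESCRIPTION_DISTANCES))
```

A coefficient near 1 would indicate that the two ways of measuring distance order the disease pairs similarly; a value near 0, as reported above for the full descriptions, indicates no agreement.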
Examination of the Mesh + CMIT descriptions and the path lengths between terms suggests that too many distances were being calculated that were not meaningful. Each attribute of a disease should be treated separately. For instance, the etiology feature for a disease is meaningfully compared to the etiology feature of other diseases but not to the laboratory findings feature. Taking advantage of the breakdowns in CMIT requires treating the different attributes of a disease differently. The distance between the etiology and laboratory findings features is less important than the distance between two etiologies or the distance between two laboratory findings. This is similar to our earlier example about “the old red house” and “the old pink mansion,” where Distance should be applied selectively, i.e., as a feature comparison process, rather than applying it to all pairs of elementary concepts. Unlike the case for documents where a mapping of index terms along semantically distinct dimensions does not exist (and for which Distance performed well), a mapping not only exists in this case but may well be essential to the success of Distance, i.e., a breakdown or decomposition of the Mesh + CMIT descriptions into their natural parts, like etiologies and laboratory findings, might allow for distances that more closely relate to the distances among Mesh disease names.

The etiology components of the diseases were next isolated. The hypothesis was that on the etiologies the diseases would have distances from one another that more closely corresponded to those distances that existed between the names alone. The same steps as used for assessing the correlation between Mesh disease names and Mesh + CMIT descriptions were now used to assess the correlation between the rankings on etiologies and the rankings on disease names. The degree of correlation was significant. Incidentally, these experiments proved that some of the Mesh diseases are hierarchically organized according to etiologies.

Fig. 5. Sample graph $G_5$ used to illustrate discontinuity in Distance.

C. Distance Near Zero

Distance fails in some ways to capture the intuitive notions of what it means for one concept to be close to another. Consider the query (knee prosthesis AND rheumatoid arthritis). According to Distance on Mesh a document indexed (joint prosthesis AND rheumatoid arthritis) is as close to the query as a document indexed under disease⁶ (see Fig. 3). Distance would also say that a document indexed under disease is closer to the query than a document indexed (juvenile rheumatoid arthritis AND knee prosthesis)!

⁶Let the reader be assured that Medline indexing manuals do not allow a document to be simply indexed under “Disease.”

Consider the case of graph $G_5$ with a root and two linear branches of five nodes each (see Fig. 5),

$$\mathrm{Distance}(\{a_5, a_{-5}, a_0\}, \{a_i, a_{-i}, a_0\})$$

gets larger as $i$ goes from one to four, but when $i = 5$, there is a drastic drop in the value of Distance to zero. $G_5$ can be generalized to $G_n$ in which $a_k$ and $a_{-k}$ are $k$ edges from $a_0$, and the leaves are $a_n$ and $a_{-n}$. The pathology true for $G_5$ is also true for $G_n$. The Distance between $\{a_n, a_{-n}, a_0\}$ and $\{a_i, a_{-i}, a_0\}$ steadily grows as $i$ grows but drops to zero when $i$ reaches $n$.

The problem with the above cases is not so much the fact that there is a discontinuity near zero; after all, we imposed the zero property by definition. More damaging is that Distance values may increase as the conceptual distance seems to decrease. The basic problem seems to be that Distance between sets of nodes treats equally all pairs of nodes. This becomes an issue for extreme cases such as the ones presented above. This situation is not unrelated to the

problem we had originally had with the featural model. We have considered and found weaknesses in several metric alternatives to Distance, such as the distance between centroids for each set of nodes, and several nonmetric alternatives, such as the path length between the closest nodes in two sets.

V. CONCLUSION

Our research group has been for the past two years developing methods of merging semantic nets, such as Mesh and CMIT [1]. As is typical for such machine learning experiments, issues of representation and reasoning are as critical as those of learning. Our hypothesis is that better semantic nets result from the mergers, but to evaluate “betterness” we need a way of reasoning with a semantic net. This leads to the development of a measure of conceptual distance between sets of nodes in a semantic net. By transforming documents and queries into sets of nodes, we can do information retrieval experiments that test the value of our semantic net.

In our search for an evaluation tool, we realize that certain cognitively meaningful and mathematically convenient properties are desirable. Although the relationships in our semantic nets are directed, e.g., broader-than and narrower-than, our early experiments suggest that for the purposes of conceptual distance, these relationships can be treated as undirected. Accordingly, the measure of distance over sets of nodes has the property of symmetry. Furthermore, we are interested in knowing under what conditions one document is far from or close to another document. For this purpose a property like the triangle inequality is useful. The work in memory-based reasoning [24] is one example where metric properties are important. Our measure of conceptual distance, called Distance, satisfies the properties of a metric. On the surface it is surprisingly simple: just the average of the path lengths between pairs of nodes. However, it has proven remarkably powerful and flexible.

In our efforts to evaluate semantic nets, we have also developed an algorithm called Indexer for automatic indexing of document titles into Mesh [31]. In one set of experiments we added thousands of synonyms to the main terms of Mesh and tested the main-terms-plus-synonyms semantic net with Indexer. We compared the performance of Indexer against human indexers by counting the number of hits and misses. To our surprise the synonyms did not increase the hits any more than they increased the misses. Then we refined the measure of performance by applying Distance. When Distance measured the distance between the human and machine results, the synonyms proved to be helpful. In other words, the synonyms usually led Indexer to be closer to the human indexing. The measure of absolute hits and misses was unrealistically demanding of Indexer. In artificial intelligence experiments it might be expected that the machine gives answers close, but not identical, to those that humans would, and a way to measure this closeness is important.

In looking at the distance between sets of nodes, we have had to deal with different kinds of relationships. It seems that the broader-than and narrower-than relations can be treated in basically the same way, but other relationships, like cause, merit different handling. For queries and documents we argue that nonhierarchical relationships should only be traversed when both query and document specify that they are about that relationship. The integrity of the node-to-node path lengths in hierarchical semantic nets is the key to the success or failure of Distance as a measure of conceptual distance. In our evaluations, these path lengths have shown themselves to be cognitively meaningful. Some have argued that spreading activation does not spread across more than one link [42], [43], but they ignore link labels, while we have shown the importance of distinguishing hierarchical from nonhierarchical links.

Distance might be implemented in an information retrieval system which is based on the indexing of documents and queries into terms from a semantic net (see Fig. 1). Often in such systems a query retrieves more documents than the user wants, and the documents are arbitrarily ordered as they appear on the computer screen. A measure like Distance might be applied to help rank the documents to the query and allow the querist to pay most attention to those documents which are most likely to be conceptually close to the query.

There are a host of specific questions about the cognitive realism of Distance that we have not addressed. For instance, to the extent that the semantic net is a tangled hierarchy and has levels, should terms at different levels be treated differently [17]? Relative to Distance, “part-of” links seem to obey the same properties as is-a links, but how would a metric on causal links look? We do not claim that the brain is making Distance-like calculations in the course of determining cognitive similarity. Nor do we argue that the tangled hierarchy and Distance are adequate for other cognitive tasks [44]. Cognition probably does not rely on measurements that satisfy the properties of a metric. A metric has, however, many attractive features because of its mathematical and semantic tractability. We claim that the better the semantic net on which Distance operates, the more the conceptual similarity decisions of Distance match the conceptual similarity decisions of people. We have been surprised at how powerful a simple algorithm like Distance can be in evaluating hierarchical semantic nets.

ACKNOWLEDGMENT

Donald Bamber of the Navy Personnel Research and Development Center provided the important examples of the weaknesses of Distance as two sets approach one

another in the graph. Referees for this TRANSACTIONS provided guidance on substantial revisions of this paper.

APPENDIX

Theorem 2: Let $U$ be a nonempty subset of $V$, then

$$\mathrm{Distance}(U, \emptyset) = \max_{u \in V} \mathrm{Distance}(\{u\}, U).$$

Proof of Theorem 2: To prove Theorem 2, we first show the following lemmas.

Lemma 1: Let $V_1$, $V_2$, and $V_3$ be three nonempty subsets of $V$, such that $V_1$ and $V_2$ are disjoint. If $\mathrm{Distance}(V_1, V_3) = \mathrm{Distance}(V_2, V_3)$, then

$$\mathrm{Distance}(V_1 \cup V_2, V_3) = \mathrm{Distance}(V_1, V_3). \tag{4}$$

In general,

$$\mathrm{Distance}(V_1 \cup V_2, V_3) \geq \min_{i=1,2} \{\mathrm{Distance}(V_i, V_3)\} \tag{5a}$$

$$\mathrm{Distance}(V_1 \cup V_2, V_3) \leq \max_{i=1,2} \{\mathrm{Distance}(V_i, V_3)\}. \tag{5b}$$

Proof of Lemma 1: Let $V_1 = \{u^1_1, \ldots, u^1_k\}$, $V_2 = \{u^2_1, \ldots, u^2_q\}$, and $V_3 = \{u^3_1, \ldots, u^3_m\}$,

$$\mathrm{Distance}(V_1 \cup V_2, V_3) = \frac{1}{(k+q)\,m} \sum_{u \in V_1 \cup V_2} \sum_{w \in V_3} d(u, w) = \frac{k}{k+q}\, \mathrm{Distance}(V_1, V_3) + \frac{q}{k+q}\, \mathrm{Distance}(V_2, V_3).$$

When $\mathrm{Distance}(V_1, V_3) = \mathrm{Distance}(V_2, V_3)$, (4) follows immediately. When the distances are different, inequalities (5a) and (5b) follow because both $k$ and $q$ are positive.

Lemma 2: Let $U$ be a nonempty subset of $V$ and $U_{\max}$ a subset of $V$ such that

$$\mathrm{Distance}(U, U_{\max}) = \mathrm{Distance}(U, \emptyset),$$

then for all $u \in U_{\max}$, we have

$$\mathrm{Distance}(\{u\}, U) = \mathrm{Distance}(U_{\max}, U) = \mathrm{Distance}(U, \emptyset).$$

Proof of Lemma 2: a) By putting $V_1 = \{u\}$, $V_2 = U_{\max} - \{u\}$, and $V_3 = U$ in the previous lemma, we conclude from (4) that if

$$\mathrm{Distance}(\{u\}, U) = \mathrm{Distance}(U_{\max} - \{u\}, U),$$

then

$$\mathrm{Distance}(\{u\}, U) = \mathrm{Distance}(U_{\max}, U) = \mathrm{Distance}(U, \emptyset).$$

b) Let us prove that it is always the case. Assume that

$$\mathrm{Distance}(\{u\}, U) \neq \mathrm{Distance}(U_{\max} - \{u\}, U),$$

then, according to Lemma 1,

$$\min \{\mathrm{Distance}(\{u\}, U),\ \mathrm{Distance}(U_{\max} - \{u\}, U)\} < \mathrm{Distance}(U_{\max}, U) < \max \{\mathrm{Distance}(\{u\}, U),\ \mathrm{Distance}(U_{\max} - \{u\}, U)\}.$$

This implies that either $\{u\}$ or $U_{\max} - \{u\}$ is further from $U$ than $U_{\max}$, which contradicts the definition of $U_{\max}$.

Going back to the proof of Theorem 2, let $u_{\max}$ be an element of $V$ such that

$$\mathrm{Distance}(\{u_{\max}\}, U) = \max_{u \in V} \mathrm{Distance}(\{u\}, U).$$

By definition of $\mathrm{Distance}(U, \emptyset)$, we have

$$\mathrm{Distance}(U, \emptyset) \geq \mathrm{Distance}(\{u_{\max}\}, U)$$

or, for some set $U_{\max}$,

$$\mathrm{Distance}(U, U_{\max}) \geq \mathrm{Distance}(\{u_{\max}\}, U).$$

Let us assume that

$$\mathrm{Distance}(U, U_{\max}) > \mathrm{Distance}(\{u_{\max}\}, U).$$

According to Lemma 2, for all $u \in U_{\max}$, we have

$$\mathrm{Distance}(\{u\}, U) = \mathrm{Distance}(U_{\max}, U) = \mathrm{Distance}(U, \emptyset).$$

Therefore, for all $u \in U_{\max}$,

$$\mathrm{Distance}(\{u\}, U) > \mathrm{Distance}(\{u_{\max}\}, U),$$

which contradicts the definition of $u_{\max}$. Therefore we have

$$\mathrm{Distance}(U, U_{\max}) = \mathrm{Distance}(U, \{u_{\max}\}).$$

Theorem 3: Distance is a metric on the sets of concepts defined on a semantic net.

Proof of Theorem 3: 1) By definition, for all $U \subseteq V$, $\mathrm{Distance}(U, U) = 0$. 2) Regarding symmetry, when neither $V_i$ ($i = 1, 2$) is empty, it is easy to see that $\mathrm{Distance}(V_1, V_2) = \mathrm{Distance}(V_2, V_1)$. This can be done by interchanging the order of summation and using the symmetry of the distance $d$. 3) For all $V_1$ and $V_2$ subsets of $V$, $\mathrm{Distance}(V_1, V_2)$ is positive because it is computed as an averaged sum of positive numbers. When one of the subsets is empty, Distance is then a maximum of such averaged sums:

$$\mathrm{Distance}(V_1, \emptyset) = \max_{U \subseteq V} \{\mathrm{Distance}(V_1, U)\} = \max_{U \subseteq V} \{\mathrm{Distance}(U, V_1)\} = \mathrm{Distance}(\emptyset, V_1).$$

4) For the triangle inequality, we have to prove that

$$\mathrm{Distance}(V_1, V_3) \leq \mathrm{Distance}(V_1, V_2) + \mathrm{Distance}(V_2, V_3).$$
a) Let $V_1 = \{u_1^1, \ldots, u_k^1\}$, $V_2 = \{u_1^2, \ldots, u_q^2\}$, and $V_3 = \{u_1^3, \ldots, u_m^3\}$ be three nonempty subsets of $V$. We have

$$\mathrm{Distance}(V_1, V_2) = \frac{1}{kq}\sum_{i=1}^{k}\sum_{j=1}^{q} d(u_i^1, u_j^2) = \frac{1}{kqm}\sum_{i=1}^{k}\sum_{j=1}^{q}\sum_{p=1}^{m} d(u_i^1, u_j^2). \tag{6}$$

Similarly, we can write $\mathrm{Distance}(V_2, V_3)$ as

$$\mathrm{Distance}(V_2, V_3) = \frac{1}{kqm}\sum_{i=1}^{k}\sum_{j=1}^{q}\sum_{p=1}^{m} d(u_j^2, u_p^3). \tag{7}$$

Adding (6) and (7) yields

$$\mathrm{Distance}(V_1, V_2) + \mathrm{Distance}(V_2, V_3) = \frac{1}{kqm}\sum_{i=1}^{k}\sum_{j=1}^{q}\sum_{p=1}^{m} \left[d(u_i^1, u_j^2) + d(u_j^2, u_p^3)\right]. \tag{8}$$

Because $d$ satisfies the triangle inequality, we have

$$d(u_i^1, u_j^2) + d(u_j^2, u_p^3) \ge d(u_i^1, u_p^3). \tag{9}$$

From (8) and (9) it follows that

$$\mathrm{Distance}(V_1, V_2) + \mathrm{Distance}(V_2, V_3) \ge \frac{1}{kqm}\sum_{i=1}^{k}\sum_{j=1}^{q}\sum_{p=1}^{m} d(u_i^1, u_p^3). \tag{10}$$

Because the summand does not depend on the index $j$, the summation over $j$ is equivalent to multiplying the summand by $q$. Implementing this change and eliminating $q$ yields:

$$\mathrm{Distance}(V_1, V_2) + \mathrm{Distance}(V_2, V_3) \ge \frac{1}{km}\sum_{i=1}^{k}\sum_{p=1}^{m} d(u_i^1, u_p^3). \tag{11}$$

The right-hand side of (11) is $\mathrm{Distance}(V_1, V_3)$ by definition, so

$$\mathrm{Distance}(V_1, V_2) + \mathrm{Distance}(V_2, V_3) \ge \mathrm{Distance}(V_1, V_3). \tag{12}$$

This establishes the triangle inequality in the case where the three sets are nonempty.

b) When $V_2$ is empty, then there exists at least one subset of $V$, $V_{\max}$, such that

$$\mathrm{Distance}(V_1, \emptyset) = \mathrm{Distance}(V_1, V_{\max}).$$

In the previous steps we proved that

$$\mathrm{Distance}(V_1, V_3) \le \mathrm{Distance}(V_1, V_{\max}) + \mathrm{Distance}(V_{\max}, V_3).$$

From the definition of $\mathrm{Distance}(\emptyset, V_3)$, we know that

$$\mathrm{Distance}(V_{\max}, V_3) \le \mathrm{Distance}(\emptyset, V_3).$$

Therefore,

$$\mathrm{Distance}(V_1, V_3) \le \mathrm{Distance}(V_1, \emptyset) + \mathrm{Distance}(\emptyset, V_3).$$

c) When $V_3$ is empty, we write $\mathrm{Distance}(V_1, \emptyset)$ as $\mathrm{Distance}(V_1, V_{\max})$ and reduce this case to the previous one for which the triangle inequality was proven to hold.

REFERENCES

[1] R. Forsyth and R. Rada, Machine Learning: Expert Systems and Information Retrieval. London: Ellis Horwood, 1986.
[2] H. Mili and R. Rada, "Merging thesauri: Principles and evaluation," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-10, no. 2, pp. 204-220, 1988.
[3] R. Rada and E. Bicknell, "Ranking documents with a thesaurus," J. Amer. Soc. Inform. Sci., in press.
[4] R. H. Hopkins, K. B. Campbell, and N. S. Peterson, "Representations of perceived relations among the properties and variables of a complex system," IEEE Trans. Syst. Man Cybern., vol. SMC-17, pp. 52-60, Jan./Feb. 1987.
[5] G. Loberg, G. M. Powell, A. Orefice, and J. D. Roberts, "Representing operational planning knowledge," IEEE Trans. Syst. Man Cybern., vol. SMC-16, pp. 774-787, Nov./Dec. 1986.
[6] S. Miyamoto, K. Oi, O. Abe, A. Katsuya, and K. Nakayama, "Directed graph representations of association structures: A systematic approach," IEEE Trans. Syst. Man Cybern., vol. SMC-16, no. 1, pp. 53-61, 1986.
[7] E. Reingold, J. Nievergelt, and N. Deo, Combinatorial Algorithms. Englewood Cliffs, NJ: Prentice-Hall, 1977.
[8] S. Y. Lu, "A tree-matching algorithm based on node splitting and merging," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 249-256, Mar. 1984.
[9] M. A. Eshera and K. S. Fu, "A graph distance measure for image analysis," IEEE Trans. Syst. Man Cybern., vol. SMC-14, pp. 398-407, May/June 1984.
[10] A. Tversky, "Features of similarity," Psych. Rev., vol. 84, pp. 327-352, 1977.
[11] A. M. Collins and E. F. Loftus, "A spreading activation theory of semantic processing," Psych. Rev., vol. 82, pp. 407-428, 1975.
[12] R. Fikes and T. Kehler, "The role of frame-based representation in reasoning," Commun. Assoc. Comput. Mach., vol. 28, no. 9, pp. 904-920, Sept. 1985.
[13] C. Hoede, "Similarity in knowledge graphs," Dep. Appl. Math., Twente Univ. of Technology, 7500 AE Enschede, The Netherlands, Memor. 550, Jan. 1986.
[14] R. Prieto-Diaz and P. Freeman, "Classifying software for reusability," IEEE Software, vol. 4, pp. 6-16, Jan. 1987.
[15] D. Rumelhart and D. Norman, Representation in Memory. La Jolla, CA: Center for Human Information Processing, June 1983.
[16] R. Brachman, "What IS-A is and isn't: An analysis of taxonomic links in semantic networks," Computer, vol. 16, no. 10, pp. 30-36, 1983.
[17] B. Adelson, "Comparing natural and abstract categories: A case study from computer science," Cogn. Sci., vol. 9, no. 4, pp. 417-430, 1985.
[18] D. Nau and T. C. Chang, "Problem solving knowledge in a frame-based process planning system," Inter. J. Intell. Syst., vol. 1, no. 1, pp. 29-44, Spring 1986.
[19] B. Buchanan and L. M. Fu, "Learning intermediate concepts in constructing a hierarchical knowledge base," in Proc. 9th Int. Joint Conf. Artificial Intell., 1985, pp. 659-666.
[20] A. S. Pollitt, "A rule-based system as an intermediary for searching cancer therapy literature on MEDLINE," in Intelligent Information Systems: Progress and Prospects, R. Davis, Ed. London: Horwood, 1986, pp. 82-126.
[21] P. Shoval, "Principles, procedures and rules in an expert system for information retrieval," Inform. Processing Management, vol. 21, no. 6, pp. 475-487, 1985.
[22] G. Salton and M. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[23] E. Fox, "Extending the Boolean and vector space models of information retrieval with P-norm queries and multiple concept types," Ph.D. dissertation, Dep. Comput. Sci., Cornell Univ., Ithaca, NY, 1983.
[24] C. Stanfill and D. Waltz, "Toward memory-based reasoning," Commun. Assoc. Comput. Mach., vol. 29, no. 12, pp. 1213-1228, 1986.
[25] A. Ortony, "Beyond literal similarity," Psych. Rev., vol. 86, pp. 161-180, 1979.
[26] M. R. Quillian, "Semantic memory," in Semantic Information Processing, M. Minsky, Ed. Cambridge, MA: MIT Press, 1968.
[27] R. J. Brachman and J. G. Schmolze, "An overview of the KL-ONE knowledge representation system," Cogn. Sci., vol. 9, pp. 171-216, 1986.
[28] E. J. McCluskey, "Minimization of Boolean functions," Bell Syst. Tech. J., vol. 35, no. 6, pp. 1417-1444, 1956.
[29] D. Touretzky, "The mathematics of inheritance systems," Ph.D. dissertation, Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, May 1984.
[30] E. E. Smith, E. J. Shoben, and L. J. Rips, "Comparison processes in semantic memory," Psych. Rev., pp. 214-241, 1974.
[31] R. Rada, L. Darden, and J. Eng, "Relating two knowledge bases: The role of identity and part-whole," in The Role of Language in Problem Solving, vol. 2, R. Jernigan, Ed. Amsterdam, The Netherlands: Elsevier, 1987, pp. 71-91.
[32] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1974.
[33] D. R. McCarn, "Medline: An introduction to on-line searching," J. Amer. Soc. Inform. Sci., vol. 31, no. 3, pp. 181-192, May 1980.
[34] J. Backus, S. Davidson, and R. Rada, "Searching for patterns in the Mesh vocabulary," Bull. Med. Lib. Assoc., vol. 75, no. 3, pp. 221-227, July 1987.
[35] N. Library and Inform. Assoc. Council, Guidelines for Thesaurus Structure, Construction, and Use. New York: Amer. Nat. Standards Inst., 1980.
[36] J. P. Schwartz, J. H. Kullback, and S. Shrier, "A framework for task cooperation within systems containing intelligent components," IEEE Trans. Syst. Man Cybern., vol. 16, no. 6, pp. 788-791, Nov./Dec. 1986.
[37] S. Siegel, Nonparametric Statistics. New York: McGraw-Hill, 1956.
[38] R. Rada et al., "A vocabulary for medical informatics," Comput. Biomed. Res., vol. 20, pp. 244-263, 1987.
[39] J. Sammet and A. Ralston, "The new (1982) computing reviews classification system - Final version," Commun. Assoc. Comput. Mach., vol. 25, no. 1, pp. 13-25, Jan. 1982.
[40] R. Rada, "Gradualness facilitates knowledge refinement," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-7, no. 5, pp. 523-530, Sept. 1985.
[41] S. Later and R. Rada, "A method of medical knowledge base augmentation," Methods of Inform. Med., vol. 26, no. 1, pp. 31-39, 1987.
[42] J. Holland, K. Holyoak, R. Nisbett, and P. Thagard, Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press, 1986.
[43] A. M. B. de Groot, "The range of automatic spreading activation in word priming," J. Verbal Learning Verbal Behavior, vol. 22, pp. 417-436, 1983.
[44] W. Walker and W. Kintsch, "Automatic and strategic aspects of knowledge retrieval," Cogn. Sci., vol. 9, pp. 261-283, 1985.

Roy Rada received the B.A. degree in psychology from Yale University, New Haven, CT, the M.D. degree from Baylor College of Medicine, Houston, TX, the M.S. degree in computer science from the University of Houston, Houston, TX, and the Ph.D. degree in computer science from the University of Illinois at Urbana.
He was an Assistant Professor of Computer Science at Wayne State University from 1981 to 1984. He worked from 1985 to 1988 as Editor of Index Medicus at the National Library of Medicine in Bethesda, MD, and currently holds a Chair in Computer Science at the University of Liverpool. His research interests focus on intelligent information systems.

Hafedh Mili received the B.S. degree in mathematics and physics from Lycée Mixte de Jemmal, Tunisia, a Diploma in applied mathematics from Ecole Centrale de Paris, France, and the Ph.D. degree in computer science from George Washington University, Washington, DC.
He has been employed on an NSF Research Associateship and an IBM Fellowship. His main interests are in knowledge representation and intelligent information systems.

Ellen Bicknell received the B.S. degree from Rice University, Houston, TX, and the Ph.D. degree from Brown University, Providence, RI, both in chemistry.
She held Post-Doctoral Fellowships in Florida and Oregon and was an Associate Professor of Computer Science at Wayne State University in Detroit, MI, from 1977 to 1986. From 1986 to 1988 she was a Special Expert at the National Library of Medicine. She is interested in both computers and chemistry.

Maria Blettner received the Diplom and Ph.D. degrees in statistics from the University of Dortmund, Germany.
She worked as a Biostatistician at the International Agency for Research on Cancer in Lyon, France, from 1983 to 1985 and the National Cancer Institute in Bethesda, MD, from 1985 to 1988. Currently, she is working as a Lecturer in the Department of Statistics and Computational Mathematics at the University of Liverpool, Liverpool, England. Her main interest is in the application and development of statistical methods in biomedicine.
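
The Appendix results lend themselves to a small numerical check. The following is a minimal sketch, not the authors' implementation: it assumes an unweighted, undirected, connected graph, takes $d$ to be the breadth-first-search path length, and computes Distance for nonempty sets as the plain average over all pairs. The graph `GRAPH` and every function name are hypothetical, introduced only for illustration.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical toy hierarchy as undirected adjacency lists (not from the paper).
GRAPH = {
    "r": ["a", "b"],
    "a": ["r", "c", "d"],
    "b": ["r", "e"],
    "c": ["a"],
    "d": ["a"],
    "e": ["b"],
}


def bfs_lengths(graph, source):
    """Minimum path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# D[u][v] is the minimum path length d(u, v).
D = {u: bfs_lengths(GRAPH, u) for u in GRAPH}


def distance(set_a, set_b):
    """Average of d(a, b) over all pairs; both sets are assumed nonempty."""
    total = sum(D[a][b] for a in set_a for b in set_b)
    return Fraction(total, len(set_a) * len(set_b))


def nonempty_subsets(nodes):
    nodes = sorted(nodes)
    return [frozenset(c)
            for r in range(1, len(nodes) + 1)
            for c in combinations(nodes, r)]


def lemma1_identity_holds(v1, v2, v3):
    """Weighted-average identity used in the proof of Lemma 1 (v1, v2 disjoint)."""
    k, q = len(v1), len(v2)
    lhs = distance(v1 | v2, v3)
    rhs = (Fraction(k, k + q) * distance(v1, v3)
           + Fraction(q, k + q) * distance(v2, v3))
    return lhs == rhs


SUBSETS = nonempty_subsets(GRAPH)
TABLE = {(a, b): distance(a, b) for a in SUBSETS for b in SUBSETS}


def triangle_inequality_holds():
    """Check Distance(A, C) <= Distance(A, B) + Distance(B, C) for all nonempty sets."""
    return all(TABLE[a, c] <= TABLE[a, b] + TABLE[b, c]
               for a in SUBSETS for b in SUBSETS for c in SUBSETS)


def theorem2_holds():
    """The farthest set from U is no farther than the farthest single node."""
    singletons = [s for s in SUBSETS if len(s) == 1]
    return all(max(TABLE[w, u] for w in SUBSETS)
               == max(TABLE[s, u] for s in singletons)
               for u in SUBSETS)
```

Exact rational arithmetic (`Fraction`) is used so that the equalities in Lemma 1 and Theorem 2 can be tested without a floating-point tolerance. The check covers only nonempty sets and does not apply the convention $\mathrm{Distance}(U, U) = 0$ stated in the proof of Theorem 3.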
