Evaluation of Index
manually constructed thesaurus such as INSPEC, this problem is resolved by the use
of parenthetical qualifiers, as in the pair of homographs, bonds (chemical) and bonds
(adhesive). However, this is hard to achieve automatically.
individual terms. The advantage in normalizing the vocabulary is that variant forms
are mapped into base expressions, thereby bringing consistency to the vocabulary. As
a result, the user does not have to worry about variant forms of a term. The obvious
disadvantage is that, to use the thesaurus effectively, the user has to be well aware of
the normalization rules used. This is certainly nontrivial and is often viewed as a major
hurdle during searching (Frost 1987). In contrast, normalization rules in automatic
thesaurus construction are simpler, seldom involving more than stoplist filters and
stemming. (These are discussed in more detail later.) However, this feature can be
regarded both as a weakness and a strength.
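The kind of normalization used in automatic thesaurus construction, a stoplist filter followed by stemming, can be sketched as below. This is only an illustration: the stoplist, the suffix list, and the function names `crude_stem` and `normalize` are assumptions made for the example, and the suffix stripper is a toy stand-in rather than a published stemming algorithm.

```python
# Illustrative stoplist (an assumption); real systems use much longer lists.
STOPLIST = {"the", "of", "and", "a", "in", "to", "for"}

# Suffixes removed by this toy stemmer, tried in order (longest first).
SUFFIXES = ("ing", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> list[str]:
    """Lowercase the text, drop stoplist words, and stem what remains."""
    tokens = text.lower().split()
    return [crude_stem(token) for token in tokens if token not in STOPLIST]


print(normalize("the indexing of documents"))  # ['index', 'document']
```

Variant forms such as "indexing" and "indexes" both map to "index" here, which is the consistency that normalization is meant to bring to the vocabulary.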
These different features of thesauri have been presented so that the reader is aware of
some of the current differences between manual and automatic thesaurus construction
methods. Not all features are equally important, and they should be weighed according
to the application for which the thesaurus is being designed. This section also gives an
idea of where further research is required. Given the growing abundance of large-sized
document databases, it is indeed important to be challenged by the gaps between
manual and automatic thesauri.
Cleveland and Cleveland (2001) suggest the following steps for thesaurus
construction:
Identify the subject field. The boundaries of the subject field should be clearly
defined.
Identify the nature of the literature to be indexed. Is it primarily journal
literature, or does it include books, reports, conference papers, etc.? Is it
retrospective or current?
Identify the users. What are their information needs? Will their questions be
broad or specific?
Identify the structure. Will it be a pre-coordinated or a post-coordinated system?
Consult published indexes, glossaries, dictionaries and other tools in the
subject areas for a raw vocabulary. This will increase the thesaurus designer's
understanding of the terminology and semantic relationships in the field.
Cluster the terms.
Establish term relationships.
Review or refine for consistency.
Invert the structured thesaurus to produce an alphabetical arrangement of
entries.
Test the thesaurus.
Evaluation of index
To determine whether an index is good or bad, it has to be evaluated. Index
evaluation helps determine how effective an index is in terms of how many of the
documents that contain a term can be retrieved, how many of the retrieved items
actually match the needs of the user, and whether the index is efficient. Related
to the foregoing is the quality of an index, which is a function of its effectiveness in
providing users with what they want with minimum difficulty.
The qualities of a good indexer are also discussed.
Index quality is a product of factors resident in indexers and the information
systems. Some of these include:
When the index does the foregoing, its quality as a retrieval
mechanism is assured.
Some criteria against which indexes can be evaluated are:
Recall is the ratio of the number of relevant documents retrieved to the total number
of relevant documents in the collection.
Precision refers to the capability of the system to hold back unwanted documents
and to give the user wanted documents. The precision ratio is also a quantitative
measure: the ratio of the number of relevant documents retrieved to the total number
of documents retrieved. It is calculated as 100 × R/L, where R = number
of relevant documents retrieved and L = total number of documents retrieved in a
search. For instance, if 100 documents are retrieved and 70 of them are relevant to
the user's request, then the precision ratio is 70 to 100 (70/100), and the precision
of the search is 70 percent.
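The two ratios can be worked through in a few lines of code. The sketch below assumes the text's definitions: precision is 100 × R/L, and the companion ratio of relevant documents retrieved to all relevant documents in the collection (commonly called recall) is computed the same way. The function names are illustrative, and the figure of 140 relevant documents in the collection is a hypothetical value chosen only to complete the example; the text supplies just the 70 and the 100.

```python
def precision_ratio(relevant_retrieved: int, total_retrieved: int) -> float:
    """Precision as a percentage: 100 * R / L."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    return 100 * relevant_retrieved / total_retrieved


def recall_ratio(relevant_retrieved: int, total_relevant: int) -> float:
    """Recall as a percentage: relevant retrieved over all relevant documents."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return 100 * relevant_retrieved / total_relevant


# Worked example from the text: 100 documents retrieved, 70 of them relevant.
print(precision_ratio(70, 100))  # 70.0

# Hypothetical: suppose the collection holds 140 relevant documents in total.
print(recall_ratio(70, 140))  # 50.0
```

The same search can therefore score well on one measure and poorly on the other, which is why both are reported when an index is evaluated.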
Parameters for index quality are as follows:
Coverage should be complete.
Term choices should be consistent.
Term choices should be appropriate to the nature of the users.
Cross-references should be adequate but not overzealous.
Subheadings should truly reflect the main headings.
There should be no incorrect or missing locators.
There should be no missing proper names.
Alphabetization should be consistent throughout the text.
All misspellings should be corrected.
The same topic should not be indexed under different index terms
without proper cross-references.
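Two of these parameters, correct locators and consistent alphabetization, lend themselves to a mechanical check. The sketch below assumes a hypothetical representation in which an index is a list of (heading, page locators) pairs and the last page number of the indexed text is known; the function name `check_index` and this data layout are illustrative only. The other parameters, such as appropriateness of term choices, still call for human judgment.

```python
def check_index(entries: list[tuple[str, list[int]]], last_page: int) -> list[str]:
    """Return a list of problems found in a back-of-book style index."""
    problems = []

    # Consistent alphabetization: compare against a case-insensitive sort.
    headings = [heading for heading, _ in entries]
    if headings != sorted(headings, key=str.casefold):
        problems.append("headings are not in alphabetical order")

    for heading, locators in entries:
        # Missing locators: a heading that points nowhere.
        if not locators:
            problems.append(f"missing locators: {heading}")
        # Incorrect locators: page numbers outside the indexed text.
        for page in locators:
            if not 1 <= page <= last_page:
                problems.append(f"incorrect locator {page}: {heading}")

    return problems


sample = [
    ("bonds (adhesive)", [12]),
    ("bonds (chemical)", [40, 41]),
    ("Stemming", []),
]
print(check_index(sample, last_page=50))  # ['missing locators: Stemming']
```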