Cranfield Test
3) One indexer straight from library school (fresh student) with neither indexing
experience nor subject knowledge.
Each indexer was asked to prepare index entries by following the above-mentioned four
indexing systems.
C) Time-period used: Documents were indexed 5 times using variable indexing times (2,
4, 8, 12, 16 minutes). The system was run through three phases to find out
whether the level of performance increases with the increasing experience of the
system personnel.
D) Materials Indexed: 100 technical reports and articles on the general field of
aeronautics and the specialized field of high-speed aerodynamics.
2) The project used manufactured queries, which were formulated before the beginning
of the actual search. Altogether 400 queries were formulated and all were processed by
the system in each of the three phases. Thus the system worked on a total of 1200
search queries.
c) Results:
1- The average recall ratios for the different systems were as follows:
Alphabetical index: 81.5%
4) The success rate in the third group of 6000 items was 3-4% greater than in the second
group, suggesting that the documents were indexed better in the third group. That is to
say, trained indexers without indexing experience were able to do a consistently
good indexing job in spite of their lack of subject knowledge.
Significance:
• Firstly, the test proved that the performance of a system does not depend on
the experience and subject background of the indexer.
• Secondly, it showed that systems where documents are organized by a
faceted classification scheme perform poorly in comparison to the
alphabetical index and uniterm system.
• Thirdly, it developed for the first time methodologies that could be applied
successfully in evaluating information retrieval systems.
• Moreover, it also proved that recall and precision are the two most important
parameters for determining the performance of information retrieval systems,
and that these two parameters are inversely related to each other.
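The two parameters named above can be illustrated with a minimal sketch. It assumes the usual definitions: recall is the share of relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The function name, document identifiers, and result sets are hypothetical and are not data from the Cranfield tests.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical relevance judgements for one query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search returns few documents, all of them relevant.
narrow = ["d1", "d2"]
# A broad search finds more relevant documents but also more noise.
broad = ["d1", "d2", "d3", "d5", "d6", "d7"]

print(recall_precision(narrow, relevant))  # (0.5, 1.0)
print(recall_precision(broad, relevant))   # (0.75, 0.5)
```

In this toy case, broadening the search raises recall from 0.5 to 0.75 while precision falls from 1.0 to 0.5, which is the kind of inverse relationship the test reported.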
Criticisms:
One of the main criticisms against this study was its artificiality, with little
relation to real-life situations. The questions used for the test were
'manufactured' from the source documents, which is very artificial. In a real-life
situation, queries are not based on previous knowledge of the availability of the
document in the store.
• Cranfield 2 -
• The drawbacks of Cranfield 1 necessitated the conduct of further
studies. The second stage of the Cranfield studies, known as Cranfield 2,
began in 1963 and was completed in 1966.
• The Cranfield 2 was a controlled experiment that attempted to
investigate the components of index languages and their effect
on the performance of retrieval systems.
• Some of the drawbacks of Cranfield 1 were eliminated in Cranfield
2 by bringing real-life situations into it and allowing feedback
mechanisms between the indexers and users.
• Test collection :
1400 reports and research papers in the field of high-speed
aerodynamics and aircraft structures were taken as input.
• Query formulation :
• Each author of 200 selected research papers was asked to formulate
questions for which he had cited the reference(s) in the paper. The authors
were also asked to point out the documents that were not cited in their
works but might have been relevant to the questions they had formulated.
The abstracts of the whole set of cited references were sent to the authors,
who were asked to assess the relevance of these to the questions they
had formulated. The authors identified 1961 papers as fully or partially
relevant to the 279 questions obtained. Finally, 221 questions and 1400
documents were selected for the experiment.
• Indexing :
The documents were indexed in three different ways:
a) Each document was analysed and important concepts were selected and
recorded in a natural language.
b) Concepts denoted by single words were listed.
c) Concepts with a weighting (ranging from 1 to 3) were combined to
represent the subject contents of the documents.
Five different types of indexing languages with variations were used in this study:
• Single term index language with 8 variations like uncontrolled natural language,
controlled synonyms, controlled word form, etc.
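The difference between these single-term variations can be sketched in a few lines. This is only an illustration under stated assumptions: the synonym table, the suffix list, the sample documents, and the rule that a document matches when it contains every query term are all hypothetical, and none of them is taken from the Cranfield 2 index languages themselves.

```python
# Hypothetical synonym table and suffix list, invented for illustration only.
SYNONYMS = {"airplane": "aircraft", "aeroplane": "aircraft"}
SUFFIXES = ("ing", "es", "s")


def natural(term):
    """Uncontrolled natural language: terms are used as they occur."""
    return term.lower()


def with_synonyms(term):
    """Controlled synonyms: map each synonym to one preferred term."""
    t = term.lower()
    return SYNONYMS.get(t, t)


def with_word_forms(term):
    """Controlled synonyms plus word forms: also strip a simple suffix."""
    t = with_synonyms(term)
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if t.endswith(suffix) and len(t) - len(suffix) >= 3:
            return t[: -len(suffix)]
    return t


def search(query_terms, docs, normalise):
    """Return ids of documents containing every normalised query term."""
    query = {normalise(t) for t in query_terms}
    return sorted(
        doc_id
        for doc_id, terms in docs.items()
        if query <= {normalise(t) for t in terms}
    )


# Hypothetical single-term index entries for three documents.
docs = {
    "d1": ["aircraft", "wing", "flutter"],
    "d2": ["aeroplane", "wings", "flutter"],
    "d3": ["airplane", "wing", "heating"],
}
query = ["aircraft", "wing"]

print(search(query, docs, natural))          # ['d1']
print(search(query, docs, with_synonyms))    # ['d1', 'd3']
print(search(query, docs, with_word_forms))  # ['d1', 'd2', 'd3']
```

Each added level of control lets the same query match more documents in this toy collection, which shows why the choice of index language can change retrieval performance.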
• Searching :
To conduct the searches, the 221 questions generated by the authors of the research papers
were used. The relevance of each document to each question was ascertained by grading
it from 1 to 4 in the following manner:
3) Useful, providing general background of the work or dealing with a specific area
The results of the Cranfield 2 tests were unexpected because the best performing index
languages were composed of uncontrolled single words occurring in documents. However,
the variables used in the study were subject to criticism. Each index language consisted of
different units of words, phrases, or combinations of both. Both the documents and queries
were formed in the same way. Thus the matching of questions to documents would
evaluate the relative effectiveness of languages of different specificity. Vickery
comments that the measures used in the second Cranfield project do not adequately
characterise those aspects of retrieval performance that are of operational importance.
REFERENCES
• Foskett, A.C. (1996). Subject approach to information. 5th ed. London:
The Library Association.
• Chowdhury, G.G. (2004). Introduction to modern information retrieval.
2nd ed. London: Facet Publishing.