
PRESENTED TO -DEPARTMENT OF LIBRARY AND INFORMATION SCIENCE

TOPIC- CRANFIELD TEST


PRESENTED BY- PALAK YADAV
MLISC.
THE CRANFIELD TESTS-
Cranfield 1
The first extensive evaluation of retrieval systems was carried out by Aslib, with financial
assistance from the National Science Foundation, at the College of Aeronautics, Cranfield, UK,
under the supervision of C.W. Cleverdon in 1957.
Its objective was to investigate the comparative efficiency of four indexing systems: UDC, Faceted
Classification, Alphabetical Subject Headings List and Uniterm Indexing system.
• A) Input Elements-
Four indexing systems were taken into consideration:
• 1) An alphabetical subject catalogue based on a subject headings list and a set of rules for
the construction of headings
• 2) A classified catalogue based on UDC with an alphabetical chain index to the class headings
constructed
• 3) A catalogue based on a Colon Classification and an alphabetical chain index to the class
headings constructed
• 4) A uniterm coordinate index controlled by an authority list of uniterms compiled during
indexing.
B) Indexers: Three indexers with varied backgrounds and levels of proficiency in indexing were
taken:

1) One indexer with experience of indexing and subject knowledge of Aeronautics

2) One with experience of indexing but with little subject knowledge

3) One indexer straight from library school (fresh student) with neither indexing
experience nor subject knowledge.

Each indexer was asked to prepare index entries by following the above-mentioned four
indexing systems.

C) Time-period used: Documents were indexed 5 times using variable indexing times (2,
4, 8, 12, 16 minutes). The system was run through three phases with a view to finding out
whether the level of performance increases with the increasing experience of the
system personnel.

D) Materials Indexed: 100 technical reports and articles on the general field of
aeronautics and the specialized field of high-speed aerodynamics.

E) Items Indexed: With 100 documents X 3 indexers X 4 indexing systems X 5 time periods X 3 runs,
18,000 items were indexed.
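As a quick check of the arithmetic behind that figure, a minimal sketch using only the factors listed above:

# Size of the Cranfield 1 indexing exercise: every document was indexed by
# every indexer, under every indexing system, at every time allocation,
# in each of the three phases (runs).
documents = 100
indexers = 3
indexing_systems = 4
time_periods = 5
runs = 3

items_indexed = documents * indexers * indexing_systems * time_periods * runs
print(items_indexed)  # 18000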
b) Methodology:

1) A number of people from different organizations were asked to select documents
from the collection and, in each case, to frame a question to which that document would
be an answer.

2) The project used manufactured queries, which were formulated before the beginning
of the actual search. Altogether 400 queries were formulated and all were processed by
the system in each of the three phases. Thus the system worked on a total of 1200
search queries.

3) The questions were put to the indexers.

c) Results:
1) The average recall ratios for the different systems were as follows:

Alphabetical subject catalogue 81.5%

Colon Classification 74%

UDC scheme 76%

Uniterm indexing 82%


2) Increased time spent in indexing improved the chance of recall. Recall ratios for the
different indexing times were as follows:

Time (in minutes) Recall (in %)


2 73
4 80
8 74
12 83
16 84

3) Success rates in retrieving documents in the broad field of aeronautics were noted to be
4-5% better than those in the specialized field of high-speed aerodynamics.

4) The success rate in the third group of 6,000 items was 3-4% greater than in the second
group, suggesting that the documents were indexed better in the third group. That is to
say, trained indexers without indexing experience were able to perform a consistently
good indexing job in spite of their lack of subject knowledge.
Significance:
• Firstly, the test proved that the performance of a system does not depend on
the experience and subject background of the indexer.
• Secondly, it showed that systems where documents are organized by a
faceted classification scheme perform poorly in comparison to the
alphabetical index and the uniterm system.
• Thirdly, it developed for the first time the methodologies that could be applied
successfully in evaluating information retrieval systems.
• Moreover, it also proved that recall and precision are the two most important
parameters for determining the performance of information retrieval systems,
and that these two parameters are inversely related to each other (a minimal
sketch of these two measures follows below).
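As a minimal illustration of these two measures (the document identifiers and relevance judgements below are invented for illustration only, not taken from the Cranfield data):

# Recall and precision for a single query, computed from two sets:
# the documents the system retrieved and the documents judged relevant.
retrieved = {"d1", "d2", "d3", "d4"}    # hypothetical system output
relevant = {"d2", "d4", "d5"}           # hypothetical relevance judgements

hits = retrieved & relevant             # relevant documents actually retrieved

recall = len(hits) / len(relevant)      # proportion of relevant documents retrieved
precision = len(hits) / len(retrieved)  # proportion of retrieved documents that are relevant

print(f"recall = {recall:.2f}, precision = {precision:.2f}")
# recall = 0.67, precision = 0.50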
Criticisms:
One of the main criticisms of this study was its artificiality and lack of relation to
real-life situations. The questions used for the test were 'manufactured' from the
source documents, which is highly artificial: in a real-life situation, queries are not
based on prior knowledge that the answering document is available in the store.
• Cranfield 2 -
• The drawbacks of Cranfield 1 necessitated the conduct of further
studies. The second stage of the Cranfield studies, known as Cranfield 2,
began in 1963 and was completed in 1966.
• Cranfield 2 was a controlled experiment that attempted to
investigate the components of index languages and their effect
on the performance of retrieval systems.
• Some of the drawbacks of Cranfield 1 were eliminated in Cranfield
2 by bringing a real-life situation into it and allowing feedback
mechanisms between the indexers and users.
Test collection:
1400 reports and research papers in the field of high-speed
aerodynamics and aircraft structures were taken as input.
• Query formulation:
• Each of the authors of 200 selected research papers was asked to formulate
the questions for which he had cited the reference(s) in the paper. The authors
were also asked to point out the documents that were not cited in their
works but might have been relevant to the questions they had formulated.
The abstracts of the whole set of cited references were sent to the authors,
who were asked to assess the relevance of these to the questions they
had formulated. The authors identified 1961 papers to be fully or partially
relevant to the 279 questions obtained. Finally, 221 questions and 1400
documents were selected for the experiment.
• Indexing:
The documents were indexed in three different ways:
a) Each document was analysed and important concepts were selected and
recorded in natural language.
b) Concepts denoted by single words were listed.
c) Concepts with a weighting (ranging from 1 to 3) were combined to
represent the subject contents of the documents.
Five different types of indexing languages with variations were used in this study:

• Single term index language with 8 variations like uncontrolled natural language,
controlled synonyms, controlled word form, etc.

• Simple concept index language with 15 variations by applying various controls.

• Controlled term index language with 6 similar variations.

• Title index with two variations.

• Index language generated from abstracts with 2 variations.

Searching:

To conduct searches, the 221 questions generated by the authors of the research papers
were used. The relevance of each document to a question was ascertained by grading it
from 1 to 4 in the following manner (a small illustration of how such grades can be used
follows the list):

1) Complete answer to the question.

2) High degree of relevance

3) Useful, providing general background of the work or dealing with a specific area

4) Minimum interest, providing information like historical viewpoint.
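The exact way these grades were turned into relevance judgements is not described here, but a common approach, sketched below as an assumption rather than a description of the Cranfield procedure, is to treat every document graded at or above a chosen cutoff as relevant before computing recall and precision (the document identifiers and grades are invented):

# Hypothetical graded judgements for one question: grade 1 = complete answer,
# grade 4 = minimum interest (a numerically lower grade means more relevant).
judgments = {"d1": 1, "d2": 3, "d3": 4, "d4": 2}

def relevant_at_cutoff(judgments, cutoff):
    # A document counts as relevant if its grade is at the cutoff or better.
    return {doc for doc, grade in judgments.items() if grade <= cutoff}

print(sorted(relevant_at_cutoff(judgments, 2)))  # ['d1', 'd4'] - strong answers only
print(sorted(relevant_at_cutoff(judgments, 4)))  # all four documents count as relevant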


• Results:
• The best performance was obtained by the use of a natural language single term
index, such as Uniterm, based on words occurring in the text of the documents.
• It was suggested that terms taken from documents may be used successfully with
minimum control in a post-coordinate index, and that it is helpful to eliminate synonyms;
but any further measures taken to control the vocabulary are likely to decrease its
efficiency (a minimal sketch of such an index follows this list).
• When broader and narrower terms were included along with the controlled languages of
the thesaurus, the performance worsened.
• Index languages formed out of titles performed better than those formed out of
abstracts.
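To make the idea of an uncontrolled single-term post-coordinate index concrete, here is a minimal sketch; the sample documents, the tiny synonym table and the queries are invented for illustration and are not part of the Cranfield collection:

# A minimal uncontrolled single-term (uniterm) post-coordinate index.
# Documents are indexed by the words occurring in their text; the only
# control applied is an optional synonym merge.
from collections import defaultdict

documents = {
    1: "heat transfer in high speed aerodynamics",
    2: "aircraft structures under aerodynamic load",
    3: "boundary layer heat transfer experiments",
}

# Illustrative synonym table: map variant forms onto one preferred term.
synonyms = {"aerodynamic": "aerodynamics"}

def normalise(word):
    word = word.lower()
    return synonyms.get(word, word)

# Inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[normalise(word)].add(doc_id)

def search(*terms):
    # Post-coordination: intersect the posting sets of the query terms.
    postings = [index.get(normalise(t), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(search("heat", "transfer")))          # [1, 3]
print(sorted(search("aerodynamics", "aircraft")))  # [2], found via the synonym merge

Any further vocabulary control (for example, adding broader or narrower thesaurus terms to the posting sets) would widen the matches and, as the results above suggest, tends to reduce the efficiency of such an index.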

Results and Conclusion:

The results of the Cranfield 2 tests were unexpected because the best performing index
languages were composed of uncontrolled single words occurring in the documents. However,
the variables used in the study were subject to criticism. Each index language consisted of
different units of words, phrases, or combinations of both, and both the documents and the
queries were formed in the same way. Thus the matching of questions to documents would
evaluate the relative effectiveness of languages of different specificity. Vickery
comments that the measures used in the second Cranfield project do not adequately
characterise those aspects of retrieval performance that are of operational importance.
REFERENCES-
• Foskett, A.C. (1996). Subject approach to information. 5th ed. London:
The Library Association.
• Chowdhury, G.G. (2004). Introduction to modern information retrieval.
2nd ed. London: Facet Publishing.
