5 Retrieval Effectiveness
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2024)
Retrieval Effectiveness
Evaluation of IR systems,
Relevance judgement,
Performance measures:
Recall,
Precision,
Single-valued measures
etc.
Why System Evaluation?
Evaluation Criteria
Efficiency:
Time and space complexity:
Speed in terms of indexing time and retrieval time,
Speed of query processing,
The space taken by the index file relative to the corpus,
Index size: determine the index-to-corpus size ratio (see the sketch below),
Is there a need for compression?
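These efficiency criteria can be measured directly. Below is a minimal sketch, assuming a flat directory of plain-text files at a hypothetical path "corpus/", a toy in-memory inverted index, and an invented one-term query; the pickled index file is only used to get an on-disk size.

```python
# Minimal sketch of measuring indexing time, query time, and the
# index-to-corpus size ratio. The corpus path "corpus/", the pickled
# index file "index.pkl", and the single-term query are all assumptions.
import os
import pickle
import time
from collections import defaultdict

def build_index(corpus_dir):
    """Toy in-memory inverted index: term -> set of file names."""
    index = defaultdict(set)
    for name in os.listdir(corpus_dir):
        path = os.path.join(corpus_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="ignore") as f:
            for term in f.read().lower().split():
                index[term].add(name)
    return index

def dir_size(path):
    """Total size in bytes of the files directly inside a directory."""
    return sum(os.path.getsize(os.path.join(path, n))
               for n in os.listdir(path)
               if os.path.isfile(os.path.join(path, n)))

corpus_dir = "corpus/"                       # assumed corpus location
start = time.perf_counter()
index = build_index(corpus_dir)
indexing_time = time.perf_counter() - start

with open("index.pkl", "wb") as f:           # persist the index to measure its size
    pickle.dump(dict(index), f)
ratio = os.path.getsize("index.pkl") / dir_size(corpus_dir)

start = time.perf_counter()
hits = index.get("retrieval", set())         # a one-term query
query_time = time.perf_counter() - start

print(f"indexing time: {indexing_time:.2f} s")
print(f"index/corpus size ratio: {ratio:.2f}")
print(f"query time: {query_time * 1000:.3f} ms, {len(hits)} hits")
```

Timing with time.perf_counter and comparing on-disk sizes gives rough but comparable numbers for indexing time, query time, and the index-to-corpus size ratio.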
Evaluation Criteria
Effectiveness:
How capable is the system of retrieving relevant documents from the collection?
Is system X better than other systems?
User satisfaction: how “good” are the documents that are returned as a response to a user query?
Relevance of results to meeting the information need of users.
Types of Evaluation Strategies
System-centered evaluation:
Given documents, queries, and relevance judgments,
Try several variations of the system,
Measure which system returns the “best” hit list.
User-centered evaluation:
Given several users, and at least two retrieval systems:
Have each user try the same task on both systems,
Measure which system works the “best” for the users' information need,
How to measure user satisfaction?
The Notion of Relevance Judgment
(Figure: retrieval contingency table; relevant documents that are not retrieved (cell C) are misses, i.e. “type two errors”.)
Retrieval of documents may result in two kinds of errors:
False positives (errors of commission): some irrelevant documents may be retrieved by the system as if they were relevant.
False negatives (false drops, or errors of omission): some relevant documents may not be retrieved by the system, as if they were irrelevant.
For many applications a good index should not permit any false drops, but may permit a few false positives.
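As a concrete illustration, here is a minimal Python sketch that classifies a retrieved set against a set of relevance judgments; the document IDs and judgments are invented for the example.

```python
# Sketch: classifying retrieval outcomes for a single query.
# The document IDs and relevance judgments are invented for illustration.
relevant  = {"d1", "d3", "d5", "d8"}     # documents judged relevant
retrieved = {"d1", "d2", "d3", "d9"}     # documents returned by the system

true_positives  = retrieved & relevant   # relevant and retrieved
false_positives = retrieved - relevant   # errors of commission
false_negatives = relevant - retrieved   # errors of omission (false drops)

print("hits:", sorted(true_positives))
print("false positives:", sorted(false_positives))
print("false negatives (missed):", sorted(false_negatives))
```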
Measuring Retrieval Effectiveness
                Relevant    Not relevant
Retrieved           A             B
Not retrieved       C             D

Collection size = A + B + C + D
Relevant  = A + C
Retrieved = A + B

Recall    = |{Relevant} ∩ {Retrieved}| / |{Relevant}|  = A / (A + C)
Precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}| = A / (A + B)
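A minimal sketch of computing recall and precision from the contingency cells, reusing the invented document IDs and judgments from the earlier example:

```python
# Minimal sketch: recall and precision from the contingency cells,
# using invented relevance judgments for a single query.
relevant  = {"d1", "d3", "d5", "d8"}
retrieved = {"d1", "d2", "d3", "d9"}

A = len(relevant & retrieved)    # relevant and retrieved
B = len(retrieved - relevant)    # retrieved but not relevant
C = len(relevant - retrieved)    # relevant but not retrieved

recall    = A / (A + C)          # = A / |Relevant|
precision = A / (A + B)          # = A / |Retrieved|
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```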
(Figure: the precision-recall trade-off, with precision on the vertical axis and recall on the horizontal axis, both from 0 to 1. A system with high precision but low recall returns relevant documents but misses many useful ones; a system with high recall but low precision returns most relevant documents but includes lots of junk. The ideal system reaches the top-right corner, with precision and recall both equal to 1.)
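The trade-off can be seen by computing precision and recall at each cut-off of a ranked result list. The ranking and relevance judgments below are invented for illustration.

```python
# Sketch: precision and recall at each cut-off k of a ranked result list,
# showing the trade-off described in the figure above.
ranking  = ["d1", "d2", "d3", "d9", "d5", "d7", "d8", "d4"]
relevant = {"d1", "d3", "d5", "d8"}

hits = 0
for k, doc in enumerate(ranking, start=1):
    if doc in relevant:
        hits += 1
    print(f"k={k}: precision={hits / k:.2f}, recall={hits / len(relevant):.2f}")
```

As k grows, recall can only increase, while precision typically falls once non-relevant documents start entering the list.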
Exercise
Single-value measures: we may want a single number per query (and averaged over a large set of queries) to summarize performance (see the sketch below):
Average precision: the mean of the precision values measured at each relevant document as it is retrieved; the mean of this value over many queries is the Mean Average Precision (MAP).
R-precision: precision after R documents have been retrieved, where R is the number of relevant documents for the query.
F-measure: the harmonic mean of precision and recall, F = 2PR / (P + R).
E-measure: E = 1 − F.
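A minimal sketch of these single-value measures for one query, again with an invented ranked list and relevance judgments; averaging average_precision over a set of queries would give MAP.

```python
# Sketch of the single-value measures for one query; the ranked list and
# relevance judgments are invented.
ranking  = ["d1", "d2", "d3", "d9", "d5", "d7", "d8", "d4"]
relevant = {"d1", "d3", "d5", "d8"}

# Average precision: mean of precision@k taken at each relevant document seen.
hits, precisions = 0, []
for k, doc in enumerate(ranking, start=1):
    if doc in relevant:
        hits += 1
        precisions.append(hits / k)
average_precision = sum(precisions) / len(relevant)

# R-precision: precision after R = |relevant| documents have been retrieved.
R = len(relevant)
r_precision = len(set(ranking[:R]) & relevant) / R

# F-measure and E-measure from the precision and recall of the whole list.
retrieved = set(ranking)
A = len(retrieved & relevant)
precision, recall = A / len(retrieved), A / len(relevant)
f_measure = 2 * precision * recall / (precision + recall)   # assumes P + R > 0
e_measure = 1 - f_measure

print(f"AP={average_precision:.2f}  R-precision={r_precision:.2f}  "
      f"F={f_measure:.2f}  E={e_measure:.2f}")
```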
Problems with both precision and recall
Thank You !!!