Relevance Feedback: Improving Results
Relevance Feedback
Improving results
The focus is on high recall. E.g., a search for aircraft doesn’t match
documents that say plane, nor does thermodynamic match heat
Options for improving results…
Global methods
Query expansion
Thesauri
Automatic thesaurus generation
Local methods
Relevance feedback
Pseudo relevance feedback
Introduction to Information Retrieval Sec. 9.1
Relevance Feedback
Introduction to Information Retrieval Sec. 9.1.1
Initial query/results
Initial query: New space satellite applications
+ 1. 0.539, 08/13/91, NASA Hasn’t Scrapped Imaging Spectrometer
+ 2. 0.533, 07/09/91, NASA Scratches Environment Gear From Satellite Plan
3. 0.528, 04/04/90, Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
4. 0.526, 09/09/91, A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
5. 0.525, 07/24/90, Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
6. 0.524, 08/22/90, Report Provides Support for the Critics Of Using Big Satellites to Study Climate
7. 0.516, 04/13/87, Arianespace Receives Satellite Launch Pact From Telesat Canada
+ 8. 0.509, 12/02/87, Telecommunications Tale of Two Companies
User then marks relevant documents with “+”.
Rocchio Algorithm
The Rocchio algorithm uses the vector space model
to pick a relevance feedback query
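Rocchio seeks the query $\vec{q}_{opt}$ that maximizes
$$\vec{q}_{opt} = \arg\max_{\vec{q}} \left[ \cos(\vec{q}, \vec{\mu}(C_r)) - \cos(\vec{q}, \vec{\mu}(C_{nr})) \right]$$
where $\vec{\mu}(C_r)$ and $\vec{\mu}(C_{nr})$ are the centroids of the relevant and non-relevant documents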
[Figure: relevant documents (o) and non-relevant documents (x) plotted in vector space, with the optimal query marked.]
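A minimal sketch of the objective above, in plain Python on made-up 3-term vectors. The vectors, the helper names (centroid, cosine, objective) and the two candidate queries are illustrative assumptions, not taken from the slides. It only checks that the centroid-difference query, which IIR Sec. 9.1.1 gives as the solution under cosine similarity, scores higher on this toy data than a naive query.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def objective(q, relevant, nonrelevant):
    """cos(q, centroid of relevant) - cos(q, centroid of non-relevant)."""
    return cosine(q, centroid(relevant)) - cosine(q, centroid(nonrelevant))

# Toy document vectors over three hypothetical terms.
relevant = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
nonrelevant = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

# Candidate 1: difference of the two centroids.
q_diff = [r - n for r, n in zip(centroid(relevant), centroid(nonrelevant))]
# Candidate 2: a naive query on the term shared by both classes.
q_init = [0.0, 0.0, 1.0]

print(objective(q_diff, relevant, nonrelevant))
print(objective(q_init, relevant, nonrelevant))
```

The naive query is equally similar to both centroids, so its objective is zero, while the centroid difference points toward the relevant documents and away from the non-relevant ones.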
Subtleties to note
Tradeoff α vs. β/γ: α weights the original query; β and γ weight the judged
relevant and non-relevant documents. If we have a lot of judged
documents, we want a higher β/γ.
Some weights in query vector can go negative
Negative term weights are ignored (set to 0)
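The points above can be sketched in plain Python, assuming the commonly cited form of the Rocchio update, q_m = α·q_0 + β·(mean of judged relevant vectors) − γ·(mean of judged non-relevant vectors). The function name rocchio, the default weights (1.0, 0.75, 0.15) and the toy vectors are assumptions for illustration, not values given on these slides.

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector; negative term weights are set to 0."""
    dims = len(q0)

    def mean(vectors):
        # An empty judgment set contributes nothing to the update.
        if not vectors:
            return [0.0] * dims
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    mu_r = mean(relevant)
    mu_nr = mean(nonrelevant)
    raw = [alpha * q + beta * r - gamma * n
           for q, r, n in zip(q0, mu_r, mu_nr)]
    # Some weights can go negative; clip them to zero.
    return [max(0.0, w) for w in raw]

# Toy example over three hypothetical terms.
q0 = [1.0, 0.0, 0.0]
judged_relevant = [[1.0, 1.0, 0.0]]
judged_nonrelevant = [[0.0, 0.0, 2.0]]

q_m = rocchio(q0, judged_relevant, judged_nonrelevant)
print(q_m)  # [1.75, 0.75, 0.0]
```

The third term only occurs in the non-relevant document, so its raw weight is negative and is clipped to 0. Raising beta and gamma relative to alpha moves the query further from the original and closer to what the judgments suggest.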
Violation of A1
Violation of A2
Query Expansion
Query assist
Manual thesaurus
E.g. MedLine: physician, syn: doc, doctor, MD, medico
An entry can be a query rather than just a list of synonyms
Global Analysis: (static; of all documents in collection)
Automatically derived thesaurus
(co-occurrence statistics)
Refinements based on query log mining
Common on the web
Local Analysis: (dynamic)
Analysis of documents in result set
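As a rough sketch of manual-thesaurus expansion, the following plain Python adds synonyms to each query term. The physician entry is the MedLine example from this slide; the function name expand_query, the dictionary layout and the sample query are assumptions for illustration only.

```python
# Hypothetical thesaurus layout: headword -> list of synonyms.
THESAURUS = {
    "physician": ["doc", "doctor", "MD", "medico"],
}

def expand_query(terms, thesaurus):
    """Return the query terms plus their synonyms, without duplicates."""
    expanded = []
    seen = set()
    for term in terms:
        candidates = [term] + thesaurus.get(term.lower(), [])
        for candidate in candidates:
            key = candidate.lower()
            if key not in seen:
                seen.add(key)
                expanded.append(candidate)
    return expanded

print(expand_query(["physician", "salary"], THESAURUS))
# ['physician', 'doc', 'doctor', 'MD', 'medico', 'salary']
```

Terms without a thesaurus entry pass through unchanged, so expansion can only widen the set of matching documents, which is why it mainly helps recall.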
Introduction to Information Retrieval Sec. 9.2.2
Co-occurrence Thesaurus
Simplest way to compute one is based on term-term similarities in $C = AA^T$, where A is the term-document matrix.
$w_{i,j}$ = (normalized) weight for $(t_i, d_j)$
[Figure: the matrix A drawn with M term rows; row $t_i$ is highlighted.]
What does C contain if A is a term-doc incidence (0/1) matrix?
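A small sketch of the computation, in plain Python with a made-up 3-term by 4-document incidence matrix (the terms, the matrix values and the helper names cooccurrence and related_terms are illustrative assumptions). With 0/1 entries, C[i][j] counts the documents that contain both term i and term j, and the diagonal C[i][i] is the document frequency of term i.

```python
def cooccurrence(A):
    """Compute C = A * A^T for a term-document matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A]
            for row_i in A]

def related_terms(C, terms, i):
    """Other terms ordered by decreasing co-occurrence with term i."""
    others = [j for j in range(len(terms)) if j != i]
    others.sort(key=lambda j: C[i][j], reverse=True)
    return [terms[j] for j in others]

terms = ["aircraft", "plane", "heat"]
# Rows are terms, columns are documents; 1 means the term occurs in the document.
A = [
    [1, 1, 0, 1],  # aircraft
    [1, 1, 0, 0],  # plane
    [0, 0, 1, 1],  # heat
]

C = cooccurrence(A)
print(C)  # [[3, 2, 1], [2, 2, 0], [1, 0, 2]]
print(related_terms(C, terms, 0))  # ['plane', 'heat']
```

In this toy collection aircraft and plane share two documents, so plane would be the first expansion candidate for aircraft. With normalized weights instead of 0/1 entries, the same product gives a similarity score rather than a raw count.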
Introduction to Information Retrieval Sec. 9.2.3
Resources
IIR Ch. 9
MG Ch. 4.7
MIR Ch. 5.2 – 5.4