

CAIM: Cerca i Anàlisi d’Informació Massiva (Search and Analysis of Massive Information)

FIB, Grau en Enginyeria Informàtica (Degree in Informatics Engineering)

Slides by Marta Arias, José Luis Balcázar,
Ramon Ferrer-i-Cancho, Ricard Gavaldà
Department of Computer Science, UPC

Fall 2018
http://www.cs.upc.edu/~caim

4. Evaluation and Relevance Feedback
Evaluation of Information Retrieval Usage, I
What exactly are we to do?

In the Boolean model, the specification is unambiguous.
We know what we are to do: retrieve and provide to the user
all those documents that satisfy the query.

But is this what the user really wants?

Sorry, but usually… no.

Evaluation of Information Retrieval Usage, II
Then, what exactly are we to optimize?

Notation:
- D: the set of all our documents, on which the user asks one query;
- A: the answer set: the documents that the system retrieves as answer;
- R: the relevant documents: those that the user actually wishes to see as answer.
  (But no one knows this set, not even the user!)

Unreachable goal: A = R, that is:

- Pr(d ∈ A | d ∈ R) = 1, and
- Pr(d ∈ R | d ∈ A) = 1.

The Recall and Precision measures

Let’s settle for:

- high recall, |R ∩ A| / |R|:
  Pr(d ∈ A | d ∈ R) not too much below 1,
- high precision, |R ∩ A| / |A|:
  Pr(d ∈ R | d ∈ A) not too much below 1.

Difficult balance. More later.
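
To make the two measures concrete, here is a minimal Python sketch; the document IDs and the two sets are invented, purely for illustration:

def recall(relevant, answered):
    # fraction of the relevant documents that were actually retrieved
    return len(relevant & answered) / len(relevant) if relevant else 0.0

def precision(relevant, answered):
    # fraction of the retrieved documents that are actually relevant
    return len(relevant & answered) / len(answered) if answered else 0.0

R = {"d1", "d2", "d3", "d4"}   # documents the user actually wants
A = {"d2", "d3", "d5"}         # documents the system returned

print(recall(R, A))            # 2/4 = 0.5
print(precision(R, A))         # 2/3 ≈ 0.67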

Recall and Precision, II
Example: test for tuberculosis (TB)

- 1000 people, out of which 50 have TB
- the test is positive on 40 people, of which 35 really have TB

Recall:
% of true TB cases that test positive = 35 / 50 = 70 %

Precision:
% of positives that really have TB = 35 / 40 = 87.5 %

- Large recall: few sick people go away undetected
- Large precision: few people are scared unnecessarily (few false alarms)
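
The same figures as a quick arithmetic check in Python (all numbers taken from the example above):

sick = 50                  # people who really have TB
test_positive = 40         # people the test flags
true_positive = 35         # flagged people who really have TB

recall = true_positive / sick              # 35 / 50 = 0.70
precision = true_positive / test_positive  # 35 / 40 = 0.875
print(f"recall = {recall:.1%}, precision = {precision:.1%}")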

Recall and Precision, III. Confusion matrix
Equivalent definition

Confusion matrix:

                           Answered
                      relevant    not relevant
Reality  relevant        tp            fn
         not relevant    fp            tn

- |R| = tp + fn
- |A| = tp + fp
- |R ∩ A| = tp
- Recall = |R ∩ A| / |R| = tp / (tp + fn)
- Precision = |R ∩ A| / |A| = tp / (tp + fp)
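
A small Python sketch of these identities, computed from toy sets D (all documents), R and A; the document IDs are invented for illustration:

D = {f"d{i}" for i in range(1, 11)}   # all documents
R = {"d1", "d2", "d3", "d4"}          # relevant
A = {"d2", "d3", "d5"}                # answered

tp = len(R & A)        # relevant and retrieved
fp = len(A - R)        # retrieved but not relevant
fn = len(R - A)        # relevant but missed
tn = len(D - R - A)    # correctly left out

assert len(R) == tp + fn and len(A) == tp + fp
print("recall =", tp / (tp + fn), " precision =", tp / (tp + fp))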

How many documents to show?

We rank all documents according to some measure.

How many should we show?
- Users won't read through overly long answer lists.
- Long answers are likely to exhibit low precision.
- Short answers are likely to exhibit low recall.

We analyze precision and recall as functions of the number of documents k provided as answer.
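
For illustration, a short Python sketch that computes precision and recall at every cutoff k of a made-up ranked answer list:

def precision_at_k(ranking, relevant, k):
    return sum(d in relevant for d in ranking[:k]) / k

def recall_at_k(ranking, relevant, k):
    return sum(d in relevant for d in ranking[:k]) / len(relevant)

ranking = ["d7", "d2", "d9", "d1", "d5", "d3"]   # system output, best first
relevant = {"d1", "d2", "d3", "d4"}

for k in range(1, len(ranking) + 1):
    print(k, round(precision_at_k(ranking, relevant, k), 2),
             round(recall_at_k(ranking, relevant, k), 2))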

Rank-recall and rank-precision plots

[Figure: rank-recall and rank-precision plots. Source: Prof. J. J. Paijmans, Tilburg]

A single “precision and recall” curve
x-axis for recall, and y-axis for precision.
(Similar to, and related to, the ROC curve in predictive models.)

[Figure: precision-recall curve. Source: Stanford NLP group]


Often: plot 11 points of interpolated precision, at 0 %, 10 %, 20 %, …, 100 % recall.
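
A possible Python sketch of the 11-point interpolation, where interpolated precision at recall level r is the maximum precision observed at any recall ≥ r (the ranking and relevance judgements are invented):

def eleven_point_interpolated(ranking, relevant):
    points = []                       # (recall, precision) after each rank
    hits = 0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    levels = [i / 10 for i in range(11)]          # 0.0, 0.1, ..., 1.0
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]

ranking = ["d7", "d2", "d9", "d1", "d5", "d3"]
relevant = {"d1", "d2", "d3"}
print(eleven_point_interpolated(ranking, relevant))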
Other measures of effectiveness

- AUC: area under the curve of the plots above, relative to the best possible.
- F-measure: 2 / (1/recall + 1/precision).
  The harmonic mean: closer to the min of the two than the arithmetic mean.
- α-F-measure: 1 / (α/recall + (1−α)/precision), a weighted harmonic mean; α = 1/2 recovers the F-measure.
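
As a worked example, both measures in Python, evaluated on the recall and precision of the TB example above:

def f_measure(recall, precision):
    return 2 / (1 / recall + 1 / precision)                 # harmonic mean

def alpha_f_measure(recall, precision, alpha):
    return 1 / (alpha / recall + (1 - alpha) / precision)   # weighted harmonic mean

print(f_measure(0.70, 0.875))             # ≈ 0.778
print(alpha_f_measure(0.70, 0.875, 0.5))  # alpha = 1/2 gives the F-measure back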

Other measures of effectiveness, II

Take into account the documents previously known to the user.

- Coverage: |relevant & known & retrieved| / |relevant & known|
- Novelty: |relevant & retrieved & UNknown| / |relevant & retrieved|
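
A toy Python sketch of both measures, over invented sets of relevant, known, and retrieved documents:

relevant  = {"d1", "d2", "d3", "d4", "d5"}
known     = {"d1", "d2", "d6"}           # documents the user already knew
retrieved = {"d2", "d3", "d5", "d7"}

coverage = len(relevant & known & retrieved) / len(relevant & known)
novelty  = len((relevant & retrieved) - known) / len(relevant & retrieved)
print(coverage, novelty)                 # 1/2 and 2/3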

Relevance Feedback, I
Going beyond what the user asked for

The user relevance cycle:

1. Get a query q
2. Retrieve relevant documents for q
3. Show the top k to the user
4. Ask the user to mark them as relevant / irrelevant
5. Use the answers to refine q
6. If desired, go to 2

Relevance Feedback, II
How to create the new query?

Vector model: queries and documents are vectors.

Given a query q and a set of documents, split into a relevant set R
and a nonrelevant set NR, build a new query q′.

Rocchio's Rule:

    q′ = α·q + β·(1/|R|)·Σ_{d ∈ R} d − γ·(1/|NR|)·Σ_{d ∈ NR} d

- All vectors q and d must be normalized (e.g., to unit length).
- The weights α, β, γ are scalars, with α > β > γ ≥ 0; often γ = 0.
  - α: degree of trust in the original user's query,
  - β: weight of the positive information (terms that do not appear in the query but do appear in relevant documents),
  - γ: weight of the negative information.
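
A minimal Python sketch of Rocchio's rule, assuming queries and documents are term-weight vectors stored as numpy arrays; the vocabulary, the example vectors, and the default α, β, γ values are illustrative choices, not taken from these slides:

import numpy as np

def rocchio(q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    # q, rel_docs, nonrel_docs: vectors over the same vocabulary,
    # assumed already normalized as required above
    q_new = alpha * q
    if rel_docs:
        q_new += beta * np.mean(rel_docs, axis=0)
    if nonrel_docs:
        q_new -= gamma * np.mean(nonrel_docs, axis=0)
    return np.clip(q_new, 0.0, None)   # negative weights are commonly dropped

# vocabulary: [cat, dog, fish]
q   = np.array([1.0, 0.0, 0.0])
rel = [np.array([0.8, 0.6, 0.0]), np.array([0.6, 0.8, 0.0])]
non = [np.array([0.0, 0.0, 1.0])]
print(rocchio(q, rel, non))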

Relevance Feedback, III

In practice, often:
- good improvement of recall in the first round,
- marginal improvement in the second round,
- almost none beyond.

In web search, precision matters much more than recall, so the extra
computation time and user patience may not be productive.

Relevance Feedback, IV
…as Query Expansion

It is a form of Query Expansion:
the new query has non-zero weights on words
that were not in the original query.

Pseudorelevance feedback

Do not ask the user for anything!

- User patience is a precious resource: they'll just walk away.
- Assume you did great in answering the query!
  That is, treat the top-k documents in the answer as all relevant.
- No interaction with the user,
  but don't forget that the search will feel slower.
- Stop, at the latest, when you get the same top-k documents as in the previous round.
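
A self-contained toy sketch of the idea in Python: pretend the top-k answers are relevant, expand the query with their most frequent new terms, and stop when the top-k no longer changes. The tiny corpus and the overlap-based scoring are invented and stand in for a real retrieval engine:

from collections import Counter

corpus = {
    "d1": "black cat sat on the mat",
    "d2": "a dog chased the black cat",
    "d3": "fish swim in the sea",
    "d4": "the cat and the dog sleep",
}

def top_k(query_terms, k=2):
    # rank documents by how many query-term occurrences they contain
    ranked = sorted(corpus, key=lambda d: sum(corpus[d].split().count(t)
                                              for t in query_terms), reverse=True)
    return ranked[:k]

def pseudo_relevance(query, k=2, expand=2, max_rounds=3):
    terms = set(query.split())
    best = top_k(terms, k)
    for _ in range(max_rounds):
        # pretend the current top-k documents are all relevant
        counts = Counter(w for d in best for w in corpus[d].split() if w not in terms)
        terms |= {w for w, _ in counts.most_common(expand)}
        new_best = top_k(terms, k)
        if new_best == best:      # same top-k as the previous round: stop
            break
        best = new_best
    return terms, best

print(pseudo_relevance("cat"))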

Pseudorelevance feedback, II

Alternative sources of feedback / query refinement:

- Links clicked / not clicked on.
- Think time / time spent looking at an item.
- The user's previous history.
- Other users' preferences!
- Co-occurring words: add words that often occur together with the query words, for query expansion (a sketch follows below).
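
For the last point, a toy Python sketch that ranks expansion candidates by how often they co-occur with a query word; the small corpus is invented for illustration:

from collections import Counter
from itertools import combinations

texts = [
    "cheap flight tickets to rome",
    "cheap hotel and flight deals",
    "rome hotel booking",
    "last minute flight deals",
]

# count how often each unordered pair of words appears in the same text
cooc = Counter()
for t in texts:
    for a, b in combinations(sorted(set(t.split())), 2):
        cooc[(a, b)] += 1

def expansion_candidates(query_word, n=3):
    scores = Counter()
    for (a, b), c in cooc.items():
        if query_word in (a, b):
            other = b if a == query_word else a
            scores[other] += c
    return [w for w, _ in scores.most_common(n)]

print(expansion_candidates("flight"))   # 'cheap' and 'deals' co-occur most here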

