
Introduction
Probabilistic Ranking Principle
The Binary Independence Model
OKAPI
Discussion

Probabilistic Information Retrieval

Sumit Bhatia

July 16, 2009

Sumit Bhatia Probabilistic Information Retrieval 1/23



Overview

1 Introduction
    Information Retrieval
    IR Models
    Probability Basics
2 Probabilistic Ranking Principle
    Document Ranking Problem
    Probability Ranking Principle
3 The Binary Independence Model
4 OKAPI
5 Discussion


Information Retrieval (IR) Process

1 The user has an information need.
2 Information need → query, via a query representation.
3 Documents → document representation.
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.



Boolean Retrieval Model

Query = a Boolean expression of terms
    e.g., Mitra AND Giles
Document = a column of the term-document matrix
    A_ij = 1 iff the i-th term is present in the j-th document
"Bag of words" representation
No ranking of results
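As a toy illustration (hypothetical corpus and terms, not from the slides), the term-document incidence matrix and a Boolean AND query can be sketched as:

```python
# Toy sketch of the Boolean model: build a binary term-document matrix
# and answer "Mitra AND Giles" by requiring every query term's bit to be 1.
docs = {
    "d1": "mitra giles information retrieval",
    "d2": "giles probabilistic retrieval",
    "d3": "mitra giles ranking",
}

# A[term][doc] = 1 iff the term occurs in the doc (the A_ij above)
vocab = sorted({t for text in docs.values() for t in text.split()})
A = {t: {d: int(t in text.split()) for d, text in docs.items()} for t in vocab}

def boolean_and(*terms):
    # Boolean AND: documents whose incidence bit is 1 for every query term
    return [d for d in docs if all(A[t][d] for t in terms)]

print(boolean_and("mitra", "giles"))  # ['d1', 'd3']
```

Note that the answer is a set: every matching document is "equally right", which is exactly the no-ranking limitation noted above.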



Vector Space Model

Query = free-text query
    e.g., Mitra Giles
Query and document → vectors in "term space"
The cosine of the angle between the query and document vectors measures their similarity and is used for ranking.
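A minimal sketch of this scoring, with made-up count vectors over an assumed three-term space:

```python
# Vector-space ranking sketch: represent query and documents as term-count
# vectors over the same term space and rank by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# term space: [mitra, giles, retrieval] (hypothetical counts)
query = [1, 1, 0]
doc_a = [2, 1, 0]   # mentions both query terms
doc_b = [0, 0, 3]   # off-topic

print(cosine(query, doc_a) > cosine(query, doc_b))  # True: doc_a ranks higher
```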


Information Retrieval (IR) Process, Revisited

1 The user has an information need.
2 Information need → query, via a query representation.
3 Documents → document representation.
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.

Problem!
Both the query and the document representations are uncertain, which motivates a probabilistic approach to matching.


Probability Basics

Chain Rule:
P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

Partition Rule:
P(B) = P(A, B) + P(Ā, B)

Bayes' Rule:
P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / Σ_{X ∈ {A, Ā}} P(B|X) P(X)
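A quick numeric check of Bayes' rule with the partition over {A, Ā} (the probabilities below are assumed, chosen only for illustration):

```python
# Bayes' rule via the partition rule:
# P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|Ā)P(Ā))
p_a = 0.3             # prior P(A), assumed
p_b_given_a = 0.8     # likelihood P(B|A), assumed
p_b_given_not_a = 0.1

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # partition rule
p_a_given_b = p_b_given_a * p_a / p_b                   # Bayes' rule

print(round(p_a_given_b, 4))  # 0.7742
```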


Document Ranking Problem

Problem Statement
Given a set of documents D = {d1, d2, ..., dn} and a query q, in what order should the subset of relevant documents Dr = {dr1, dr2, ..., drm} be returned to the user?

Hint: we want the best document at rank 1, the second best at rank 2, and so on.

Solution
Rank by each document's probability of relevance with respect to the information need (query), i.e., by P(R = 1|d, q).


Probability Ranking Principle

Probability Ranking Principle (van Rijsbergen, 1979)
"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."

Observation 1: the PRP maximizes the mean probability of relevance at rank k.


Probability Ranking Principle

Case 1: 1/0 loss, i.e., no differential selection/retrieval costs.

Bayes' Optimal Decision Rule:
d is relevant iff P(R = 1|d, q) > P(R = 0|d, q)

Theorem 1
The PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.

Case 2: PRP with differential retrieval costs. Let C1 be the cost of retrieving a relevant document and C0 the cost of retrieving a non-relevant one. Then d should be the next document retrieved iff, for all documents d′ not yet retrieved,

C1 · P(R = 1|d, q) + C0 · P(R = 0|d, q) ≤ C1 · P(R = 1|d′, q) + C0 · P(R = 0|d′, q)
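With assumed probability estimates, one can sketch how ranking by P(R = 1|d, q) agrees with ranking by expected retrieval cost whenever retrieving a relevant document is cheaper than retrieving a non-relevant one (C1 < C0):

```python
# PRP sketch (hypothetical P(R=1|d,q) estimates): expected cost
# C1*P(R=1|d,q) + C0*P(R=0|d,q) is decreasing in P(R=1|d,q) when C1 < C0,
# so sorting by probability (desc) and by expected cost (asc) coincide.
docs = {"d1": 0.9, "d2": 0.3, "d3": 0.6}   # assumed P(R=1|d,q)
C1, C0 = 0.0, 1.0                          # assumed retrieval costs

def expected_cost(p_rel):
    return C1 * p_rel + C0 * (1 - p_rel)

prp_order = sorted(docs, key=lambda d: docs[d], reverse=True)
cost_order = sorted(docs, key=lambda d: expected_cost(docs[d]))

print(prp_order == cost_order)  # True
```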


Binary Independence Model (BIM)

Assumptions:
1 Binary: documents are represented as binary incidence vectors of terms, d = (d1, d2, ..., dn), where di = 1 iff term i is present in d, else di = 0.
2 Independence: terms occur in documents independently of one another.
3 The relevance of a document is independent of the relevance of other documents.¹

Implications:
1 Many documents have the same representation.
2 No association between terms is considered.

¹ This is the assumption behind the PRP in general.

Binary Independence Model (BIM)

We wish to compute P(R|d, q). We do so in terms of the term incidence vectors ~d and ~q, i.e., we compute P(R|~d, ~q). Using Bayes' Rule:

P(R = 1|~d, ~q) = P(~d|R = 1, ~q) P(R = 1|~q) / P(~d|~q)    (1)

P(R = 0|~d, ~q) = P(~d|R = 0, ~q) P(R = 0|~q) / P(~d|~q)    (2)

Here P(R = 1|~q) and P(R = 0|~q) are the prior probabilities of relevance and non-relevance given the query alone.



Binary Independence Model

Computing the odds of relevance, we get:

O(R|~d, ~q) = [P(R = 1|~q) / P(R = 0|~q)] × [P(~d|R = 1, ~q) / P(~d|R = 0, ~q)]    (3)

The first factor is document independent. What about the second factor? Applying the Naive Bayes (term independence) assumption:

O(R|~d, ~q) ∝ Π_{t=1..m} [P(dt|R = 1, ~q) / P(dt|R = 0, ~q)]    (4)


Binary Independence Model

Observation 1: a term is either present in a document or absent. Splitting the product accordingly:

O(R|~d, ~q) ∝ Π_{t: dt=1} [P(dt = 1|R = 1, ~q) / P(dt = 1|R = 0, ~q)] × Π_{t: dt=0} [P(dt = 0|R = 1, ~q) / P(dt = 0|R = 0, ~q)]    (5)

Writing pt = P(dt = 1|R = 1, ~q) and ut = P(dt = 1|R = 0, ~q):

                         R = 1      R = 0
Term present (dt = 1)    pt         ut
Term absent  (dt = 0)    1 − pt     1 − ut


Binary Independence Model

Assumption: a term not in the query is equally likely to occur in relevant and non-relevant documents, i.e., pt = ut whenever qt = 0.

O(R|~d, ~q) ∝ Π_{t: dt=qt=1} (pt / ut) × Π_{t: dt=0, qt=1} [(1 − pt) / (1 − ut)]    (6)

Manipulating (multiplying and dividing the first product by (1 − pt)/(1 − ut)):

O(R|~d, ~q) ∝ Π_{t: dt=qt=1} [pt(1 − ut) / (ut(1 − pt))] × Π_{t: qt=1} [(1 − pt) / (1 − ut)]    (7)

The second product now runs over all query terms, so it is constant for a given query and can be ignored for ranking.
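A numeric sanity check (hypothetical pt, ut values) that the step from Eq. (6) to Eq. (7) leaves the odds unchanged, since the extra factor applied to each "present" term is exactly the one added to the query-wide product:

```python
# Check Eq. (6) == Eq. (7) on assumed estimates for three query terms.
from math import prod

p = [0.6, 0.4, 0.7]   # pt for the query terms (assumed)
u = [0.2, 0.1, 0.3]   # ut for the query terms (assumed)
d = [1, 0, 1]         # dt: which query terms this document contains

eq6 = prod(p[t] / u[t] for t in range(3) if d[t]) * \
      prod((1 - p[t]) / (1 - u[t]) for t in range(3) if not d[t])

eq7 = prod(p[t] * (1 - u[t]) / (u[t] * (1 - p[t])) for t in range(3) if d[t]) * \
      prod((1 - p[t]) / (1 - u[t]) for t in range(3))

print(abs(eq6 - eq7) < 1e-12)  # True
```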


Binary Independence Model

Only the first product in Eq. (7) depends on the document; taking its log defines the Retrieval Status Value (RSV):

RSVd = log Π_{t: dt=qt=1} [pt(1 − ut) / (ut(1 − pt))]    (8)
     = Σ_{t: dt=qt=1} log [pt(1 − ut) / (ut(1 − pt))]    (9)

The probabilities can be estimated from a contingency table of document counts, where N is the collection size, S the number of relevant documents, n the number of documents containing term t, and s the number of relevant documents containing t:

Docs      R = 1    R = 0                 Total
dt = 1    s        n − s                 n
dt = 0    S − s    (N − n) − (S − s)     N − n
Total     S        N − S                 N

Substituting these estimates (with 1/2 added to each count to avoid zeros), we get:

RSVd = Σ_{t: dt=qt=1} log [ (s + 1/2)/(S − s + 1/2) ] / [ (n − s + 1/2)/(N − n − S + s + 1/2) ]    (10)
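A sketch of the per-term weight inside Eq. (10), with assumed collection counts:

```python
# Smoothed per-term RSV weight from contingency counts (Eq. 10):
# the numerator is the smoothed presence odds among relevant documents,
# the denominator the smoothed presence odds among non-relevant ones.
import math

def rsv_weight(s, S, n, N):
    return math.log(((s + 0.5) / (S - s + 0.5)) /
                    ((n - s + 0.5) / (N - n - S + s + 0.5)))

# Assumed toy numbers: 1000 docs, 10 relevant; term in 50 docs, 8 of them relevant.
w = rsv_weight(s=8, S=10, n=50, N=1000)
print(w > 0)  # True: the term is far more frequent among relevant documents
```

The document's score is then the sum of these weights over the query terms it contains.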


Observations

Probabilities for non-relevant documents can be approximated by collection statistics:

log [(1 − ut)/ut] = log [(N − n)/n] ≈ log (N/n) = IDF!

It is not so simple for relevant documents. The options are:
– estimate pt from known relevant documents (not always available), or
– assume pt is constant, which is equivalent to IDF weighting only.

The difficulty of estimating these probabilities, together with the model's drastic assumptions, makes good retrieval performance hard to achieve.


OKAPI Weighting Scheme

The BIM does not consider term frequencies or document length.

The BM25 weighting scheme (Okapi weighting) was developed to build a probabilistic model sensitive to these quantities.

BM25 is widely used today and has shown good performance in a number of practical systems.


OKAPI Weighting Scheme

RSVd = Σ_{t∈q} log(N/dft) × [(k1 + 1) tftd] / [k1((1 − b) + b(ld/lav)) + tftd] × [(k3 + 1) tftq] / [k3 + tftq]

where:
N is the total number of documents,
dft is the document frequency of term t, i.e., the number of documents that contain t,
tftd is the frequency of term t in document d,
tftq is the frequency of term t in query q,
ld is the length of document d,
lav is the average document length,
k1, k3 and b are constants, generally set to 2, 2 and 0.75 respectively.
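A minimal BM25 scorer following this formula (the corpus statistics and counts below are assumed for illustration):

```python
# BM25 (Okapi) scoring sketch: IDF factor times saturating document-tf
# factor (with length normalization) times saturating query-tf factor.
import math

def bm25(query_tf, doc_tf, doc_len, avg_len, df, N, k1=2.0, k3=2.0, b=0.75):
    """query_tf/doc_tf map term -> frequency; df maps term -> doc frequency."""
    score = 0.0
    for t, tftq in query_tf.items():
        tftd = doc_tf.get(t, 0)
        if tftd == 0 or df.get(t, 0) == 0:
            continue  # term absent from the document or the collection
        idf = math.log(N / df[t])
        tf_doc = (k1 + 1) * tftd / (k1 * ((1 - b) + b * doc_len / avg_len) + tftd)
        tf_query = (k3 + 1) * tftq / (k3 + tftq)
        score += idf * tf_doc * tf_query
    return score

# Assumed statistics: 1000 docs, average length 100 terms.
score = bm25(query_tf={"mitra": 1, "giles": 1},
             doc_tf={"mitra": 3, "giles": 1, "retrieval": 5},
             doc_len=120, avg_len=100, df={"mitra": 20, "giles": 50}, N=1000)
print(score > 0)  # True: both query terms occur in the document
```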

What Next?

Similarity between terms and documents: is this sufficient?

JAVA: coffee, computer language, or place?

What about the time and location of the user?

Different users might want different documents for the same query.


What Next?

Maximum Marginal Relevance [CG98]: rank documents so as to minimize the similarity between the returned documents.

Result Diversification [Wan09]:
– rank documents so as to maximize mean relevance, given a variance level
– the variance here determines the risk the user is willing to take


References

[CG98] Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR 1998, pp. 335–336.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[Wan09] Jun Wang. Mean-variance analysis: A new document ranking theory in information retrieval. Advances in Information Retrieval, 2009, pp. 4–16.


QUESTIONS???
