PRP
Sumit Bhatia
Overview
1 Introduction
Information Retrieval
IR Models
Probability Basics
2 Probabilistic Ranking Principle
Document Ranking Problem
Probability Ranking Principle
3 The Binary Independence Model
4 OKAPI
5 Discussion
Problem!
Both Query and Document Representations are Uncertain
Probability Basics
Chain Rule:
P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Partition Rule:
P(B) = P(A, B) + P(Ā, B)
Bayes Rule:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \left[ \frac{P(B|A)}{\sum_{X \in \{A, \bar{A}\}} P(B|X)\,P(X)} \right] P(A)$$
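The rules above can be checked numerically. The following is a minimal sketch, assuming made-up probabilities; the function name `bayes_posterior` is hypothetical and chosen only for illustration.

```python
def bayes_posterior(p_b_given_a: float, p_b_given_not_a: float, p_a: float) -> float:
    """Return P(A|B) via Bayes' rule, expanding P(B) with the partition rule."""
    p_not_a = 1.0 - p_a
    # Partition rule: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    # Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
    return p_b_given_a * p_a / p_b


# Illustrative (assumed) values: P(B|A) = 0.8, P(B|not A) = 0.1, P(A) = 0.2
posterior = bayes_posterior(0.8, 0.1, 0.2)
print(round(posterior, 4))
```

With these assumed inputs, P(B) = 0.16 + 0.08 = 0.24, so the posterior is 0.16 / 0.24, i.e. two thirds.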
Problem Statement
Given a set of documents D = {d1, d2, ..., dn} and a query q, in
what order should the subset of relevant documents
Dr = {dr1, dr2, ..., drm} be returned to the user?
Theorem 1
PRP is optimal, in the sense that it minimizes the expected loss
(Bayes Risk) under 1/0 loss.
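As a rough illustration of the theorem, the sketch below ranks documents by decreasing estimated probability of relevance and compares the expected 1/0 loss of the top k results against another ordering. The probabilities, the function names `rank_by_prp` and `expected_loss_at_k`, and the top-k framing are assumptions made for this example, not part of the slides.

```python
def rank_by_prp(relevance_probs: dict) -> list:
    """Order document ids by decreasing estimated P(R=1 | d, q)."""
    return sorted(relevance_probs, key=lambda d: relevance_probs[d], reverse=True)


def expected_loss_at_k(ranking: list, relevance_probs: dict, k: int) -> float:
    """Expected number of non-relevant documents among the top k (1/0 loss)."""
    return sum(1.0 - relevance_probs[d] for d in ranking[:k])


# Assumed probabilities of relevance for three hypothetical documents.
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.5}

prp_ranking = rank_by_prp(probs)       # highest probability first
other_ranking = ["d1", "d2", "d3"]     # an arbitrary alternative ordering

prp_loss = expected_loss_at_k(prp_ranking, probs, k=2)
other_loss = expected_loss_at_k(other_ranking, probs, k=2)
print(prp_ranking, prp_loss, other_loss)
```

For any cutoff k, the PRP ordering places the k documents with the largest probabilities first, so no other ordering can have a smaller expected loss in this sketch.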
1
This is the assumption for PRP in general.
Sumit Bhatia Probabilistic Information Retrieval 11/23
Document Independent!
$$O(R \mid \vec{d}, \vec{q}) \propto \prod_{t=1}^{m} \frac{P(d_t \mid R=1, \vec{q})}{P(d_t \mid R=0, \vec{q})} \qquad (4)$$
Document                  R = 1      R = 0
Term present (d_t = 1)    p_t        u_t
Term absent  (d_t = 0)    1 − p_t    1 − u_t
Manipulating:
$$O(R \mid \vec{d}, \vec{q}) \propto \prod_{t:\, d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \cdot \prod_{t:\, q_t = 1} \frac{1 - p_t}{1 - u_t} \qquad (7)$$
The second product is constant for a given query!
$$RSV_d = \log \prod_{t:\, d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \qquad (8)$$
$$\phantom{RSV_d} = \sum_{t:\, d_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \qquad (9)$$
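The retrieval status value in (9) is a sum of per-term log odds ratios over the terms shared by query and document. A minimal sketch follows, assuming binary term sets and made-up values for p_t and u_t; the function name `rsv` and the example terms are hypothetical.

```python
import math


def rsv(doc_terms: set, query_terms: set, p: dict, u: dict) -> float:
    """Sum log[p_t(1 - u_t) / (u_t(1 - p_t))] over terms with d_t = q_t = 1."""
    score = 0.0
    for t in doc_terms & query_terms:
        score += math.log(p[t] * (1.0 - u[t]) / (u[t] * (1.0 - p[t])))
    return score


# Assumed estimates: p_t = P(term present | R=1), u_t = P(term present | R=0).
p = {"ranking": 0.8, "model": 0.5}
u = {"ranking": 0.2, "model": 0.5}

query = {"ranking", "model"}
doc_a = {"ranking", "model", "okapi"}   # shares both query terms
doc_b = {"model", "okapi"}              # shares only the uninformative term

score_a = rsv(doc_a, query, p, u)
score_b = rsv(doc_b, query, p, u)
print(score_a, score_b)
```

A term with p_t = u_t contributes nothing to the score, so in this sketch only "ranking" separates the two documents.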
Docs       R = 1    R = 0                Total
d_i = 1    s        n − s                n
d_i = 0    S − s    (N − n) − (S − s)    N − n
Total      S        N − S                N
Substituting, we get:
$$RSV_d = \sum_{t:\, d_t = q_t = 1} \log \frac{(s + \tfrac{1}{2}) / (S - s + \tfrac{1}{2})}{(n - s + \tfrac{1}{2}) / (N - n - S + s + \tfrac{1}{2})} \qquad (10)$$
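The per-term weight in (10) can be computed directly from the contingency counts. The sketch below assumes the symbols mean what the contingency table suggests (N documents in total, n containing the term, S relevant documents, s relevant documents containing the term); the function name `term_weight` and the counts are made up for illustration.

```python
import math


def term_weight(s: int, S: int, n: int, N: int) -> float:
    """Smoothed log odds ratio for one term, following equation (10)."""
    relevant_odds = (s + 0.5) / (S - s + 0.5)
    non_relevant_odds = (n - s + 0.5) / (N - n - S + s + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


# Assumed counts: 100 documents, 20 contain the term,
# 10 are relevant, and 8 of the relevant ones contain the term.
w_with_feedback = term_weight(s=8, S=10, n=20, N=100)

# With no relevance information (s = S = 0) the weight depends only on n and N.
w_no_feedback = term_weight(s=0, S=0, n=20, N=100)
print(w_with_feedback, w_no_feedback)
```

The added 1/2 in each cell keeps the weight finite when a count is zero; with s = S = 0 the expression reduces to log((N − n + 1/2) / (n + 1/2)), an inverse-document-frequency-like quantity.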
Observations
What Next?
References
QUESTIONS???