Probabilistic IR: Giorgio Gambosi
G.Gambosi: Probabilistic IR 1 / 47
Probabilistic Approach to IR Basic Probability Theory Probability Ranking Principle Appraisal&Extensions
Overview
1 Probabilistic Approach to IR
2 Basic Probability Theory
3 Probability Ranking Principle
4 Appraisal & Extensions
PRP in brief
If the retrieved documents (w.r.t. a query) are ranked in
decreasing order of their probability of relevance, then the
effectiveness of the system will be the best that is obtainable
PRP in full
If [the IR] system’s response to each [query] is a ranking of the
documents [...] in order of decreasing probability of relevance
to the [query], where the probabilities are estimated as
accurately as possible on the basis of whatever data have been
made available to the system for this purpose, the overall
effectiveness of the system to its user will be the best that is
obtainable on the basis of those data
P(R|d, q) is modeled using term incidence vectors as $P(R|\vec{x}, \vec{q})$:
\[ P(R=1|\vec{x},\vec{q}) = \frac{P(\vec{x}|R=1,\vec{q})\,P(R=1|\vec{q})}{P(\vec{x}|\vec{q})} \]
\[ P(R=0|\vec{x},\vec{q}) = \frac{P(\vec{x}|R=0,\vec{q})\,P(R=0|\vec{q})}{P(\vec{x}|\vec{q})} \]
$P(\vec{x}|R=1,\vec{q})$ and $P(\vec{x}|R=0,\vec{q})$: probability that if a
relevant or nonrelevant document is retrieved, then that
document’s representation is $\vec{x}$
Use statistics about the document collection to estimate these
probabilities
\[ P(R=1|\vec{x},\vec{q}) + P(R=0|\vec{x},\vec{q}) = 1 \]
So:
\[ O(R|\vec{x},\vec{q}) = O(R|\vec{q}) \cdot \prod_{t=1}^{M} \frac{P(x_t|R=1,\vec{q})}{P(x_t|R=0,\vec{q})} \]
\[ O(R|\vec{x},\vec{q}) = O(R|\vec{q}) \cdot \prod_{t:x_t=1} \frac{P(x_t=1|R=1,\vec{q})}{P(x_t=1|R=0,\vec{q})} \cdot \prod_{t:x_t=0} \frac{P(x_t=0|R=1,\vec{q})}{P(x_t=0|R=0,\vec{q})} \]
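The product over present and absent terms can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `bim_odds` and the argument names `p`, `u`, and `prior_odds` are hypothetical, with `p[t]` standing for P(x_t = 1 | R = 1, q) and `u[t]` for P(x_t = 1 | R = 0, q).

```python
from math import prod

def bim_odds(x, p, u, prior_odds=1.0):
    """Odds O(R | x, q) under the term-independence assumption.

    x: list of 0/1 term incidences for one document
    p: p[t] = P(x_t = 1 | R = 1, q)   (assumed to be given)
    u: u[t] = P(x_t = 1 | R = 0, q)   (assumed to be given)
    prior_odds: O(R | q), the same for every document of a query
    """
    factors = []
    for xt, pt, ut in zip(x, p, u):
        if xt == 1:
            # term present: P(x_t=1|R=1) / P(x_t=1|R=0)
            factors.append(pt / ut)
        else:
            # term absent: P(x_t=0|R=1) / P(x_t=0|R=0)
            factors.append((1 - pt) / (1 - ut))
    return prior_odds * prod(factors)

# Two-term vocabulary; the document contains only the first term.
odds = bim_odds(x=[1, 0], p=[0.8, 0.5], u=[0.2, 0.5])
```

Since O(R|q) does not depend on the document, it can be dropped when the only goal is to rank documents for a fixed query.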
\[ c_t = \log\frac{p_t(1-u_t)}{u_t(1-p_t)} = \log\frac{p_t}{1-p_t} - \log\frac{u_t}{1-u_t} \]
The odds ratio is the ratio of two odds: (i) the odds of the
term appearing if the document is relevant (pt /(1 − pt )), and
(ii) the odds of the term appearing if the document is
nonrelevant (ut /(1 − ut ))
ct = 0: term has equal odds of appearing in relevant and
nonrelevant docs
ct positive: higher odds to appear in relevant documents
ct negative: higher odds to appear in nonrelevant documents
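A small Python sketch of the three cases, assuming p_t and u_t are already known and lie strictly between 0 and 1 (the function name `term_weight` is chosen here for illustration):

```python
import math

def term_weight(p_t, u_t):
    """Log odds ratio c_t = log(p_t / (1 - p_t)) - log(u_t / (1 - u_t))."""
    return math.log(p_t / (1 - p_t)) - math.log(u_t / (1 - u_t))

equal = term_weight(0.5, 0.5)     # same odds in relevant and nonrelevant docs
positive = term_weight(0.8, 0.2)  # term favours relevant docs
negative = term_weight(0.2, 0.8)  # term favours nonrelevant docs
```

The function fails when p_t or u_t is exactly 0 or 1, because the odds are then zero or undefined.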
\[ c_t = \log\frac{p_t}{1-p_t} - \log\frac{u_t}{1-u_t} \]
functions as a term weight.
Retrieval status value for document d: $RSV_d = \sum_{x_t = q_t = 1} c_t$.
So BIM and vector space model are identical on an
operational level . . .
. . . except that the term weights are different.
In particular: we can use the same data structures (inverted
index etc) for the two models.
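To illustrate the operational similarity, here is a minimal term-at-a-time scoring sketch over a toy inverted index. The names `rsv_scores`, `postings`, and `weights`, and the weight values themselves, are made up for the example; in practice the weights c_t would come from estimates of p_t and u_t.

```python
from collections import defaultdict

def rsv_scores(query_terms, postings, weights):
    """Accumulate RSV_d = sum of c_t over terms present in both query and document.

    postings: term -> set of doc ids containing the term (toy inverted index)
    weights:  term -> c_t (assumed precomputed)
    """
    scores = defaultdict(float)
    for t in set(query_terms):               # each query term counts once (q_t = 1)
        for doc_id in postings.get(t, ()):   # only docs with x_t = 1 are touched
            scores[doc_id] += weights.get(t, 0.0)
    return dict(scores)

postings = {"probabilistic": {1, 2}, "retrieval": {2, 3}}
weights = {"probabilistic": 1.5, "retrieval": 0.5}

scores = rsv_scores(["probabilistic", "retrieval"], postings, weights)
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Swapping `weights` for tf-idf weights would leave the traversal unchanged, which is the sense in which the two models share data structures.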
Avoiding zeros
Exercise
Simplifying assumption
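\[ \log\frac{1-u_t}{u_t} = \log\frac{N - \mathit{df}_t}{\mathit{df}_t} \approx \log\frac{N}{\mathit{df}_t} \]
This results in
\[ c_t = \log\frac{p_t(1-u_t)}{u_t(1-p_t)} \approx \log\frac{p_t}{1-p_t} + \log\frac{N}{\mathit{df}_t} \]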
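A quick numerical check of how close the approximation is, using a hypothetical collection size and a few document frequencies (all numbers are invented for illustration):

```python
import math

def exact_idf(N, df):
    """log((N - df) / df), i.e. log((1 - u_t) / u_t) with u_t = df / N."""
    return math.log((N - df) / df)

def approx_idf(N, df):
    """log(N / df), the usual idf form."""
    return math.log(N / df)

N = 1_000_000  # hypothetical number of documents
for df in (10, 1_000, 100_000, 500_000):
    print(df, round(exact_idf(N, df), 4), round(approx_idf(N, df), 4))
```

The two values agree closely for rare terms and drift apart as df_t approaches N/2, where the exact form reaches zero while log(N/df_t) is still log 2.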
Role of parameter k1
Exercise
Document length
\[ L_d = \sum_t \mathit{tf}_{td} \]
Role of parameter b
Exercise
BM25 vs tf-idf
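\[ RSV_d = \sum_{t \in q} \log\left[\frac{N}{\mathit{df}_t}\right] \cdot \frac{(k_1+1)\,\mathit{tf}_{td}}{k_1\left((1-b) + b \times (L_d/L_{ave})\right) + \mathit{tf}_{td}} \cdot \frac{(k_3+1)\,\mathit{tf}_{tq}}{k_3 + \mathit{tf}_{tq}} \]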
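The formula can be transcribed almost term by term into Python. The sketch below is a minimal illustration, not a reference implementation: documents and queries are assumed to be plain token lists, `df` is assumed to map each term to its document frequency, and the default values of `k1`, `b`, and `k3` are placeholders chosen for the example rather than values given in these slides.

```python
import math
from collections import Counter

def bm25_rsv(query, doc, df, N, L_ave, k1=1.2, b=0.75, k3=1.5):
    """RSV_d for one document, following the formula term by term.

    query, doc: lists of tokens
    df: term -> document frequency; N: number of documents
    L_ave: average document length in the collection
    k1, b, k3: tuning parameters (defaults here are illustrative only)
    """
    tf_d = Counter(doc)     # tf_td
    tf_q = Counter(query)   # tf_tq
    L_d = len(doc)          # document length = sum of term frequencies
    score = 0.0
    for t, tf_tq in tf_q.items():
        tf_td = tf_d.get(t, 0)
        if tf_td == 0 or t not in df:
            continue        # such terms contribute nothing to the sum
        idf = math.log(N / df[t])
        doc_part = ((k1 + 1) * tf_td) / (k1 * ((1 - b) + b * (L_d / L_ave)) + tf_td)
        query_part = ((k3 + 1) * tf_tq) / (k3 + tf_tq)
        score += idf * doc_part * query_part
    return score

# Toy example: one matching term in an average-length document.
rsv = bm25_rsv(["a"], ["a", "b"], df={"a": 1}, N=10, L_ave=2)
```

In the toy example the document has average length and both term frequencies equal 1, so the two saturation factors reduce to 1 and the score is just log(N/df_t).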