Probabilistic Model
2019-03-28
Overview
1 Recap
2 Probabilistic Approach to IR
5 Appraisal & Extensions
Rocchio illustrated
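[Figure: documents plotted as points (x), with the centroids $\vec{\mu}_R$ and $\vec{\mu}_{NR}$, the difference vector $\vec{\mu}_R - \vec{\mu}_{NR}$, and the optimal query $\vec{q}_{opt}$.]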
Take-away today
Boolean model
Probabilistic models support ranking and thus are better than
the simple Boolean model.
Vector space model
The vector space model is also a formally defined model that
supports ranking.
Why would we want to look for an alternative to the vector
space model?
PRP in brief
If the retrieved documents (w.r.t. a query) are ranked in
decreasing order of their probability of relevance, then the
effectiveness of the system will be the best that is obtainable.
PRP in full
If [the IR] system’s response to each [query] is a ranking of the
documents [...] in order of decreasing probability of relevance
to the [query], where the probabilities are estimated as
accurately as possible on the basis of whatever data have been
made available to the system for this purpose, the overall
effectiveness of the system to its user will be the best that is
obtainable on the basis of those data
So:
\[
O(R \mid \vec{x}, \vec{q}) = O(R \mid \vec{q}) \cdot \prod_{t=1}^{M} \frac{P(x_t \mid R = 1, \vec{q})}{P(x_t \mid R = 0, \vec{q})}
\]
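As a rough illustration of the product above, the following sketch multiplies the prior odds by one likelihood ratio per term. It assumes binary term indicators and borrows the $p_t$, $u_t$ notation that these slides use for the term weights ($p_t$: probability that the term appears in a relevant document, $u_t$: the same for a nonrelevant one). The function name `posterior_odds` and the numbers are made up for the example.

```python
def posterior_odds(prior_odds, x, p, u):
    """O(R|x,q) = O(R|q) * prod_t P(x_t|R=1,q) / P(x_t|R=0,q).

    x: binary term indicators x_t (1 = term present in the document)
    p: assumed P(x_t = 1 | R = 1, q) for each term
    u: assumed P(x_t = 1 | R = 0, q) for each term
    """
    odds = prior_odds
    for x_t, p_t, u_t in zip(x, p, u):
        if x_t == 1:
            odds *= p_t / u_t              # term present
        else:
            odds *= (1 - p_t) / (1 - u_t)  # term absent
    return odds


# Illustrative values only (not taken from the slides).
x = [1, 0, 1]
p = [0.8, 0.5, 0.6]
u = [0.2, 0.5, 0.3]
print(posterior_odds(1.0, x, p, u))
```

The second term has $p_t = u_t$, so its factor is 1 and it does not change the odds.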
Exercise
\[
c_t = \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}
\]
The odds ratio is the ratio of two odds: (i) the odds of the
term appearing if the document is relevant ($p_t/(1 - p_t)$), and
(ii) the odds of the term appearing if the document is
irrelevant ($u_t/(1 - u_t)$).
$c_t = 0$: term has equal odds of appearing in relevant and
irrelevant docs
$c_t$ positive: higher odds to appear in relevant documents
$c_t$ negative: higher odds to appear in irrelevant documents
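The three cases can be checked numerically. This is a minimal sketch using the natural logarithm; the helper name `term_weight` and the sample probabilities are assumptions chosen for illustration.

```python
import math


def term_weight(p_t, u_t):
    """c_t = log(p_t / (1 - p_t)) - log(u_t / (1 - u_t)), the log odds ratio."""
    return math.log(p_t / (1 - p_t)) - math.log(u_t / (1 - u_t))


# Illustrative (p_t, u_t) pairs: equal odds, favours relevant, favours irrelevant.
for p_t, u_t in [(0.5, 0.5), (0.8, 0.2), (0.1, 0.4)]:
    print(p_t, u_t, round(term_weight(p_t, u_t), 3))
```

The first pair gives a weight of zero, the second a positive weight, and the third a negative one.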
\[
c_t = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}
\]
functions as a term weight.
Retrieval status value for document $d$: $RSV_d = \sum_{x_t = q_t = 1} c_t$.
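The retrieval status value sums the term weights over the terms that occur in both the query and the document. A small sketch under that reading follows; the function name `rsv`, the dictionaries `p` and `u`, and all values are hypothetical, and the estimates are assumed to lie strictly between 0 and 1.

```python
import math


def rsv(doc_terms, query_terms, p, u):
    """Sum c_t over terms with x_t = q_t = 1 (present in document and query)."""
    score = 0.0
    for t in query_terms & doc_terms:
        # c_t = log(p_t / (1 - p_t)) - log(u_t / (1 - u_t))
        score += math.log(p[t] / (1 - p[t])) - math.log(u[t] / (1 - u[t]))
    return score


# Hypothetical per-term estimates.
p = {"okapi": 0.8, "model": 0.5}
u = {"okapi": 0.2, "model": 0.5}
doc = {"okapi", "model", "ranking"}
query = {"okapi", "model"}
print(rsv(doc, query, p, u))
```

Terms outside the intersection, such as "ranking" here, contribute nothing to the score.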
Avoiding zeros
Exercise
Simplifying assumption
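\[
RSV_d = \sum_{t \in q} \log\left[\frac{N}{\mathrm{df}_t}\right] \cdot \frac{(k_1 + 1)\,\mathrm{tf}_{td}}{k_1 \left((1 - b) + b \times (L_d / L_{ave})\right) + \mathrm{tf}_{td}}
\]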
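The formula above can be sketched in a few lines. The function name `bm25_rsv`, its parameter names, and the defaults `k1=1.2` and `b=0.75` are assumptions for illustration (commonly used settings, not values given on this slide); the logarithm is the natural one.

```python
import math


def bm25_rsv(query_terms, doc_tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Score one document against a query.

    doc_tf:  term -> frequency in this document (tf_td)
    df:      term -> document frequency in the collection (df_t)
    doc_len: document length L_d; avg_len: average length L_ave
    """
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue  # such a term contributes nothing to the sum
        idf = math.log(n_docs / df[t])
        length_norm = k1 * ((1 - b) + b * (doc_len / avg_len)) + tf
        score += idf * (k1 + 1) * tf / length_norm
    return score


# Hypothetical collection statistics.
df = {"okapi": 10, "model": 100}
doc_tf = {"okapi": 1, "ranking": 3}
print(bm25_rsv(["okapi", "model"], doc_tf, doc_len=50, avg_len=50, df=df, n_docs=1000))
```

With `doc_len == avg_len` and `tf = 1`, the tf factor equals 1, so the score reduces to the idf term $\log(N/\mathrm{df}_t)$; a longer document with the same term frequency scores lower.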
Exercise
Resources
Chapter 11 of IIR
Resources at https://www.fi.muni.cz/~sojka/PV211/
and http://cislmu.org, materials in MU IS and FI MU
library