Introduction

Probabilistic Ranking Principle

The Binary Independence Model
OKAPI
Discussion

Probabilistic Information Retrieval

Sumit Bhatia

July 16, 2009

Sumit Bhatia Probabilistic Information Retrieval 1/23

Overview

1 Introduction
    Information Retrieval
    IR Models
    Probability Basics
2 Probabilistic Ranking Principle
    Document Ranking Problem
    Probability Ranking Principle
3 The Binary Independence Model
4 OKAPI
5 Discussion

Information Retrieval (IR) Process

1 User has some information needs
2 Information Need → Query using Query Representation
3 Documents → Document Representation
4 IR system matches the two representations to determine the documents that satisfy the user's information needs.

Boolean Retrieval Model

Query = Boolean expression of terms
ex. Mitra AND Giles
Document = term-document matrix
A_ij = 1 iff the i-th term is present in the j-th document.
“Bag of words”
No ranking
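A minimal sketch of Boolean retrieval over a term-document incidence matrix, assuming a made-up three-document collection. The document texts and the helper name `boolean_and` are illustrative and do not come from the slides.

```python
# Toy collection (hypothetical documents, for illustration only).
docs = [
    "mitra and giles wrote this paper",
    "giles published a survey",
    "mitra studies information retrieval",
]

# Vocabulary and binary term-document incidence matrix A:
# A[i][j] = 1 iff term i occurs in document j.
vocab = sorted({w for d in docs for w in d.split()})
A = [[1 if term in d.split() else 0 for d in docs] for term in vocab]
row = {term: A[i] for i, term in enumerate(vocab)}


def boolean_and(*terms):
    """Return indices of documents containing every query term (unranked)."""
    hits = [1] * len(docs)
    for t in terms:
        incidence = row.get(t, [0] * len(docs))  # unknown term matches nothing
        hits = [h * x for h, x in zip(hits, incidence)]
    return [j for j, h in enumerate(hits) if h]


print(boolean_and("mitra", "giles"))
```

The result is a plain set of matching documents with no ordering among them, which is the "no ranking" limitation of the model.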

Vector Space Model

Query = free text query

ex. Mitra Giles
Query and Document → vectors in “term space”
Cosine similarity between the query and document vectors measures how closely they match
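A small sketch of cosine ranking in term space. It assumes raw term counts as vector weights (the slides do not specify a weighting) and uses a hypothetical toy collection.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Free-text query and hypothetical documents, all mapped to term-count vectors.
query = Counter("mitra giles".split())
docs = {
    "d1": Counter("mitra and giles wrote this paper".split()),
    "d2": Counter("giles published a survey".split()),
    "d3": Counter("a survey of coffee".split()),
}

# Rank documents by decreasing similarity to the query.
ranking = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranking)
```

Unlike the Boolean model, every document receives a graded score, so a document that matches only one query term is still returned, just lower in the list.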

Information Retrieval (IR) Process - Revisited

1 User has some information needs

Problem!
Both Query and Document Representations are Uncertain

Probability Basics

Chain Rule:
P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

Partition Rule:
P(B) = P(A, B) + P(Ā, B)

Bayes Rule:
\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
            = \left[ \frac{P(B \mid A)}{\sum_{X \in \{A, \bar{A}\}} P(B \mid X)\,P(X)} \right] P(A)
\]
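As a quick numeric check of the three rules, the sketch below uses made-up probabilities, reading A as "the document is relevant" and B as "the term occurs in the document". Both the numbers and that reading are assumptions for illustration.

```python
# Hypothetical inputs.
p_a = 0.1               # P(A)
p_b_given_a = 0.8       # P(B|A)
p_b_given_not_a = 0.2   # P(B|not A)

# Partition rule, with each joint probability expanded by the chain rule:
# P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 4), round(p_a_given_b, 4))
```

Seeing the term raises the probability of relevance from the prior P(A) because the term is more likely under A than under its complement.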

Document Ranking Problem

Problem Statement
Given a set of documents D = {d_1, d_2, ..., d_n} and a query q, in what order should the subset of relevant documents D_r = {d_r1, d_r2, ..., d_rm} be returned to the user?

Hint: We want the best document at rank 1, the second best at rank 2, and so on.

Solution
Rank by the probability of relevance of the document w.r.t. the information need (query),
=⇒ by decreasing P(R = 1|d, q)
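The solution amounts to a sort. The sketch below assumes the probabilities P(R = 1|d, q) have already been estimated; the values and the helper `expected_relevant_at` are hypothetical.

```python
# Hypothetical estimates of P(R = 1 | d, q) for five documents.
p_relevant = {"d1": 0.15, "d2": 0.80, "d3": 0.55, "d4": 0.05, "d5": 0.62}

# Rank by decreasing probability of relevance.
ranking = sorted(p_relevant, key=p_relevant.get, reverse=True)


def expected_relevant_at(k):
    """Expected number of relevant documents among the top k of the ranking."""
    return sum(p_relevant[d] for d in ranking[:k])


print(ranking)
print(expected_relevant_at(2))
```

For any cutoff k, no other choice of k documents has a higher expected number of relevant documents than the top k of this ordering.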

Probability Ranking Principle

Probability Ranking Principle (van Rijsbergen, 1979)

If a reference retrieval system’s response to each request is a
ranking of the documents in the collection in order of decreasing
probability of relevance to the user who submitted the request,
where the probabilities are estimated as accurately as possible on
the basis of whatever data have been made available to the system
for this purpose, the overall effectiveness of the system to its user
will be the best that is obtainable on the basis of those data.

Probability Ranking Principle

Case 1: 1/0 loss =⇒ no selection/retrieval costs.

Bayes’ Optimal Decision Rule:

d is relevant iff P(R = 1|d, q) > P(R = 0|d, q)

Theorem 1
PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.

Case 2: PRP with differential retrieval costs

Retrieve d next if, for every document d′ not yet retrieved,

C1·P(R = 1|d, q) + C0·P(R = 0|d, q) ≤ C1·P(R = 1|d′, q) + C0·P(R = 0|d′, q)
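Both cases can be sketched in a few lines. The slide does not define C1 and C0, so the code assumes C1 is the cost incurred when a retrieved document turns out relevant and C0 the cost when it does not. The probabilities and function names are hypothetical as well.

```python
def retrieve_by_bayes_rule(p_rel):
    """Case 1 (1/0 loss): keep d iff P(R=1|d,q) > P(R=0|d,q)."""
    return [d for d, p in p_rel.items() if p > 1 - p]


def rank_by_expected_cost(p_rel, c1, c0):
    """Case 2: order documents by increasing C1*P(R=1|d,q) + C0*P(R=0|d,q)."""
    cost = {d: c1 * p + c0 * (1 - p) for d, p in p_rel.items()}
    return sorted(cost, key=cost.get)


# Hypothetical estimates of P(R = 1 | d, q).
p_rel = {"d1": 0.15, "d2": 0.80, "d3": 0.55, "d4": 0.05}

print(retrieve_by_bayes_rule(p_rel))
print(rank_by_expected_cost(p_rel, c1=0.0, c0=1.0))
```

Under this reading, whenever C1 is smaller than C0 the expected cost decreases as P(R = 1|d, q) grows, so the cost-based order coincides with ranking by probability of relevance.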

Binary Independence Model (BIM)

Assumptions:
1 Binary: documents are represented as binary incidence vectors of terms, d = (d_1, d_2, ..., d_n), where d_i = 1 iff term i is present in d, else d_i = 0.
2 Independence: terms occur in documents independently of one another.
3 Relevance of a document is independent of the relevance of other documents.¹
Implications:
1 Many documents have the same representation.
2 No association between terms is considered.

¹ This is the assumption for PRP in general.
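Binary Independence Model (BIM)

We wish to compute P(R|d, q).
We do it in terms of term incidence vectors $\vec{d}$ and $\vec{q}$.
We thus compute $P(R \mid \vec{d}, \vec{q})$.
Using Bayes’ Rule, we have:

\[
P(R = 1 \mid \vec{d}, \vec{q}) = \frac{P(\vec{d} \mid R = 1, \vec{q})\, P(R = 1 \mid \vec{q})}{P(\vec{d} \mid \vec{q})} \tag{1}
\]

\[
P(R = 0 \mid \vec{d}, \vec{q}) = \frac{P(\vec{d} \mid R = 0, \vec{q})\, P(R = 0 \mid \vec{q})}{P(\vec{d} \mid \vec{q})} \tag{2}
\]

$P(R = 1 \mid \vec{q})$ and $P(R = 0 \mid \vec{q})$ are the prior relevance probabilities.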

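Binary Independence Model

Computing the odds ratio, we get:

\[
O(R \mid \vec{d}, \vec{q}) = \frac{P(R = 1 \mid \vec{q})}{P(R = 0 \mid \vec{q})} \times \frac{P(\vec{d} \mid R = 1, \vec{q})}{P(\vec{d} \mid R = 0, \vec{q})} \tag{3}
\]

The first factor is document independent! What about the second factor?

Naive Bayes Assumption

\[
O(R \mid \vec{d}, \vec{q}) \propto \prod_{t=1}^{m} \frac{P(d_t \mid R = 1, \vec{q})}{P(d_t \mid R = 0, \vec{q})} \tag{4}
\]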

Binary Independence Model

Observation 1: A term is either present in a document or not, so the product in (4) splits into two:

\[
O(R \mid \vec{d}, \vec{q}) \propto \prod_{t: d_t = 1} \frac{P(d_t = 1 \mid R = 1, \vec{q})}{P(d_t = 1 \mid R = 0, \vec{q})} \cdot \prod_{t: d_t = 0} \frac{P(d_t = 0 \mid R = 1, \vec{q})}{P(d_t = 0 \mid R = 0, \vec{q})} \tag{5}
\]

Document                  R = 1      R = 0
Term present (d_t = 1)    p_t        u_t
Term absent  (d_t = 0)    1 − p_t    1 − u_t

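Binary Independence Model

Assumption: A term not in the query is equally likely to occur in relevant and non-relevant documents.

\[
O(R \mid \vec{d}, \vec{q}) \propto \prod_{t: d_t = q_t = 1} \frac{p_t}{u_t} \cdot \prod_{t: d_t = 0, q_t = 1} \frac{1 - p_t}{1 - u_t} \tag{6}
\]

Manipulating:

\[
O(R \mid \vec{d}, \vec{q}) \propto \prod_{t: d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \cdot \prod_{t: q_t = 1} \frac{1 - p_t}{1 - u_t} \tag{7}
\]

The second product is constant for a given query!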
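The step from (6) to (7) can be checked numerically. The sketch below uses made-up values of p_t and u_t for three query terms and compares the two expressions for several documents, each described by the set of query terms it contains. All names and numbers are illustrative.

```python
import math

# Hypothetical per-term probabilities for three query terms.
p = {"t1": 0.6, "t2": 0.3, "t3": 0.5}   # p_t = P(d_t = 1 | R = 1, q)
u = {"t1": 0.1, "t2": 0.2, "t3": 0.4}   # u_t = P(d_t = 1 | R = 0, q)


def odds_eq6(present):
    """Eq. (6): one factor per query term, chosen by presence or absence in d."""
    value = 1.0
    for t in p:
        if t in present:
            value *= p[t] / u[t]
        else:
            value *= (1 - p[t]) / (1 - u[t])
    return value


def odds_eq7(present):
    """Eq. (7): document-dependent product times the query-only constant."""
    doc_part = math.prod(p[t] * (1 - u[t]) / (u[t] * (1 - p[t])) for t in present)
    constant = math.prod((1 - p[t]) / (1 - u[t]) for t in p)
    return doc_part * constant


for present in [set(), {"t1"}, {"t1", "t3"}, {"t1", "t2", "t3"}]:
    print(sorted(present), odds_eq6(present), odds_eq7(present))
```

Because the constant is shared by every document, only the first product in (7) affects the ranking.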

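Binary Independence Model

\[
RSV_d = \log \prod_{t: d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \tag{8}
\]

\[
RSV_d = \sum_{t: d_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \tag{9}
\]

Docs       R = 1    R = 0                Total
d_t = 1    s        n − s                n
d_t = 0    S − s    (N − n) − (S − s)    N − n
Total      S        N − S                N

Substituting p_t = s/S and u_t = (n − s)/(N − S), with 1/2 added to each count for smoothing, we get:

\[
RSV_d = \sum_{t: d_t = q_t = 1} \log \frac{(s + \tfrac{1}{2}) / (S - s + \tfrac{1}{2})}{(n - s + \tfrac{1}{2}) / (N - n - S + s + \tfrac{1}{2})} \tag{10}
\]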
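A minimal sketch of the smoothed weight in (10), assuming the four counts from the contingency table are available for each term. The function names, the `stats` layout and the example counts are hypothetical.

```python
import math


def term_weight(s, S, n, N):
    """Smoothed log odds ratio for one term, as in Eq. (10).

    s: relevant documents containing the term, S: relevant documents,
    n: documents containing the term, N: documents in the collection.
    """
    numerator = (s + 0.5) / (S - s + 0.5)
    denominator = (n - s + 0.5) / (N - n - S + s + 0.5)
    return math.log(numerator / denominator)


def rsv(doc_terms, query_terms, stats, S, N):
    """Sum the weights of the query terms that also occur in the document."""
    return sum(
        term_weight(stats[t]["s"], S, stats[t]["n"], N)
        for t in query_terms
        if t in doc_terms and t in stats
    )


# Hypothetical counts: 1000 documents, 10 of them judged relevant.
stats = {"giles": {"s": 8, "n": 50}, "mitra": {"s": 5, "n": 400}}
print(rsv({"giles", "paper"}, ["mitra", "giles"], stats, S=10, N=1000))
```

The 1/2 terms keep the weight finite when a count is zero, for example when no relevance judgments exist yet (S = s = 0).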

Observations

Probabilities for non-relevant documents can be approximated by collection statistics.

\[
\log \frac{1 - u_t}{u_t} = \log \frac{N - n}{n} \approx \log \frac{N}{n} = \text{IDF!}
\]

It is not so simple for relevant documents:
– Estimating p_t from known relevant documents (not always known)
– Assuming p_t = constant, equivalent to IDF weighting only
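The quality of the IDF approximation depends on how rare the term is. The sketch below compares the two logarithms for a hypothetical collection size and three made-up document frequencies, taking u_t = n/N as implied by the first equality.

```python
import math

N = 1_000_000  # hypothetical collection size

for n in (10, 1_000, 100_000):
    exact = math.log((N - n) / n)   # log((1 - u_t) / u_t) with u_t = n / N
    idf = math.log(N / n)           # approximation, good when n is much smaller than N
    print(n, round(exact, 4), round(idf, 4))
```

The gap between the two values equals −log(1 − n/N), which is negligible for rare terms and grows as the term becomes more common.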

OKAPI Weighting Scheme

BIM does not consider term frequencies and document length.

The BM25 weighting scheme (Okapi weighting) was developed to build a probabilistic model sensitive to these quantities.

BM25 is widely used today and has shown good performance in a number of practical systems.

\[
RSV_d = \sum_{t \in q} \log \frac{N}{\mathrm{df}_t} \times \frac{(k_1 + 1)\, \mathrm{tf}_{td}}{k_1 \left( (1 - b) + b \times \frac{l_d}{l_{av}} \right) + \mathrm{tf}_{td}} \times \frac{(k_3 + 1)\, \mathrm{tf}_{tq}}{k_3 + \mathrm{tf}_{tq}}
\]

where:
N is the total number of documents,
df_t is the document frequency, i.e., the number of documents that contain the term t,
tf_td is the frequency of term t in document d,
tf_tq is the frequency of term t in query q,
l_d is the length of document d,
l_av is the average length of documents,
k_1, k_3 and b are constants which are generally set to 2, 2 and 0.75 respectively.
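The formula translates almost directly into code. The sketch below assumes whitespace tokenization and measures document length in tokens. The function name `bm25_scores` and the toy collection are illustrative, and the defaults follow the constants quoted above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=2.0, k3=2.0, b=0.75):
    """Score each document (list of tokens) against the query (list of tokens)."""
    N = len(docs)
    l_av = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    tf_q = Counter(query)                           # term frequency in the query

    scores = []
    for d in docs:
        tf_d = Counter(d)
        length_norm = k1 * ((1 - b) + b * len(d) / l_av)
        score = 0.0
        for t, qf in tf_q.items():
            if tf_d[t] == 0:
                continue  # the document factor is zero when tf_td = 0
            idf = math.log(N / df[t])
            doc_part = (k1 + 1) * tf_d[t] / (length_norm + tf_d[t])
            query_part = (k3 + 1) * qf / (k3 + qf)
            score += idf * doc_part * query_part
        scores.append(score)
    return scores


# Hypothetical toy collection.
docs = [
    "okapi bm25 ranks documents by term frequency".split(),
    "bm25 bm25 bm25".split(),
    "coffee from java".split(),
]
scores = bm25_scores("bm25 ranking".split(), docs)
print(scores)
```

In this toy run the short document that repeats the query term scores highest, which shows both the term-frequency and the length-normalization effects that BIM ignores.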
What Next?

Similarity between terms and documents - is this sufficient?

JAVA: Coffee or Computer Language or Place?

Time and Location of user?

Different users might want different documents for the same query?

Maximum Marginal Relevance [CG98] – Rank documents so as to minimize similarity between returned documents

Result Diversification [Wan09]
– Rank documents so as to maximize mean relevance, given a variance level.
– Variance here determines the risk the user is willing to take
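A sketch of greedy Maximum Marginal Relevance re-ranking in the spirit of [CG98]: at each step it picks the candidate that maximizes λ · relevance minus (1 − λ) · its highest similarity to anything already selected. The relevance scores, the similarity table, the function name `mmr` and the default λ are assumptions made for illustration.

```python
def mmr(relevance, similarity, k, lam=0.5):
    """Greedily select up to k documents, trading relevance against redundancy."""
    selected = []
    candidates = set(relevance)

    def mmr_score(d):
        # Redundancy is the highest similarity to any document already chosen.
        redundancy = max((similarity[d][s] for s in selected), default=0.0)
        return lam * relevance[d] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=mmr_score)  # sorted for stable ties
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: "a" and "b" are near-duplicates, "c" is different.
relevance = {"a": 0.90, "b": 0.85, "c": 0.50}
similarity = {
    "a": {"a": 1.0, "b": 0.95, "c": 0.1},
    "b": {"a": 0.95, "b": 1.0, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 1.0},
}
print(mmr(relevance, similarity, k=2))
print(mmr(relevance, similarity, k=2, lam=1.0))
```

With λ = 1 the procedure reduces to plain relevance ranking, while smaller values push near-duplicates down the list.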

References

[CG98] Jaime Carbonell and Jade Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, SIGIR, 1998, pp. 335–336.
[MRS08] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to information retrieval, Cambridge University Press, 2008.
[Wan09] Jun Wang, Mean-variance analysis: A new document ranking theory in information retrieval, Advances in Information Retrieval, 2009, pp. 4–16.

QUESTIONS???
