IR Models: - Why IR Models? - Boolean IR Model - Vector Space IR Model - Probabilistic IR Model

The document discusses information retrieval (IR) models. It begins by introducing three common IR models: the Boolean model, vector space model, and probabilistic model. It then provides details on the Boolean model, which uses a simple binary approach to determine if a document is relevant based on whether terms are present or absent. The weighting scheme in the Boolean model is binary, assigning weights of either 0 or 1.


IR models

• Why IR models?
• Boolean IR Model
• Vector space IR model
• Probabilistic IR model
What is Information Retrieval?
• Information retrieval is the process of searching a large, unstructured corpus for documents that satisfy a user's information need.
– It is a tool that finds and selects, from a collection of items, a subset that serves the user's purpose.
• Much IR research focuses more specifically on text retrieval. But
there are many other interesting areas:
 Cross-language vs. multilingual information retrieval,
 Multimedia (audio, video & image) information retrieval (QBIC, WebSeek,
SaFe)
 Question-answering (AskJeeves, Answerbus).
 Digital and virtual libraries
Assignment 1 (Due: __ days)
Compare local vs. global research works on the following topic & submit
the assessment result. Your report should show the state-of-the-art
(including overview of the concept, its significance, major tasks,
architecture, approaches, concluding remarks with future research
direction & references). Share the soft-copy of your report & slides to all
the classmates, and Cc to me. There is a 10-minute presentation by
each group, which will start on April 09, 2012 (Monday).
1. Amharic IR system (Kifle & Martha)
2. Stemming and Thesaurus construction (Demewoz & Sintayehu)
3. IR Models (Daniel)
4. Query Expansion (Abdulkerim & Zealem)
5. Document Image Retrieval (Betsegaw & Tsegaye S.)
6. Cross Language IR (Ibsa & Eyob)
7. Multimedia IR (Besufekad, Tamirat & Kibrom)
8. Question Answering (Alemayehu & Getachew)
9. Recommender Systems (Mulalem & Brook)
10. Document Summarization (Tsegaye M. & Adey)
11. Information Extraction (Kibreab & Tesfaye)
12. Text Classification (Berihu & Yibeltal)
13. Information Filtering (Mengistu & Esubalew)
• Other topics: Web IR; Document provenance; Intelligent IR
Information Retrieval serves as a Bridge
• An Information Retrieval system serves as a bridge between the world of authors and the world of readers/users.
– That is, writers present a set of ideas in a document using a set of concepts; users then ask the IR system for relevant documents that satisfy their information need.

[Figure: the IR system as a "black box" between the user and the documents]
Typical IR System Architecture

[Figure: a query string from the user and the document corpus feed into the IR system, which returns a ranked list of relevant documents: 1. Doc1, 2. Doc2, 3. Doc3, ...]
Our focus during IR system design
• In improving Effectiveness of the system
–The concern here is retrieving more relevant documents as per the
user's query
–Effectiveness of the system is measured in terms of precision,
recall, …
–Main emphasis: text operations (such as stemming, stopwords
removal, normalization, etc.), weighting schemes, matching
algorithms, …
• In improving Efficiency of the system
–The concern here is
• enhancing searching time, indexing time, access time…
• reducing storage space requirement of the system
• space – time tradeoffs
–Main emphasis:
• Compression
• Index terms selection (free text or content-bearing terms)
• indexing structures
Subsystems of IR system
The two subsystems of an IR system: Indexing and
Searching
–Indexing:
• is an offline process of organizing documents
using keywords extracted from the collection
• Indexing is used to speed up access to desired
information from the document collection as per the
user's query

–Searching
• Is an online process that scans the document corpus to find
relevant documents that match the user's query
Indexing Subsystem
The indexing pipeline transforms raw documents into an index file:
1. Documents → assign document identifier → document IDs
2. Tokenization → tokens
3. Stopword removal → non-stoplist tokens
4. Stemming & normalization → stemmed terms
5. Term weighting → weighted index terms
6. Weighted index terms are stored in the index file
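The pipeline above can be sketched in a few lines of Python. The stoplist and the crude suffix-stripping stemmer are illustrative stand-ins (a real system would use a curated stoplist and a proper stemmer such as Porter's):

```python
from collections import Counter, defaultdict

STOPWORDS = {"of", "in", "a", "the", "is", "for"}   # toy stoplist (assumption)

def tokenize(text):
    # Lowercase and split on whitespace; real systems also strip punctuation.
    return text.lower().replace(".", "").split()

def stem(token):
    # Crude suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ment", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def index_documents(docs):
    # docs: {doc_id: text}; returns {term: {doc_id: raw term frequency}}
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = [stem(t) for t in tokenize(text) if t not in STOPWORDS]
        for term, freq in Counter(tokens).items():
            index[term][doc_id] = freq
    return dict(index)

index = index_documents({
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
})
# e.g. index["silver"] == {"D2": 2}; "of", "in", "a" never reach the index
```

Term weighting (e.g. TF*IDF, covered later) would then be computed from these raw frequencies before writing the index file.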
Searching Subsystem
The searching pipeline mirrors the indexing steps on the query side:
1. Query → parse query → query tokens
2. Stopword removal → non-stoplist tokens
3. Stemming & normalization → stemmed terms
4. Term weighting → query terms
5. Similarity measure between query terms and index terms from the index
6. Ranking → ranked document set drawn from the relevant document set
IR Models - Basic Concepts
 IR systems usually adopt index terms to
index and retrieve documents
Each document is represented by a set of
representative keywords or index terms (called
Bag of Words)
• An index term is a word useful for remembering the
document main themes
• Not all terms are equally useful for
representing the document contents:
less frequent terms allow identifying a narrower
set of documents
• But no ordering information is attached to the Bag of
Words identified from the document collection.
IR Models - Basic Concepts
•One central problem regarding IR systems is
the issue of predicting the degree of relevance
of documents for a given query
 Such a decision is usually dependent on a
ranking algorithm which attempts to
establish a simple ordering of the
documents retrieved
 Documents appearing at the top of this ordering
are considered to be more likely to be relevant
•Thus ranking algorithms are at the core of IR
systems
 The IR model adopted determines the prediction of
what is relevant and what is not, based on its
notion of relevance (e.g., probabilistic relevance)
How to find relevant documents for a query?
• Step 1: Map documents & queries into the term-document vector space. Note that queries are treated as short documents.
– Represent both documents & queries as N-dimensional vectors in a term-document matrix, which shows the occurrence of terms in the document collection or query:

  d_j = (t_{1,j}, t_{2,j}, ..., t_{N,j});   q_k = (t_{1,k}, t_{2,k}, ..., t_{N,k})

– The document collection is mapped to a term-by-document matrix with one row per document (D1 ... DM, plus the query Qi) and one column per term (T1 ... TN).
– Each document is viewed as a vector in multidimensional space; nearby vectors are related.
How to find relevant documents for a query?
• Step 2: Queries and documents are represented as weighted vectors, w_ij.
 Why do we need weighting techniques? To capture the importance of a term in describing the content of a given document.
 There are binary and non-binary weighting techniques. Any difference between the two?
 What method would you recommend to compute the weights for term i in document j and in query q (w_ij and w_iq)?
• An entry in the matrix corresponds to the "weight" of a term in the document (rows D1 ... DM and Qi, columns T1 ... TN, entries w_11 ... w_MN); zero means the term does not occur in the document.
• Normalize for vector length to avoid the effect of document length.
How to find relevant documents for a query?
• Step 3: Rank documents (in decreasing order of score) based on their closeness to the query.
 Documents are ranked by the degree of their closeness to the query.
 How is the closeness of a document to the query measured? It is determined by a similarity/dissimilarity score, for example the cosine similarity:

  sim(d_j, q) = (d_j · q) / (|d_j| |q|)
              = Σ_{i=1..n} (w_{i,j} × w_{i,q}) / ( sqrt(Σ_{i=1..n} w_{i,j}²) × sqrt(Σ_{i=1..n} w_{i,q}²) )

 How many matching (similarity/dissimilarity) measures do you know? Which one is best for IR?
How to evaluate Models?
• We need to investigate what procedures the IR Models
follow and what techniques they use:
– What is the weighting technique used by the IR Models for
measuring importance of terms in documents?
• Are they using binary or non-binary weight?
– What is the matching technique used by the IR models?
• Are they measuring similarity or dissimilarity?
– Are they applying exact matching or partial matching in the
course of finding relevant documents for a given query?
– Are they applying best matching principle to measure the
degree of relevance of documents to display in ranked-order?
• Is there any Ranking mechanism applied before displaying
relevant documents for the users?
The Boolean Model
• The Boolean model is a simple model based on set theory.
 It imposes a binary criterion for deciding relevance.
• Terms are either present or absent; thus w_ij ∈ {0, 1}.
• sim(q, d_j) = 1 if the document satisfies the Boolean query, and 0 otherwise.
– Note that no weights are assigned between 0 and 1; the term-document matrix (rows D1 ... DM, columns T1 ... TN) contains only the values 0 or 1.
The Boolean Model: Example
Given the following three documents, construct the term-document matrix and find the relevant documents retrieved by the Boolean model for the query "gold silver truck":
• D1: "Shipment of gold damaged in a fire"
• D2: "Delivery of silver arrived in a silver truck"
• D3: "Shipment of gold arrived in a truck"

The table below shows the document-term (t_i) matrix:

        arrive  damage  deliver  fire  gold  silver  ship  truck
D1        0       1       0       1     1      0      1     0
D2        1       0       1       0     0      1      0     1
D3        1       0       0       0     1      0      1     1
query     0       0       0       0     1      1      0     1

Also find the documents relevant for the queries:
(a) gold delivery; (b) ship gold; (c) silver truck
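A minimal sketch of Boolean (exact-match) retrieval over this collection, treating the space between query words as AND; the per-document term sets mirror the matrix above:

```python
# Term-document incidence for the three example documents
# ("shipment"/"damaged"/"delivery"/"arrived" reduced to their stems).
docs = {
    "D1": {"ship", "gold", "damage", "fire"},
    "D2": {"deliver", "silver", "arrive", "truck"},
    "D3": {"ship", "gold", "arrive", "truck"},
}

def boolean_and(query_terms):
    # A document is relevant only if it contains every query term (w in {0,1}).
    return sorted(d for d, terms in docs.items() if set(query_terms) <= terms)

print(boolean_and(["gold", "silver", "truck"]))  # [] : no document has all three
print(boolean_and(["silver", "truck"]))          # ['D2']
print(boolean_and(["ship", "gold"]))             # ['D1', 'D3']
```

Under a strict AND interpretation the query "gold silver truck" retrieves nothing, which illustrates the model's all-or-nothing behavior; an OR interpretation would instead retrieve all three documents with no ranking between them.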
The Boolean Model: Further Example
Given the following, determine the documents retrieved by a Boolean-model-based IR system:
• Index terms: K1, ..., K8
• Documents:
1. D1 = {K1, K2, K3, K4, K5}
2. D2 = {K1, K2, K3, K4}
3. D3 = {K2, K4, K6, K8}
4. D4 = {K1, K3, K5, K7}
5. D5 = {K4, K5, K6, K7, K8}
6. D6 = {K1, K2, K3, K4}
• Query: K1 ∧ (K2 ∨ ¬K3)
• Answer: {D1, D2, D4, D6} ∩ ({D1, D2, D3, D6} ∪ {D3, D5})
= {D1, D2, D6}
Exercise
Given the following four documents with the following
contents:
– D1 = “computer information retrieval”
– D2 = “computer retrieval”
– D3 = “information”
– D4 = “computer information”
• What are the relevant documents retrieved for the queries:
– Q1 = "information ∧ retrieval"
– Q2 = "information ∧ ¬computer"
Drawbacks of the Boolean Model
•Retrieval based on binary decision criteria
with no notion of partial matching
•No ranking of the documents is provided
(absence of a grading scale)
•Information need has to be translated into a
Boolean expression which most users find
awkward
•The Boolean queries formulated by the users
are most often too simplistic
 As a consequence, the Boolean model
frequently returns either too few or too
many documents in response to a user
query
Vector-Space Model
• This is the most commonly used strategy for measuring
relevance of documents for a given query. This is
because,
 Use of binary weights is too limiting
 Non-binary weights provide consideration for partial
matches
• These term weights are used to compute a degree of
similarity between a query and each document
 Ranked set of documents provides for better
matching
• The idea behind VSM is that
 the meaning of a document is conveyed by the words
used in that document
Vector-Space Model
To find relevant documents for a given query,
• First, map documents and queries into term-document
vector space.
Note that queries are considered as short document
• Second, in the vector space, queries and documents are
represented as weighted vectors, wij
There are different weighting techniques; the most widely used
one is the TF*IDF weight for each term
• Third, similarity measurement is used to rank documents
by the closeness of their vectors to the query.
To measure closeness of documents to the query cosine
similarity score is used by most search engines
Computing weights
• The vector space model with TF*IDF weights is a good ranking strategy for general collections.
• For index terms, a normalized TF*IDF weight is given by:

  w_ij = ( freq(i,j) / max_k freq(k,j) ) × log(N / n_i)

• The user's query is typically treated as a short document and is also TF*IDF weighted. For the query term weights, a suggestion is:

  w_iq = ( 0.5 + 0.5 × freq(i,q) / max_k freq(k,q) ) × log(N / n_i)

• The vector space model is usually as good as the known ranking alternatives. It is also simple and fast to compute.
Example: Computing weights
• A collection includes 10,000 documents.
 The term tA appears 20 times in a particular document j.
 The maximum frequency of any term tk in document j is 50.
 The term tA appears in 2,000 of the documents in the collection.
• Compute the TF*IDF weight of term A (using a base-2 logarithm):
 tf(A,j) = freq(A,j) / max_k freq(k,j) = 20/50 = 0.4
 idf(A) = log2(N/DF_A) = log2(10,000/2,000) = log2(5) = 2.32
 w_Aj = tf(A,j) × idf(A) = 0.4 × 2.32 = 0.928
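The worked example can be checked with a few lines of Python; note the base-2 logarithm, which is what makes log(5) ≈ 2.32:

```python
import math

def tfidf(freq_ij, max_freq_j, N, df_i):
    # Normalized term frequency times inverse document frequency
    # (base-2 log, matching the example above).
    tf = freq_ij / max_freq_j
    idf = math.log2(N / df_i)
    return tf * idf

w = tfidf(freq_ij=20, max_freq_j=50, N=10_000, df_i=2_000)
print(round(w, 3))  # 0.929 (the slide rounds idf to 2.32 first, giving 0.928)
```

The logarithm base only rescales all weights by a constant factor, so it does not affect the relative ranking of documents.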
Similarity Measure
• A similarity measure is a function that computes the degree of similarity/dissimilarity between document j and the user's query:

  sim(d_j, q) = (d_j · q) / (|d_j| |q|)
              = Σ_{i=1..n} (w_{i,j} × w_{i,q}) / ( sqrt(Σ_{i=1..n} w_{i,j}²) × sqrt(Σ_{i=1..n} w_{i,q}²) )

• Using a similarity score between the query and each document:
– It is possible to apply best matching, so that documents are ranked for retrieval in order of presumed relevance.
– It is possible to enforce a certain threshold so that we can control the size of the retrieved set of documents.
Vector Space with Term Weights and Cosine Similarity Measure
D_i = (d_{1i}, w_{1di}; d_{2i}, w_{2di}; ...; d_{ti}, w_{tdi})
Q = (q_{1i}, w_{1qi}; q_{2i}, w_{2qi}; ...; q_{ti}, w_{tqi})

  sim(Q, D_i) = Σ_{j=1..t} (w_{jq} × w_{jdi}) / ( sqrt(Σ_{j=1..t} w_{jq}²) × sqrt(Σ_{j=1..t} w_{jdi}²) )

[Figure: two-dimensional term space (Term A on the x-axis, Term B on the y-axis) showing the vectors Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7)]

sim(Q, D2) = (0.4 × 0.2 + 0.8 × 0.7) / ( sqrt(0.4² + 0.8²) × sqrt(0.2² + 0.7²) )
           = 0.64 / (0.894 × 0.728) ≈ 0.98

sim(Q, D1) = (0.4 × 0.8 + 0.8 × 0.3) / ( sqrt(0.4² + 0.8²) × sqrt(0.8² + 0.3²) )
           = 0.56 / (0.894 × 0.854) ≈ 0.73
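The two-dimensional example can be reproduced directly:

```python
import math

def cosine(u, v):
    # Cosine of the angle between two term-weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
print(round(cosine(Q, D2), 2))  # 0.98 -> D2 ranks above D1
print(round(cosine(Q, D1), 2))  # 0.73
```

D2 points in nearly the same direction as Q, so it gets the higher cosine score even though D1 has the larger first component.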
Example: Vector-Space Model
• Suppose a user queries for Q = "gold silver truck". The database collection consists of three documents with the following content:
D1: "Shipment of gold damaged in a fire"
D2: "Delivery of silver arrived in a silver truck"
D3: "Shipment of gold arrived in a truck"
• Show the retrieval results in ranked order:
1. Assume that full-text terms are used during indexing, without removing common terms/stop words, and that no terms are stemmed.
2. Assume that content-bearing terms are selected during indexing.
3. Also compare your results with and without normalizing term frequency.
Example VSM: Weighting Terms

Terms    | Counts (TF)    | DF | IDF   | W_i = TF×IDF
         | Q  D1  D2  D3  |    |       | Q      D1     D2     D3
arrive   | 0  0   1   1   | 2  | 0.176 | 0      0      0.176  0.176
damage   | 0  1   0   0   | 1  | 0.477 | 0      0.477  0      0
deliver  | 0  0   1   0   | 1  | 0.477 | 0      0      0.477  0
fire     | 0  1   0   0   | 1  | 0.477 | 0      0.477  0      0
gold     | 1  1   0   1   | 2  | 0.176 | 0.176  0.176  0      0.176
silver   | 1  0   2   0   | 1  | 0.477 | 0.477  0      0.954  0
ship     | 0  1   0   1   | 2  | 0.176 | 0      0.176  0      0.176
truck    | 1  0   1   1   | 2  | 0.176 | 0.176  0      0.176  0.176

(IDF = log10(3/DF); W = raw TF × IDF, so silver in D2 is 2 × 0.477 = 0.954.)
Example VSM: Weighting Terms
Terms Q D1 D2 D3
arrive 0 0 0.176 0.176
damage 0 0.477 0 0
deliver 0 0 0.477 0
fire 0 0.477 0 0
gold 0.176 0.176 0 0.176
silver 0.477 0 0.954 0
ship 0 0.176 0 0.176
truck 0.176 0 0.176 0.176
Example VSM: Similarity Measure
• Compute similarity using the cosine measure, sim(q, d_j).
• First, for each document and the query, compute all vector lengths (zero terms ignored):
  |d1| = sqrt(0.477² + 0.477² + 0.176² + 0.176²) = sqrt(0.517) = 0.719
  |d2| = sqrt(0.176² + 0.477² + 0.954² + 0.176²) = sqrt(1.1996) = 1.095
  |d3| = sqrt(0.176² + 0.176² + 0.176² + 0.176²) = sqrt(0.124) = 0.352
  |q| = sqrt(0.176² + 0.477² + 0.176²) = sqrt(0.2896) = 0.538
• Next, compute the dot products (zero products ignored):
  q · d1 = 0.176 × 0.176 = 0.0310 (gold)
  q · d2 = 0.477 × 0.954 + 0.176 × 0.176 = 0.4862 (silver, truck)
  q · d3 = 0.176 × 0.176 + 0.176 × 0.176 = 0.0620 (gold, truck)
Example VSM: Ranking
Now, compute the similarity scores:
  sim(q, d1) = 0.0310 / (0.538 × 0.719) = 0.0801
  sim(q, d2) = 0.4862 / (0.538 × 1.095) = 0.8246
  sim(q, d3) = 0.0620 / (0.538 × 0.352) = 0.3271
Finally, we sort and rank the documents in descending order of similarity score:
  Rank 1: Doc 2 = 0.8246
  Rank 2: Doc 3 = 0.3271
  Rank 3: Doc 1 = 0.0801
• Exercise: using normalized TF, rank the documents using the cosine similarity measure. Hint: normalize the TF of term i in doc j by the maximum term frequency in doc j.
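The whole worked example fits in a short script; the weight dictionaries copy the TF*IDF table above (IDF = log10(3/DF), raw term counts as TF):

```python
import math

# TF*IDF weights for the query and the three documents, from the table.
weights = {
    "q":  {"gold": 0.176, "silver": 0.477, "truck": 0.176},
    "d1": {"damage": 0.477, "fire": 0.477, "gold": 0.176, "ship": 0.176},
    "d2": {"arrive": 0.176, "deliver": 0.477, "silver": 0.954, "truck": 0.176},
    "d3": {"arrive": 0.176, "gold": 0.176, "ship": 0.176, "truck": 0.176},
}

def cosine(a, b):
    # Dot product over shared terms, divided by the two vector lengths.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = lambda v: math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm(a) * norm(b))

ranking = sorted(("d1", "d2", "d3"),
                 key=lambda d: cosine(weights["q"], weights[d]), reverse=True)
print(ranking)  # ['d2', 'd3', 'd1']
```

The scores come out near 0.82, 0.33, and 0.08 (tiny differences from the slide's 0.8246 etc. are due to the slide's intermediate rounding), reproducing the ranking Doc 2 > Doc 3 > Doc 1.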
Vector-Space Model
• Advantages:
• Term-weighting improves quality of the answer set since
it helps to display relevant documents in ranked order
• Partial matching allows retrieval of documents that
approximate the query conditions
• Cosine ranking formula sorts documents according to
degree of similarity to the query

• Disadvantages:
• Assumes independence of index terms. It doesn’t relate
one term with another term
• Computationally expensive since it measures the
similarity between each document and the query
Exercise 1
Suppose the database collection consists of the following
documents.
c1: Human machine interface for Lab ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user-perceived response time to error measure
m1: The generation of random, binary, unordered trees
m2: The intersection graph of paths in trees
m3: Graph minors: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey
Query:
Find documents relevant to "human computer
interaction"
Exercise 2
• Consider these documents:
Doc 1 breakthrough drug for schizophrenia
Doc 2 new schizophrenia drug
Doc 3 new approach for treatment of schizophrenia
Doc 4 new hopes for schizophrenia patients
–Draw the term-document incidence matrix for this document
collection.
–Draw the inverted index representation for this collection.
• For the document collection shown above, what are the returned results for the queries:
–schizophrenia AND drug
–for AND NOT(drug OR approach)
Probabilistic Model
• IR is an uncertain process
–Mapping Information need to Query is not perfect
–Mapping Documents to index terms is a logical representation
–Query terms and index terms mostly mismatch
• This situation leads to several statistical approaches:
probability theory, fuzzy logic, theory of evidence,
language modeling, etc.
• The probabilistic retrieval model is a rigorous formal model that
attempts to predict the probability that a given document
will be relevant to a given query, i.e. Prob(R | q, di)
–Use probability to estimate the “odds” of relevance of a query to
a document.
–It relies on accurate estimates of probabilities
Probability Ranking
Principle
• The relevance of a given document to a user's query can be
determined by its probability score
–A high probability prob(R | di, q) means users are more likely to
get relevant information by reading document di.
• A Probabilistic retrieval model follows Probability ranking
principle
–You have a collection of Documents
• A set of relevant documents needs to be returned for
queries issued by users
• Intuitively, want the “best” document to be first, second
best - second, etc…
–According to probability ranking principle, documents are
ranked in decreasing order of probability of relevance to users
information need
Term Existence in Relevant Documents
N = the total number of documents in the collection
n = the total number of documents that contain term ti
R = the total number of relevant documents retrieved
r = the total number of relevant documents retrieved that contain term ti

Document relevance contingency table for term ti:

                         | Relevant docs | Non-relevant docs | Total
Docs including term ti   | r             | n - r             | n
Docs excluding term ti   | R - r         | N - R - (n - r)   | N - n
Total                    | R             | N - R             | N

The probabilistic term weight is then:

  w_i = log [ (r + 0.5)(N - n - R + r + 0.5) / ( (n - r + 0.5)(R - r + 0.5) ) ]
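The weight formula is easy to compute directly. The slide does not state the logarithm base; base 10 is assumed here because it reproduces the weights shown in the relevance-feedback example later in the slides (N=6, R=1, with the term "pot" in 2 documents, 1 of them relevant):

```python
import math

def term_weight(r, n, R, N):
    # Probabilistic term weight with 0.5 smoothing to avoid division by zero
    # (Robertson / Sparck Jones style), base-10 log assumed.
    return math.log10(((r + 0.5) * (N - n - R + r + 0.5)) /
                      ((n - r + 0.5) * (R - r + 0.5)))

print(round(term_weight(r=1, n=2, R=1, N=6), 2))  # 0.95  (term in the relevant doc)
print(round(term_weight(r=0, n=2, R=1, N=6), 2))  # -0.33 (term only in non-relevant docs)
```

Terms concentrated in relevant documents get positive weights; terms that appear only in non-relevant documents are penalized with negative weights.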
Computing term
probabilities
• Three cases: Relevance of documents for a given query
may be known, partially known or unknown
– Initially, there are no retrieved documents
– R is completely unknown
– Assume P(ti|R) is constant (usually 0.5)
– Assume P(ti|NR) approximated by distribution of ti
across collection – IDF
• This can be used to compute an initial ranking using IDF as the basic term weight
Probabilistic Model Example

[Table: binary document vectors <t_{d,t}> for six documents over the terms cold, day, eat, hot, lot, nine, old, pea, pizza, pot]

Initial term weights:
wt: cold 0.26, day 0.56, eat 0.56, hot 0.26, lot 0.56, nine 0.56, old 0.56, pea 0.0, pizza 0.0, pot 0.26

• q1 = eat
• q2 = eat pizza
• q4 = eat hot pizza
Improving the Ranking
• Now, suppose
– we have shown the initial ranking to the user
– the user has labeled some of the documents as
relevant ("relevance feedback")
• We now have
– N documents in collection, R are known relevant
documents
– ni documents containing ti, out of which ri are
relevant
Relevance Weighted Example

[Table: the same six binary document vectors over cold, day, eat, hot, lot, nine, old, pea, pizza, pot, now with relevance judgments: document 2 is relevant (R), documents 1, 3, 4, 5 and 6 are non-relevant (NR)]

Re-estimated term weights:
wt: cold -0.33, day 0.00, eat 0.00, hot -0.33, lot 0.00, nine 0.00, old 0.00, pea 0.62, pizza 0.62, pot 0.95

• query = hot pizza
• Document 2 is relevant
Probabilistic Retrieval Example
• D1: "Cost of paper is up." (relevant)
• D2: "Cost of jellybeans is up." (not relevant)
• D3: "Salaries of CEO's are up." (not relevant)
• D4: "Paper: CEO's labor cost up." (????)

        cost    paper   jellybean  salary  CEO     labor   up      Relevance
D1      1       1       0          0       0       0       1       R
D2      1       0       1          0       0       0       1       NR
D3      0       0       0          1       1       0       1       NR
D4      1       1       0          0       1       1       1       ??
W_ij    0.477   1.176   -0.477     -0.477  -0.477  0.222   -0.222

Document scores (sum of the weights of the terms each document contains):
• D1 = 0.477 + 1.176 - 0.222 = 1.431
• D2 = 0.477 - 0.477 - 0.222 = -0.222
• D3 = -0.477 - 0.477 - 0.222 = -1.176
• D4 = 0.477 + 1.176 - 0.477 + 0.222 - 0.222 = 1.176
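Scoring is then just the sum of the weights of the terms a document contains; a short sketch reproducing the sums above:

```python
# Term weights from the table above.
weights = {"cost": 0.477, "paper": 1.176, "jellybean": -0.477,
           "salary": -0.477, "ceo": -0.477, "labor": 0.222, "up": -0.222}

docs = {
    "D1": ["cost", "paper", "up"],
    "D2": ["cost", "jellybean", "up"],
    "D3": ["salary", "ceo", "up"],
    "D4": ["cost", "paper", "ceo", "labor", "up"],
}

# Score each document by summing the weights of its terms.
scores = {d: round(sum(weights[t] for t in terms), 3) for d, terms in docs.items()}
print(scores)  # {'D1': 1.431, 'D2': -0.222, 'D3': -1.176, 'D4': 1.176}
```

The unjudged document D4 scores almost as high as the known-relevant D1, so the model would present it near the top of the ranking.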
Exercise
• Consider the collection below. The collection has 5 documents and
each document is described by two terms. The initial guess of
relevance to a particular query Q is as given in the table below.
Assuming the query Q has a total of 2 relevant documents in this
collection solve the following questions
Document T1 T2 Relevance
D1 1 1 R
D2 0 1 NR
D3 1 0 NR
D4 1 0 R
D5 0 1 NR
• Using the probabilistic term-weighting formula, calculate the new weight for each of the query terms in Q
• Rank the documents according to their probability of relevance
with the new query
Probabilistic model
• Probabilistic model uses probability theory to model the
uncertainty in the retrieval process
– Assumptions are made explicit
– Term weight without relevance information is IDF
• Relevance feedback can improve the ranking by giving better
term probability estimates
• Advantages of probabilistic model over vector‐space
– Strong theoretical basis
– Since the base is probability theory, it is very well understood
– Easy to extend
• Disadvantages
– Models are often complicated
– No term frequency weighting
• Which is better: vector‐space or probabilistic?
– Both are approximately as good as each other
– Depends on collection, query, and other factors