
Introduction to

Information Storage and Retrieval


Chapter Four: IR models

1
IR Models - Basic Concepts
 Word evidence: Bag of Words
• IR systems usually adopt index terms to index and retrieve documents
• Each document is represented by a set of representative keywords or index terms (called a Bag of Words)
• An index term is a word useful for remembering the document's main themes
• Not all terms are equally useful for representing the document contents:
  • less frequent terms allow identifying a narrower set of documents
• But no ordering information is attached to the Bag of Words identified from the document collection.

2
IR Models - Basic Concepts
• One central problem regarding IR systems is the
issue of predicting which documents are relevant
and which are not
• Such a decision is usually dependent on a ranking
algorithm which attempts to establish a simple
ordering of the documents retrieved
• Documents appearing at the top of this ordering
are considered to be more likely to be relevant
• Thus ranking algorithms are at the core of IR systems
• The IR models determine the predictions of what is
relevant and what is not, based on the notion of
relevance implemented by the system
3
IR Models - Basic Concepts
• After preprocessing, N distinct terms remain; these unique terms form the VOCABULARY
• Let
– ki be an index term i & dj be a document j
– K = (k1, k2, …, kN) is the set of all index terms
• Each term, i, in a document or query j, is given a real-
valued weight, wij.
– wij is a weight associated with (ki,dj). If wij = 0 , it
indicates that term does not belong to document dj
• The weight wij quantifies the importance of the index term
for describing the document contents
• Vec(dj) = (w1j, w2j, …, wNj) is a term-weighted vector associated with the document dj
4
Mapping Documents & Queries
 Represent both documents and queries as N-dimensional vectors in a term-document matrix, which shows the occurrence of terms in the document collection or query
 E.g. dj = (t1,j, t2,j, ..., tN,j);   qk = (t1,q, t2,q, ..., tN,q)
 An entry in the matrix corresponds to the "weight" of a term in the document
 – The document collection is mapped to a term-by-document matrix:

          T1    T2    ...   TN
     D1   w11   w12   ...   w1N
     D2   w21   w22   ...   w2N
     :    :     :           :
     DM   wM1   wM2   ...   wMN

 – View each document as a vector in multidimensional space
   • Nearby vectors are related
 – Normalize for vector length to avoid the effect of document length
5
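A toy sketch (not part of the original slides) of this mapping in Python: each document and the query become a count vector over the same vocabulary, i.e. one row of the term-document matrix. The document texts and names are invented for illustration.

# Map a tiny collection and a query into the same N-dimensional term space.
docs = {"D1": "information retrieval system", "D2": "retrieval of web documents"}
query = "web information retrieval"

# The N index terms (the vocabulary) drawn from the collection.
vocab = sorted({t for text in docs.values() for t in text.split()})

def to_vector(text):
    """Term-count vector over the shared vocabulary (one row of the matrix)."""
    words = text.split()
    return [words.count(t) for t in vocab]

matrix = {name: to_vector(text) for name, text in docs.items()}
matrix["Q"] = to_vector(query)

print(vocab)
for name, row in matrix.items():
    print(name, row)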
Weighting Terms in Vector Space
 The importance of the index terms is represented by weights
associated to them
 Problem: to show the importance of the index term for
describing the document/query contents, what weight we can
assign?
 Solution 1: Binary weights: t=1 if present, 0 otherwise
 Similarity: number of terms in common between the
document and the query
 Problem: Not all terms are equally interesting
 E.g. the vs. dog vs. cat
 Solution: Replace binary weights with non-binary weights
   dj = (w1,j, w2,j, ..., wN,j);   qk = (w1,k, w2,k, ..., wN,k)
6
The Boolean Model
• Boolean model is a simple model based on set theory
• Boolean model imposes a binary criterion
for deciding relevance
• Terms are either present or absent. Thus, wij ∈ {0,1}
• sim(q,dj) = 1 if the document satisfies the Boolean query,
              0 otherwise

          T1    T2    ...   TN
     D1   w11   w12   ...   w1N
     D2   w21   w22   ...   w2N
     :    :     :           :
     DM   wM1   wM2   ...   wMN

- Note that no weights are assigned in between 0 and 1; only the values 0 or 1 are used
7
The Boolean Model: Example
• Generate the relevant documents retrieved by
the Boolean model for the query :
q = k1 ∧ (k2 ∨ ¬k3)

[Figure: Venn diagram of the index-term sets k1, k2 and k3, with documents d1–d7 placed in the regions of overlap]
8
The Boolean Model: Example
• Given the following determine documents retrieved by the
Boolean model based IR system
• Index Terms: K1, …,K8.
• Documents:
1. D1 = {K1, K2, K3, K4, K5}
2. D2 = {K1, K2, K3, K4}
3. D3 = {K2, K4, K6, K8}
4. D4 = {K1, K3, K5, K7}
5. D5 = {K4, K5, K6, K7, K8}
6. D6 = {K1, K2, K3, K4}
• Query: K1 ∧ (K2 ∨ ¬K3)
• Answer: {D1, D2, D4, D6} ∩ ({D1, D2, D3, D6} ∪ {D3, D5})
        = {D1, D2, D6}
9
The Boolean Model: Further Example
Given the following three documents, construct the term–document matrix and find the relevant documents retrieved by the Boolean model for the given query.
• D1: "Shipment of gold damaged in a fire"
• D2: "Delivery of silver arrived in a silver truck"
• D3: "Shipment of gold arrived in a truck"
• Query: "gold silver truck"
• Also find the relevant documents for the queries:
  • (a) "gold delivery"
  • (b) "ship gold"
  • (c) "silver truck"
The table below shows the document–term (ti) matrix:

          arrive  damage  deliver  fire  gold  silver  ship  truck
D1
D2
D3
query
10
The Boolean Model: Further Example
a) gold silver truck: none (no single document contains all three terms)
b) gold delivery: none
c) ship gold: D1, D3
d) silver truck: D2

          arrive  damage  deliver  fire  gold  silver  ship  truck
D1        0       1       0        1     1     0       1     0
D2        1       0       1        0     0     1       0     1
D3        1       0       0        0     1     0       1     1
Query a   0       0       0        0     1     1       0     1
Query b   0       0       1        0     1     0       0     0
Query c   0       0       0        0     1     0       1     0
Query d   0       0       0        0     0     1       0     1
11
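The answers above can be checked with a minimal Boolean-retrieval sketch in Python (not part of the original slides): each document is the set of index terms from the matrix, a posting() lookup plays the role of an inverted index, and AND/OR/NOT are plain set intersection, union and difference. The helper names are illustrative.

# Index terms per document, taken from the term-document matrix above.
docs = {
    "D1": {"damage", "fire", "gold", "ship"},        # "Shipment of gold damaged in a fire"
    "D2": {"arrive", "deliver", "silver", "truck"},  # "Delivery of silver arrived in a silver truck"
    "D3": {"arrive", "gold", "ship", "truck"},       # "Shipment of gold arrived in a truck"
}
all_docs = set(docs)

def posting(term):
    """Set of document ids containing the term (an inverted-index lookup)."""
    return {d for d, terms in docs.items() if term in terms}

print(posting("gold") & posting("silver") & posting("truck"))  # a) -> set(), i.e. none
print(posting("gold") & posting("deliver"))                    # b) -> set(), i.e. none
print(posting("ship") & posting("gold"))                       # c) -> {'D1', 'D3'}
print(posting("silver") & posting("truck"))                    # d) -> {'D2'}
# Negation, as in K1 AND (K2 OR NOT K3), is set difference against all documents:
print(posting("gold") & (posting("ship") | (all_docs - posting("silver"))))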
Exercise
Given the following four documents with the following contents:
 D1 = "computer information retrieval"
 D2 = "computer retrieval"
 D3 = "information"
 D4 = "computer information"
 What are the relevant documents retrieved for the queries:
 Q1 = "information ∧ retrieval"
 Q2 = "information ∧ ¬computer"
12
D1 = "computer information retrieval"
D2 = "computer retrieval"
D3 = "information"
D4 = "computer information"

Posting sets:
computer: {D1, D2, D4}
¬computer: {D3}
information: {D1, D3, D4}
retrieval: {D1, D2}

Q1 = (information ∧ retrieval)
   = {D1, D3, D4} ∩ {D1, D2}
   = {D1}

Q2 = (information ∧ ¬computer)
   = {D1, D3, D4} ∩ {D3}
   = {D3}

      computer  information  retrieval
D1    1         1            1
D2    1         0            1
D3    0         1            0
D4    1         1            0
Q1    0         1            1
Q2    0         1            0
13
Exercise: What are the relevant documents retrieved for the
query: ((Caesar OR Milton) AND (Swift OR Shakespeare))
Doc No Term 1 Term 2 Term 3 Term 4
1 Swift
2 Shakespeare
3 Shakespeare Swift
4 Milton
5 Milton Swift
6 Milton Shakespeare
7 Milton Shakespeare Swift
8 Caesar
9 Caesar Swift
10 Caesar Shakespeare
11 Caesar Shakespeare Swift
12 Caesar Milton
13 Caesar Milton Swift
14 Caesar Milton Shakespeare
15 Caesar Milton Shakespeare Swift
14
Q: ((Caesar OR Milton) AND (Swift OR Shakespeare))

Doc No  Caesar  Milton  Shakespeare  Swift
1       0       0       0            1
2       0       0       1            0
3       0       0       1            1
4       0       1       0            0
5       0       1       0            1
6       0       1       1            0
7       0       1       1            1
8       1       0       0            0
9       1       0       0            1
10      1       0       1            0
11      1       0       1            1
12      1       1       0            0
13      1       1       0            1
14      1       1       1            0
15      1       1       1            1

Caesar = {8, 9, 10, 11, 12, 13, 14, 15}
Milton = {4, 5, 6, 7, 12, 13, 14, 15}
Shakespeare = {2, 3, 6, 7, 10, 11, 14, 15}
Swift = {1, 3, 5, 7, 9, 11, 13, 15}

Caesar OR Milton = {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
Swift OR Shakespeare = {1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15}

((Caesar OR Milton) AND (Swift OR Shakespeare)) = {5, 6, 7, 9, 10, 11, 13, 14, 15}
15
Drawbacks of the Boolean Model
• Retrieval based on binary decision criteria with no notion
of partial matching
• No ranking of the documents is provided (absence of a
grading scale)
• Information need has to be translated into a Boolean
expression which most users find awkward
• The Boolean queries formulated by the users are most often
too simplistic
• As a consequence, the Boolean model frequently returns
either too few or too many documents in response to a
user query

16
Vector-Space Model
• This is the most commonly used strategy for measuring
relevance of documents for a given query. This is
because,
• Use of binary weights is too limiting
• Non-binary weights provide consideration for partial
matches
• The term weights are used to compute a degree of
similarity between a query and each document
• Ranked set of documents provides for better matching
• The idea behind VSM is that
• the meaning of a document is conveyed by the words used in that document and the weights they carry
17
Vector-Space Model
To find relevant documents for a given query:
• First, documents and queries are mapped into term vector space.
  • Note that queries are treated as short documents (documents with only a few words)
• Second, in the vector space, queries and documents are represented as weighted vectors
  • There are different weighting techniques; the most widely used one is computing tf*idf for each term
• Third, similarity measurement is used to rank
documents by the closeness of their vectors to the query.
• Documents are ranked by closeness to the query. Closeness
is determined by a similarity score calculation
18
Term-document matrix.
 A collection of n documents and query can be represented
in the vector space model by a term-document matrix.
 An entry in the matrix corresponds to the “weight” of a term in
the document;
 zero means the term has no significance in the document or it simply doesn't exist in the document. Otherwise, wij > 0 whenever ki ∈ dj

          T1    T2    ...   TN
     D1   w11   w12   ...   w1N
     D2   w21   w22   ...   w2N
     :    :     :           :
     DM   wM1   wM2   ...   wMN
19
Computing weights
• How to compute weight for term i in document j (wij ) and
weight for term i in query q (wiq)?
• A good weight must take into account two effects:
– Quantification of intra-document contents (similarity)
• tf factor, the term frequency within a document
– Quantification of inter-documents separation
(dissimilarity)
• idf factor, the inverse document frequency across
documents
– As a result, most IR systems use the tf*idf weighting technique:
      wij = tf(i,j) * idf(i)
20
Computing weights
• Let:
  • N be the total number of documents in the collection
  • ni be the number of documents which contain ki
  • freq(i,j) be the raw frequency of ki within dj
• A normalized tf factor is given by
  • f(i,j) = freq(i,j) / max(freq(l,j))
  • where the maximum is computed over all terms l which occur within the document dj
• The idf factor is computed as
  • idf(i) = log (N/ni)
  • the log is used to make the values of tf and idf comparable. It can also be interpreted as the amount of information associated with the term ki.
21
Computing weights
• The best term-weighting schemes use tf*idf weights, which are given by
      wij = f(i,j) * log(N/ni)
• For the query term weights, a suggestion is
      wiq = (0.5 + 0.5 * freq(i,q) / max(freq(l,q))) * log(N/ni)
• The vector space model with tf*idf weights is a good ranking strategy for general collections
• The vector space model is usually as good as the known ranking alternatives. It is also simple and fast to compute.

22
Example: Computing weights
• A collection includes 10,000 documents
• The term A appears 20 times in a particular
document
• The maximum appearance of any term in this
document is 50
• The term A appears in 2,000 of the collection
documents.
• Compute the TF*IDF weight?

• f(i,j) = freq(i,j) / max(freq(l,j)) = 20/50 = 0.4
• idf(i) = log2(N/ni) = log2(10,000/2,000) = log2(5) = 2.32
• wij = f(i,j) * idf(i) = 0.4 * 2.32 = 0.928
23
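The arithmetic above can be reproduced with a short Python sketch (not part of the original slides). The base-2 logarithm matches the 2.32 value used on this slide; any base works as long as it is applied consistently. Function and parameter names are illustrative.

import math

def tfidf_weight(freq_ij, max_freq_in_doc, N, n_i, base=2):
    """w_ij = f(i,j) * log(N/n_i), with f(i,j) the length-normalized tf factor."""
    f_ij = freq_ij / max_freq_in_doc   # normalized tf: here 20/50 = 0.4
    idf_i = math.log(N / n_i, base)    # here log2(10000/2000) = log2(5) = 2.32
    return f_ij * idf_i

print(round(tfidf_weight(20, 50, 10_000, 2_000), 3))  # ~0.929 (the slide rounds to 0.928)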
Similarity Measure
[Figure: document vector dj and query vector q in term space, with θ the angle between them]

• sim(q, dj) = cos(θ)

                dj · q            Σ(i=1..n) wi,j * wi,q
  sim(dj, q) = ---------- = -------------------------------------------
               |dj| * |q|   sqrt(Σ(i=1..n) wi,j²) * sqrt(Σ(i=1..n) wi,q²)

• Since wij ≥ 0 and wiq ≥ 0, 0 ≤ sim(q,dj) ≤ 1
• A document is retrieved even if it matches the query terms only partially
24
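A minimal Python sketch of this cosine computation (not part of the original slides); the two example vectors are invented, and the zero-length guard is an extra safety check not discussed here.

import math

def cosine_sim(d, q):
    """Cosine of the angle between two equally long weight vectors."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0

# With non-negative weights the score always lies between 0 and 1,
# and a partial match still yields a score greater than 0.
print(cosine_sim([1.0, 0.0, 2.0], [0.5, 1.0, 0.0]))   # 0.2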
Vector-Space Model: Example
• Suppose we pose the query Q: "gold silver truck". The document collection consists of the following three documents:
• D1: "Shipment of gold damaged in a fire"
• D2: "Delivery of silver arrived in a silver truck"
• D3: "Shipment of gold arrived in a truck"
• Assume that all terms are used, including common terms and stop words, and that no terms are reduced to their root form.
• Show the retrieval results in ranked order.
25
Vector-Space Model: Example
Terms      tf(Q) tf(D1) tf(D2) tf(D3)   DF    IDF      w(Q)   w(D1)  w(D2)  w(D3)
a 0 1 1 1 3 0 0 0 0 0
arrived 0 0 1 1 2 0.176 0 0 0.176 0.176
damaged 0 1 0 0 1 0.477 0 0.477 0 0
delivery 0 0 1 0 1 0.477 0 0 0.477 0
fire 0 1 0 0 1 0.477 0 0.477 0 0
gold 1 1 0 1 2 0.176 0.176 0.176 0 0.176
in 0 1 1 1 3 0 0 0 0 0
of 0 1 1 1 3 0 0 0 0 0
silver 1 0 2 0 1 0.477 0.477 0 0.954 0
shipment 0 1 0 1 2 0.176 0 0.176 0 0.176
truck 1 0 1 1 2 0.176 0.176 0 0.176 0.176
Vector-Space Model
Terms Q D1 D2 D3
a 0 0 0 0
arrived 0 0 0.176 0.176
damaged 0 0.477 0 0
delivery 0 0 0.477 0
fire 0 0.477 0 0
gold 0.176 0.176 0 0.176
in 0 0 0 0
of 0 0 0 0
silver 0.477 0 0.954 0
shipment 0 0.176 0 0.176
truck 0.176 0 0.176 0.176
Vector-Space Model: Example
• Compute the cosine similarity Sim(q,di) for each document
• First, for each document and the query, compute all vector lengths (zero terms ignored):
  |d1| = sqrt(0.477² + 0.477² + 0.176² + 0.176²) = sqrt(0.517)  = 0.719
  |d2| = sqrt(0.176² + 0.477² + 0.954² + 0.176²) = sqrt(1.2001) = 1.095
  |d3| = sqrt(0.176² + 0.176² + 0.176² + 0.176²) = sqrt(0.124)  = 0.352
  |q|  = sqrt(0.176² + 0.477² + 0.176²)          = sqrt(0.2896) = 0.538
• Next, compute the dot products (zero products ignored):
  Q·d1 = 0.176*0.176                = 0.0310
  Q·d2 = 0.954*0.477 + 0.176*0.176  = 0.4862
  Q·d3 = 0.176*0.176 + 0.176*0.176  = 0.0620
Vector-Space Model: Example
Now, compute the similarity scores:
Sim(q,d1) = 0.0310 / (0.538*0.719) = 0.0801
Sim(q,d2) = 0.4862 / (0.538*1.095) = 0.8246
Sim(q,d3) = 0.0620 / (0.538*0.352) = 0.3271
Finally, we sort and rank the documents in descending order of their similarity scores:
Rank 1: Doc 2 = 0.8246
Rank 2: Doc 3 = 0.3271
Rank 3: Doc 1 = 0.0801

• Exercise: using normalized TF, rank the documents using the cosine similarity measure. Hint: normalize the TF of term i in doc j using the maximum frequency of any term k in document j.
29
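The sketch below (not part of the original slides) reproduces this example end to end, assuming raw term frequencies multiplied by idf = log10(N/df) as in the tables above; it should print the same ranking D2 > D3 > D1 with scores close to 0.8246, 0.3271 and 0.0801.

import math
from collections import Counter

docs = {
    "D1": "shipment of gold damaged in a fire",
    "D2": "delivery of silver arrived in a silver truck",
    "D3": "shipment of gold arrived in a truck",
}
query = "gold silver truck"

N = len(docs)
vocab = sorted({t for text in docs.values() for t in text.split()})
df = {t: sum(t in text.split() for text in docs.values()) for t in vocab}

def weights(text):
    """Raw tf * idf weight for every vocabulary term (all query terms occur in the collection)."""
    tf = Counter(text.split())
    return [tf[t] * math.log10(N / df[t]) for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

qv = weights(query)
scores = {name: cosine(weights(text), qv) for name, text in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 4))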
Vector-Space Model
• Advantages:
• term-weighting improves quality of the answer set
since it displays in ranked order
• partial matching allows retrieval of documents that
approximate the query conditions
• cosine ranking formula sorts documents according
to degree of similarity to the query

• Disadvantages:
• assumes independence of index terms (??)

30
More Example
Suppose the database collection consists of the following documents.
c1: Human machine interface for Lab ABC computer
applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user-perceived response time to error measure
M1: The generation of random, binary, unordered trees
M2: The intersection graph of paths in trees
M3: Graph minors: Widths of trees and well-quasi-ordering
M4: Graph minors: A survey
Query:
Find documents relevant to "human computer interaction”
31
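A rough sketch (not part of the original slides) of what plain term matching does with this query: it simply counts which query words occur in each title. Only c1, c2 and c4 share any term with "human computer interaction", and the word "interaction" never occurs literally in the collection.

query_terms = {"human", "computer", "interaction"}
docs = {
    "c1": "Human machine interface for Lab ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user-perceived response time to error measure",
    "m1": "The generation of random, binary, unordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}

for name, text in docs.items():
    words = {w.strip(",.:").lower() for w in text.split()}
    overlap = query_terms & words
    if overlap:
        print(name, sorted(overlap))   # c1 -> computer, human; c2 -> computer; c4 -> human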
Exercises
 Given the following documents, rank documents according to
their relevance to the query using Cosine similarity, Euclidean
distance and Inner product measures?

docID words in document


1 Taipei Taiwan
2 Macao Taiwan Shanghai
3 Japan Sapporo
4 Sapporo Osaka Taiwan

Query: Taiwan Taiwan Sapporo ?

32
End of Chapter 4

33
Test
• Given the following Term-Document matrix and Query, perform:
1. Euclidean Distance between the Query and Each Document
2. Inner Product between the Query and Each Document
economy develop country
D1 1 3 2
D2 3 2 1
D3 2 1 0
Q 1 1 0

34
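A short sketch (not part of the original slides) of the two measures being asked for, applied to the matrix above so answers can be checked; the vector layout follows the column order economy, develop, country.

import math

docs = {"D1": [1, 3, 2], "D2": [3, 2, 1], "D3": [2, 1, 0]}
q = [1, 1, 0]

def euclidean(a, b):
    """Euclidean distance: smaller means the document is closer to the query."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner (dot) product: larger means a stronger match."""
    return sum(x * y for x, y in zip(a, b))

for name, d in docs.items():
    print(name, "distance:", round(euclidean(q, d), 3), "inner product:", inner_product(q, d))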
