Information Retrieval: Solutions To Practice Exercises

The document summarizes an algorithm to find documents that contain at least k keywords out of a set of n keywords. It works as follows: 1) Initialize an empty list L of records, each holding a document identifier and its reference count, kept sorted by document identifier. 2) For each keyword, fetch its list of matching document IDs and merge it into L, incrementing the reference count of any ID already in L and adding a new record with count 1 otherwise. 3) After processing all keywords, output every record in L with a reference count of at least k. The algorithm runs in time at most proportional to n times the total number of document IDs across all keywords.

Uploaded by

Bada Sainath
Copyright
© Attribution Non-Commercial (BY-NC)

CHAPTER 19

Information Retrieval

Solutions to Practice Exercises


19.1 We do not consider the questions that contain neither of the keywords, since their relevance to the keywords is zero. The number of words in a question includes stop words. We use the equations given in Section 19.2.1 to compute relevance; the log term in the equation is assumed to be to the base 2.

Q#   #words  #SQL  #relation  SQL term freq.  relation term freq.  SQL relv.  relation relv.  Total relv.
1    84      1     1          0.0170          0.0170               0.0002     0.0002          0.0004
4    22      0     1          0.0000          0.0641               0.0000     0.0029          0.0029
5    46      1     1          0.0310          0.0310               0.0006     0.0006          0.0013
6    22      1     0          0.0641          0.0000               0.0029     0.0000          0.0029
7    33      1     1          0.0430          0.0430               0.0013     0.0013          0.0026
8    32      1     3          0.0443          0.1292               0.0013     0.0040          0.0054
9    77      0     1          0.0000          0.0186               0.0000     0.0002          0.0002
14   30      1     0          0.0473          0.0000               0.0015     0.0000          0.0015
15   26      1     1          0.0544          0.0544               0.0020     0.0020          0.0041
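The tabulated values can be cross-checked with a short Python sketch. Section 19.2.1 is not reproduced here, so the formulas below are assumptions inferred from the numbers: the term frequency is taken as log2(1 + n(d,t)/n(d)), each per-term relevance as that term frequency divided by the question's word count, and every figure as truncated (not rounded) to four decimals. The function names are illustrative only.

```python
import math

# (question number, words in question, count of "SQL", count of "relation"),
# copied from the table for Exercise 19.1.
ROWS = [
    (1, 84, 1, 1), (4, 22, 0, 1), (5, 46, 1, 1),
    (6, 22, 1, 0), (7, 33, 1, 1), (8, 32, 1, 3),
    (9, 77, 0, 1), (14, 30, 1, 0), (15, 26, 1, 1),
]


def term_frequency(count: int, words: int) -> float:
    """Assumed form of the term frequency: log2(1 + n(d,t) / n(d))."""
    return math.log2(1 + count / words)


def relevance(count: int, words: int) -> float:
    """Assumed per-term relevance: term frequency divided by the word count."""
    return term_frequency(count, words) / words


def truncate4(x: float) -> float:
    """Truncate (not round) to four decimals, as the table appears to do."""
    return math.floor(x * 10_000) / 10_000


def table_row(q: int, words: int, sql: int, rel: int) -> tuple:
    """Return (Q#, SQL tf, relation tf, SQL relv, relation relv, total relv)."""
    sql_relv = relevance(sql, words)
    rel_relv = relevance(rel, words)
    return (
        q,
        truncate4(term_frequency(sql, words)),
        truncate4(term_frequency(rel, words)),
        truncate4(sql_relv),
        truncate4(rel_relv),
        # The total is summed before truncation, which matches the table.
        truncate4(sql_relv + rel_relv),
    )


if __name__ == "__main__":
    for row in ROWS:
        q, *values = table_row(*row)
        print(q, " ".join(f"{v:.4f}" for v in values))
```

Summing the untruncated relevances before truncating explains why some totals (for example question 8) differ in the last digit from the sum of the two displayed relevance columns.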

19.2 Let S be a set of n keywords. An algorithm to find all documents that contain at least k of these keywords is given below.

This algorithm calculates a reference count for each document identifier. A reference count of i for a document identifier d means that at least i of the keywords in S occur in the document identified by d. The algorithm maintains a list of records, each having two fields: a document identifier and the reference count for this identifier. This list is maintained sorted on the document identifier field.

    initialize the list L to the empty list;
    for (each keyword c in S) do
    begin
        D := the list of document identifiers corresponding to c;
        for (each document identifier d in D) do
            if (a record R with document identifier d is on list L) then
                R.reference_count := R.reference_count + 1;
            else begin
                make a new record R;
                R.document_id := d;
                R.reference_count := 1;
                add R to L;
            end;
    end;
    for (each record R in L) do
        if (R.reference_count >= k) then
            output R;

Note that execution of the second for statement causes the list D to merge with the list L. Since the lists L and D are sorted, the time taken for this merge is proportional to the sum of the lengths of the two lists. The list L never holds more records than the total number of document identifiers over all keywords, and there are n such merges. Thus the algorithm runs in time (at most) proportional to n times the sum total of the number of document identifiers corresponding to each keyword in S.

19.3 No answer
19.4 No answer
19.5 No answer
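For Exercise 19.2, the merge-based counting can be sketched in Python as follows. This is one possible rendering, not the textbook's code: it assumes each keyword maps to a sorted, duplicate-free list of document identifiers, and the `index` dictionary, the function names, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (document_id, reference_count)


def merge_counts(L: List[Record], D: List[int]) -> List[Record]:
    """Merge the sorted identifier list D into the sorted record list L.

    Runs in time proportional to len(L) + len(D), as in a merge step.
    """
    merged: List[Record] = []
    i = j = 0
    while i < len(L) and j < len(D):
        doc_id, count = L[i]
        if doc_id == D[j]:
            # Identifier already on L: increment its reference count.
            merged.append((doc_id, count + 1))
            i += 1
            j += 1
        elif doc_id < D[j]:
            merged.append(L[i])
            i += 1
        else:
            # Identifier not on L yet: add a new record with count 1.
            merged.append((D[j], 1))
            j += 1
    merged.extend(L[i:])
    merged.extend((d, 1) for d in D[j:])
    return merged


def docs_with_at_least_k(
    index: Dict[str, List[int]], keywords: List[str], k: int
) -> List[int]:
    """Return identifiers of documents containing at least k of the keywords."""
    L: List[Record] = []
    for c in keywords:
        # A keyword missing from the (hypothetical) index matches no documents.
        L = merge_counts(L, index.get(c, []))
    return [doc_id for doc_id, count in L if count >= k]


if __name__ == "__main__":
    # Hypothetical inverted index, for illustration only.
    index = {"sql": [1, 5, 6, 7], "relation": [1, 4, 5], "join": [4, 5, 20]}
    print(docs_with_at_least_k(index, ["sql", "relation", "join"], 2))
```

Building a fresh merged list on each pass keeps L sorted without any searching, which is what gives each pass its cost proportional to the sum of the two list lengths.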
