Information Retrieval: Solutions To Practice Exercises
Information Retrieval
19.2 Let S be a set of n keywords. An algorithm to find all documents that contain
at least k of these keywords is given below.
This algorithm calculates a reference count for each document identifier. A
reference count of i for a document identifier d means that at least i of the
keywords in S occur in the document identified by d. The algorithm maintains a
list of records, each having two fields: a document identifier and the
reference count for this identifier. This list is kept sorted on the document
identifier field.
initialize the list L to the empty list;
for (each keyword c in S) do
begin
    D := the list of document identifiers corresponding to c;
    for (each document identifier d in D) do
        if (a record R with document identifier d is on list L) then
            R.reference_count := R.reference_count + 1;
        else begin
            make a new record R;
            R.document_id := d;
            R.reference_count := 1;
            add R to L;
        end;
end;
for (each record R in L) do
    if (R.reference_count >= k) then
        output R;
Note that executing the inner for statement in effect merges the list D into
the list L. Since both L and D are sorted on document identifier, each merge
takes time proportional to the sum of the lengths of the two lists. The length
of L never exceeds the total number of document identifiers over all keywords
in S, and there are n merges, so the algorithm runs in time (at most)
proportional to n times that total.
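The merge-based algorithm can be sketched in Python as follows. This is only an illustrative sketch, not the textbook's own code: the names merge_counts, docs_with_at_least_k, and index are hypothetical, and index is assumed to be a dictionary mapping each keyword to a sorted, duplicate-free list of document identifiers.

```python
from typing import Dict, List, Sequence, Tuple

Record = Tuple[int, int]  # (document_id, reference_count)


def merge_counts(L: List[Record], D: Sequence[int]) -> List[Record]:
    """Merge the sorted identifier list D into the sorted record list L.

    Runs in time proportional to len(L) + len(D).
    Assumes D is sorted and contains no duplicate identifiers.
    """
    merged: List[Record] = []
    i = j = 0
    while i < len(L) and j < len(D):
        doc_id, count = L[i]
        if doc_id == D[j]:
            # Identifier already on L: increment its reference count.
            merged.append((doc_id, count + 1))
            i += 1
            j += 1
        elif doc_id < D[j]:
            # Record on L that the current keyword does not match.
            merged.append(L[i])
            i += 1
        else:
            # New identifier: create a record with reference count 1.
            merged.append((D[j], 1))
            j += 1
    merged.extend(L[i:])                   # leftover records from L
    merged.extend((d, 1) for d in D[j:])   # leftover new identifiers from D
    return merged


def docs_with_at_least_k(
    keywords: Sequence[str], index: Dict[str, List[int]], k: int
) -> List[int]:
    """Return the identifiers of documents containing at least k of the keywords."""
    L: List[Record] = []
    for c in keywords:
        L = merge_counts(L, index.get(c, []))
    return [doc_id for doc_id, count in L if count >= k]
```

For example, with the made-up index {"database": [1, 3, 5, 8], "index": [2, 3, 8], "query": [3, 4, 8, 9]} and k = 2, the call docs_with_at_least_k(["database", "index", "query"], index, 2) returns [3, 8], since only documents 3 and 8 appear under at least two of the keywords.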
19.3 No answer
19.4 No answer
19.5 No answer