Chapter 19
Information Retrieval: Solutions To Practice Exercises
19.1 Relevance of each question (Q#) to the terms "SQL" and "relation":

Q#  #words  #SQL  #relation  SQL term freq.  relation term freq.  SQL relv.  relation relv.  Total relv.
1   84      1     1          0.0170          0.0170               0.0002     0.0002          0.0004
4   22      0     1          0.0000          0.0641               0.0000     0.0029          0.0029
5   46      1     1          0.0310          0.0310               0.0006     0.0006          0.0013
6   22      1     0          0.0641          0.0000               0.0029     0.0000          0.0029
7   33      1     1          0.0430          0.0430               0.0013     0.0013          0.0026
8   32      1     3          0.0443          0.1292               0.0013     0.0040          0.0054
9   77      0     1          0.0000          0.0186               0.0000     0.0002          0.0002
14  30      1     0          0.0473          0.0000               0.0015     0.0000          0.0015
15  26      1     1          0.0544          0.0544               0.0020     0.0020          0.0041
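The figures in this table appear to be reproducible under formulas inferred from the numbers themselves rather than stated in the solution: term frequency TF = log2(1 + occurrences / #words), per-term relevance TF / #words, total relevance as the sum over the two terms, and every value truncated (not rounded) to four decimals. The sketch below recomputes each row under those assumptions; the names ROWS, trunc4, term_freq and relevance_row are hypothetical.

```python
import math

# Rows of the 19.1 table: (question number, #words, #"SQL", #"relation").
ROWS = [
    (1, 84, 1, 1), (4, 22, 0, 1), (5, 46, 1, 1), (6, 22, 1, 0),
    (7, 33, 1, 1), (8, 32, 1, 3), (9, 77, 0, 1), (14, 30, 1, 0),
    (15, 26, 1, 1),
]


def trunc4(x: float) -> float:
    """Truncate (not round) to 4 decimal places, as the table appears to do."""
    return math.floor(x * 10_000) / 10_000


def term_freq(occurrences: int, words: int) -> float:
    # Assumed (inferred from the table): TF = log2(1 + occurrences / #words).
    return math.log2(1 + occurrences / words)


def relevance_row(words: int, n_sql: int, n_rel: int):
    """Return (SQL TF, relation TF, SQL relv., relation relv., total relv.)."""
    tf_sql = term_freq(n_sql, words)
    tf_rel = term_freq(n_rel, words)
    # Assumed (inferred from the table): per-term relevance = TF / #words.
    relv_sql = tf_sql / words
    relv_rel = tf_rel / words
    total = relv_sql + relv_rel  # summed before truncation
    return tuple(trunc4(v) for v in (tf_sql, tf_rel, relv_sql, relv_rel, total))


if __name__ == "__main__":
    for q, words, n_sql, n_rel in ROWS:
        print(q, *(f"{v:.4f}" for v in relevance_row(words, n_sql, n_rel)))
```

Summing the untruncated per-term values before truncating is what reproduces totals such as 0.0013 for question 5, where the two truncated per-term entries would only add up to 0.0012.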
19.2 Let S be a set of n keywords. An algorithm to find all documents that contain at least k of these keywords is given below. This algorithm calculates a reference count for each document identifier. A reference count of i for a document identifier d means that at least i of the keywords in S occur in the document identified by d. The algorithm maintains a
list L of records, each having two fields: a document identifier and the reference count for this identifier. This list is kept sorted on the document identifier field.

    initialize the list L to the empty list;
    for (each keyword c in S) do
    begin
        D := the list of document identifiers corresponding to c;
        for (each document identifier d in D) do
            if (a record R with document identifier d is on list L) then
                R.reference_count := R.reference_count + 1
            else begin
                make a new record R;
                R.document_id := d;
                R.reference_count := 1;
                add R to L;
            end;
    end;
    for (each record R in L) do
        if (R.reference_count >= k) then
            output R;

Note that executing the second for statement (the inner loop over D) effectively merges the list D into the list L. Since both L and D are sorted on document identifier, this merge takes time proportional to the sum of the lengths of the two lists. L never holds more records than the total number of document identifiers across all the keyword lists, and one merge is performed per keyword. Thus the algorithm runs in time (at most) proportional to n times the total number of document identifiers corresponding to the keywords in S.

19.3 No answer
19.4 No answer
19.5 No answer
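The reference-counting algorithm of 19.2 can be sketched in Python as follows, with the inner loop written out as the sorted merge that the running-time argument relies on. The function name docs_with_at_least_k and the dictionary postings (each keyword mapped to a sorted, duplicate-free list of document identifiers) are hypothetical stand-ins for an inverted index; records are (document_id, reference_count) tuples, and the keywords are assumed distinct, since S is a set.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int]  # (document_id, reference_count)


def docs_with_at_least_k(
    postings: Dict[str, List[int]], keywords: Iterable[str], k: int
) -> List[Record]:
    """Return (document_id, reference_count) for documents containing >= k keywords.

    `postings` is a hypothetical inverted index: each keyword maps to a sorted,
    duplicate-free list of document identifiers. `keywords` plays the role of S.
    """
    L: List[Record] = []  # kept sorted on document_id
    for c in keywords:
        D = postings.get(c, [])
        merged: List[Record] = []
        i = j = 0
        # One linear pass merges D into L; cost is O(len(L) + len(D)).
        while i < len(L) and j < len(D):
            doc, count = L[i]
            if doc == D[j]:
                merged.append((doc, count + 1))  # keyword c also occurs in doc
                i += 1
                j += 1
            elif doc < D[j]:
                merged.append(L[i])  # doc does not contain c: carry it over
                i += 1
            else:
                merged.append((D[j], 1))  # first keyword found for D[j]
                j += 1
        merged.extend(L[i:])  # leftover records from L
        merged.extend((d, 1) for d in D[j:])  # leftover new identifiers from D
        L = merged
    return [(doc, count) for doc, count in L if count >= k]
```

For illustration, building postings from the 19.1 table ("SQL" occurs in questions 1, 5, 6, 7, 8, 14, 15 and "relation" in 1, 4, 5, 7, 8, 9, 15) and calling docs_with_at_least_k(postings, ["SQL", "relation"], 2) returns [(1, 2), (5, 2), (7, 2), (8, 2), (15, 2)]. Building a fresh merged list on each pass keeps every step linear in len(L) + len(D); a hash map keyed on document identifier would avoid the sortedness requirement, but the argument in 19.2 is stated for sorted lists.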