Information Retrieval: Assignment 1

Problem 1. (10 points)


Try to find a query of the form [query-term-1 query-term-2] (without
quotes) that, when run on Google, produces at least one result that either
does not contain query-term-1 or does not contain query-term-2. That is,
try to find an example where Google does not interpret a two-term query
as a conjunction. (If you have difficulty finding an appropriate query,
try one that produces very few hits, say, fewer than 20.) (i) Print out the
first page of Google results (or more if you want to) and mark each result
with 2 (both terms occur on the page), 1 (one term occurs on the page) or 0
(neither term occurs on the page). (ii) Based on this evidence, does Google
interpret all queries as a Boolean conjunction?

Problem 2. (12 points)


Shown below is a portion of a positional index in the format: term: doc1:
⟨position1, position2, ...⟩; doc2: ⟨position1, position2, ...⟩; etc.

angels: 2: ⟨36,174,252,651⟩; 4: ⟨12,22,102,432⟩; 7: ⟨17⟩;
fools: 2: ⟨1,17,74,222⟩; 4: ⟨8,78,108,458⟩; 7: ⟨3,13,23,193⟩;
fear: 2: ⟨87,704,722,901⟩; 4: ⟨13,43,113,433⟩; 7: ⟨18,328,528⟩;
in: 2: ⟨3,37,76,444,851⟩; 4: ⟨10,20,110,470,500⟩; 7: ⟨5,15,25,195⟩;
rush: 2: ⟨2,66,194,321,702⟩; 4: ⟨9,69,149,429,569⟩; 7: ⟨4,14,404⟩;
to: 2: ⟨47,86,234,999⟩; 4: ⟨14,24,774,944⟩; 7: ⟨199,319,599,709⟩;
tread: 2: ⟨57,94,333⟩; 4: ⟨15,35,155⟩; 7: ⟨20,320⟩;
where: 2: ⟨67,124,393,1001⟩; 4: ⟨11,41,101,421,431⟩; 7: ⟨15,35,735⟩;

Which document(s) (if any) match each of the following queries where each
expression within quotes is a phrase query? (i) “fools rush in” (ii) “fools
rush in” AND “angels fear to tread”. At which positions do the queries
match? (iii) There is something wrong with this positional index. What is
the problem?
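
As a rough illustration of how a phrase query can be evaluated against a positional index, here is a minimal Python sketch. It uses a small made-up index (`toy_index`), not the one shown above, and the function name `phrase_matches` is hypothetical. A document matches if the query terms occur at consecutive positions; the sketch reports the position of the first term of each match.

```python
from typing import Dict, List

# term -> {doc_id: sorted list of positions}; hypothetical toy data
PositionalIndex = Dict[str, Dict[int, List[int]]]

toy_index: PositionalIndex = {
    "new":  {1: [4, 10], 2: [7]},
    "york": {1: [5, 20], 2: [3]},
    "city": {1: [6],     2: [8]},
}

def phrase_matches(index: PositionalIndex, phrase: str) -> Dict[int, List[int]]:
    """Return {doc_id: [start positions]} for an exact phrase query."""
    terms = phrase.split()
    if not terms or any(t not in index for t in terms):
        return {}
    # Only documents that contain every term can match the phrase.
    docs = set.intersection(*(set(index[t]) for t in terms))
    matches: Dict[int, List[int]] = {}
    for doc in sorted(docs):
        position_sets = [set(index[t][doc]) for t in terms]
        # The k-th term of the phrase must occur at position start + k.
        starts = [p for p in sorted(position_sets[0])
                  if all(p + k in position_sets[k] for k in range(1, len(terms)))]
        if starts:
            matches[doc] = starts
    return matches

print(phrase_matches(toy_index, "new york"))       # {1: [4]}
print(phrase_matches(toy_index, "new york city"))  # {1: [4]}
```

A conjunction of two phrase queries could then be answered by intersecting the document ids returned for each phrase.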

Problem 3. (10 points)


Compute the Levenshtein matrix for the distance between the strings
“apfel” (input) and “poems” (output). Use this format (as introduced in
class):

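         f     a     s     t
    0 | 1 1 | 2 2 | 3 3 | 4 4
   ---+-----+-----+-----+-----
c   1 | 1 2 | 2 3 | 3 4 | 4 5
    1 | 2 1 | 2 2 | 3 3 | 4 4
   ---+-----+-----+-----+-----
a   2 | 2 2 | 1 3 | 3 4 | 4 5
    2 | 3 2 | 3 1 | 2 2 | 3 3
   ---+-----+-----+-----+-----
t   3 | 3 3 | 3 2 | 2 3 | 2 4
    3 | 4 3 | 4 2 | 3 2 | 3 2
   ---+-----+-----+-----+-----
s   4 | 4 4 | 4 3 | 2 3 | 3 3
    4 | 5 4 | 5 3 | 4 2 | 3 3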
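
For checking a hand-computed matrix, the following Python sketch computes the standard Levenshtein table with unit costs for insert, delete and replace. It stores only the minimum of each cell (the lower-right value in the format above), and it assumes that rows correspond to the input string and columns to the output string. The function name `levenshtein_matrix` is an illustrative choice, not something from class.

```python
from typing import List

def levenshtein_matrix(source: str, target: str) -> List[List[int]]:
    """Return the (len(source)+1) x (len(target)+1) table of edit distances."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete the first i input characters
    for j in range(cols):
        d[0][j] = j          # insert the first j output characters
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            diagonal = d[i - 1][j - 1] + (0 if same else 1)  # copy or replace
            from_above = d[i - 1][j] + 1                     # delete
            from_left = d[i][j - 1] + 1                      # insert
            d[i][j] = min(diagonal, from_above, from_left)
    return d

# The example from the sheet: rows c, a, t, s and columns f, a, s, t.
for row in levenshtein_matrix("cats", "fast"):
    print(row)
# The bottom-right entry is the edit distance (3 for this pair).
```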

Problem 4. (10 points)


We saw in class that the Levenshtein sequence of operations for converting
one string into another is not unique. For example, “cat” can be
transformed into “catcat” either by insert, insert, insert, copy, copy, copy or
by copy, copy, copy, insert, insert, insert. In contrast, the minimum number
of cost-1 Levenshtein operations for converting one string into another is fixed
since the minimum is unique. Possible cost-1 operations are insert, delete
and replace. Let n_i, n_d, n_r be the number of inserts, deletes and replaces
in a sequence of operations. Give an example of a pair of strings and two
different sequences of operations σ1 and σ2 that convert the first string into
the second such that n_i(σ1) ≠ n_i(σ2) or n_d(σ1) ≠ n_d(σ2) or n_r(σ1) ≠ n_r(σ2).
Or prove that this is not possible.

Problem 5. (1 points)
If you wanted to search for s*ng in a permuterm wildcard index, what
key(s) would you do the lookup on?
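
As background, here is a small Python sketch of how a permuterm index is commonly built and queried, assuming the usual convention of appending an end marker `$` to each term and indexing every rotation. The function names and the example term "hello" are illustrative assumptions and are unrelated to the query in this problem.

```python
from typing import List

def permuterm_rotations(term: str) -> List[str]:
    """All rotations of term + '$'; each one is a key in the permuterm index."""
    augmented = term + "$"
    return [augmented[i:] + augmented[:i] for i in range(len(augmented))]

def rotate_wildcard_query(query: str) -> str:
    """Rotate a single-wildcard query X*Y so the '*' is at the end.

    Returns the prefix Y$X; every index key starting with it is a match.
    """
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    prefix, suffix = query.split("*")
    return suffix + "$" + prefix

print(permuterm_rotations("hello"))
# ['hello$', 'ello$h', 'llo$he', 'lo$hel', 'o$hell', '$hello']
print(rotate_wildcard_query("h*o"))  # 'o$h'
```

With these helpers, the query h*o matches "hello" because the rotation o$hell begins with the lookup prefix o$h.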

Due date: Thursday, April 17, 2014, 12:15


Send your assignment to [email protected] or turn it in in class.
