Sheet 1

Exercise 1
For the queries below, can we still run through the intersection in time O(x + y),
where x and y are the lengths of the postings lists for Brutus and Caesar? If not, what
can we achieve?
a. Brutus AND NOT Caesar
b. Brutus OR NOT Caesar
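
For reference, the O(x + y) bound mentioned in the question comes from the standard two-pointer merge over two sorted postings lists. The following is a minimal sketch of that baseline only, assuming postings are sorted Python lists of integer docIDs; the function name `intersect` and the sample lists are illustrative and not part of the sheet.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists of docIDs.

    Each step advances at least one pointer, so the loop runs
    at most len(p1) + len(p2) times.
    """
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID appears in both lists: keep it and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Advance the pointer that sits on the smaller docID.
            i += 1
        else:
            j += 1
    return answer


# Hypothetical postings lists, for illustration only.
brutus = [1, 2, 4, 11, 31, 45, 173, 174]
caesar = [1, 2, 4, 5, 6, 16, 57, 132]
print(intersect(brutus, caesar))
```

Parts a and b ask whether the same single pass still suffices once one operand is negated.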

Exercise 2
Extend the postings merge algorithm to arbitrary Boolean query formulas. What is
its time complexity? For instance, consider:
a. (Brutus OR Caesar) AND NOT (Antony OR Cleopatra)
Can we always merge in linear time? Linear in what? Can we do better than this?

Exercise 3
We can use distributive laws for AND and OR to rewrite queries.
a. Show how to rewrite the query in Exercise 2 into disjunctive normal form using the
distributive laws.
b. Would the resulting query be more or less efficiently evaluated than the original form
of this query?
c. Is this result true in general or does it depend on the words and the contents of the
document collection?

Exercise 4
Recommend a query processing order for
a. (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
given the following postings list sizes:

Term            Postings size
eyes            213312
kaleidoscope    87009
marmalade       107913
skies           271658
tangerine       46653
trees           316812
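
As a hint, one common heuristic (an assumption here, not something the sheet states) is to bound the result size of each OR clause by the sum of its terms' postings sizes, then process the clauses in increasing order of that bound. The sketch below tabulates those bounds; the names `postings_size`, `clauses`, and `clause_upper_bound` are illustrative.

```python
# Postings list sizes taken from the table in this exercise.
postings_size = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}

# The three OR clauses of the conjunctive query.
clauses = [
    ("tangerine", "trees"),
    ("marmalade", "skies"),
    ("kaleidoscope", "eyes"),
]


def clause_upper_bound(clause):
    """Conservative estimate: a union can never exceed the sum of its parts."""
    return sum(postings_size[term] for term in clause)


# Smallest estimated clause first, so intermediate results stay small.
order = sorted(clauses, key=clause_upper_bound)
for clause in order:
    print(" OR ".join(clause), clause_upper_bound(clause))
```

The sum is only an upper bound: the true size of each union depends on how much the two postings lists overlap.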

Exercise 5
If the query is:
a. friends AND romans AND (NOT countrymen)
how could we use the frequency of countrymen to determine the best query evaluation order? In
particular, propose a way of handling negation in determining the order of query processing.

Exercise 6
For a conjunctive query, is processing postings lists in order of size guaranteed to be
optimal? Explain why it is, or give an example where it isn’t.

Exercise 7
Write out a postings merge algorithm for an x OR y query.

Exercise 8
How should the Boolean query x AND NOT y be handled? Why is naive evaluation
of this query normally very expensive? Write out a postings merge algorithm that
evaluates this query efficiently.

Exercise 9
1. Why don’t we use grep for information retrieval?
2. Why don’t we use a relational database for information retrieval?
3. In constructing the index, which step is most expensive/complex?
