01 Intro
MCAE 0303
Information Retrieval Systems
Take Away
Outline
❶ Introduction
❷ Inverted index
❹ Query optimization
Definition of Information Retrieval
Boolean Retrieval
Unstructured data in 1650: Shakespeare
Unstructured data in 1650
Term-Document Incidence Matrix
           Anthony and  Julius  The      Hamlet  Othello  Macbeth
           Cleopatra    Caesar  Tempest
ANTHONY    1            1       0        0       0        1
BRUTUS     1            1       0        1       0        0
CAESAR     1            1       0        1       1        1
CALPURNIA  0            1       0        0       0        0
CLEOPATRA  1            0       0        0       0        0
MERCY      1            0       1        1       1        1
WORSER     1            0       1        1       1        0
...
0/1 vector for BRUTUS
           Anthony and  Julius  The      Hamlet  Othello  Macbeth
           Cleopatra    Caesar  Tempest
ANTHONY    1            1       0        0       0        1
BRUTUS     1            1       0        1       0        0
CAESAR     1            1       0        1       1        1
CALPURNIA  0            1       0        0       0        0
CLEOPATRA  1            0       0        0       0        0
MERCY      1            0       1        1       1        1
WORSER     1            0       1        1       1        0
...
result:    1            0       0        1       0        0
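The result row above is consistent with the query BRUTUS AND CAESAR AND NOT CALPURNIA; the slide text does not state the query, so that reading is an assumption. Under that assumption, a minimal Python sketch of answering a Boolean query with bitwise operations on the 0/1 vectors could look like this (the names `vectors`, `plays` and `bits` are illustrative only):

```python
# Incidence vectors copied from the matrix above, one bit per play.
# Leftmost bit = Anthony and Cleopatra, rightmost bit = Macbeth.
NUM_DOCS = 6
plays = [
    "Anthony and Cleopatra",
    "Julius Caesar",
    "The Tempest",
    "Hamlet",
    "Othello",
    "Macbeth",
]
vectors = {
    "BRUTUS": 0b110100,
    "CAESAR": 0b110111,
    "CALPURNIA": 0b010000,
}


def bits(vector):
    """Render an incidence vector as a fixed-width 0/1 string."""
    return format(vector, f"0{NUM_DOCS}b")


# Python integers are unbounded, so NOT must be masked to NUM_DOCS bits.
mask = (1 << NUM_DOCS) - 1
not_calpurnia = ~vectors["CALPURNIA"] & mask

# Assumed query: BRUTUS AND CAESAR AND NOT CALPURNIA.
result = vectors["BRUTUS"] & vectors["CAESAR"] & not_calpurnia

# Map the set bits back to play titles.
matching = [play for play, bit in zip(plays, bits(result)) if bit == "1"]

print(bits(result))  # 100100
print(matching)      # ['Anthony and Cleopatra', 'Hamlet']
```

The printed vector matches the result row of the slide, which is what makes the assumed query plausible.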
Answers to query
Sec. 6.2
Sec. 8.3
               Relevant  Nonrelevant
Retrieved      tp        fp
Not Retrieved  fn        tn
• Recall = number of pages that were retrieved and relevant / total
number of relevant pages.
• Precision = number of pages that were retrieved and relevant / total
number of retrieved pages.
• Example: Suppose there are 5 pages in total, labelled P1, P2, P3, P4
and P5. For the query “weather in Los Angeles”, the relevant pages are
P3, P4 and P5, so the total number of relevant pages is 3. Suppose a
search engine returns the pages P2 and P3, so the number of retrieved
pages is 2.
• Of the returned pages P2 and P3, only P3 is relevant, so the number
of pages that are retrieved and relevant is 1.
• Recall = 1 / 3 ≈ 0.33
• Precision = 1 / 2 = 0.5
• Now let us think about why we need both precision and recall.
• Suppose we are building our own search engine and design it to return
only one page for any query. If that one page is relevant, precision is
1 / 1 = 100%.
• But if 1000 relevant pages actually exist, recall is 1 / 1000, which
is 0.1%.
• Clearly, a system with such poor recall is not performing well, even
though its precision is perfect.
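The two examples above can be checked with a few lines of Python. This is a minimal sketch that treats the retrieved and relevant pages as sets; the function name `precision_recall` and the page labels `D1` to `D1000` in the second example are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: pages the search engine returned
    relevant:  pages that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # retrieved and relevant
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Example 1: P3, P4, P5 are relevant; the engine returns P2 and P3.
p1, r1 = precision_recall({"P2", "P3"}, {"P3", "P4", "P5"})
print(p1, round(r1, 2))  # 0.5 0.33

# Example 2: the engine returns a single relevant page,
# but 1000 relevant pages exist (hypothetical labels D1..D1000).
relevant_pages = {f"D{i}" for i in range(1, 1001)}
p2, r2 = precision_recall({"D1"}, relevant_pages)
print(p2, r2)  # 1.0 0.001
```

The second call shows why neither number is enough on its own: precision is perfect while recall is close to zero.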
Bigger Collections
Can’t build the incidence matrix
Inverted Index
dictionary postings
Inverted index construction
❶ Collect the documents to be indexed:
Generate posting
Sort postings
Create postings lists, determine document frequency
Split the result into dictionary and postings file
dictionary postings
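The construction steps named on the slides above (generate postings, sort postings, create postings lists with document frequency, split into dictionary and postings) can be sketched in Python as follows. This is an illustrative sketch, not the course's reference implementation: the three toy documents, the whitespace tokenizer and the function name `build_inverted_index` are all assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs maps docID -> text. Returns (dictionary, postings).

    dictionary: term -> document frequency
    postings:   term -> list of docIDs sorted in increasing order
    """
    # Generate postings: one (term, docID) pair per token.
    # Tokenization here is just lowercasing and splitting on whitespace.
    pairs = []
    for doc_id, text in docs.items():
        for token in text.lower().split():
            pairs.append((token, doc_id))

    # Sort postings by term, then by docID.
    pairs.sort()

    # Create postings lists, merging duplicate (term, docID) pairs.
    postings = defaultdict(list)
    for term, doc_id in pairs:
        if not postings[term] or postings[term][-1] != doc_id:
            postings[term].append(doc_id)

    # Split into dictionary (with document frequency) and postings.
    dictionary = {term: len(plist) for term, plist in postings.items()}
    return dictionary, dict(postings)


# Toy collection, made up for this example.
docs = {
    1: "brutus killed caesar",
    2: "caesar was ambitious",
    3: "brutus was noble",
}
dictionary, postings = build_inverted_index(docs)
print(dictionary["caesar"], postings["caesar"])  # 2 [1, 2]
print(postings["brutus"])                        # [1, 3]
```

Because the pairs are sorted before grouping, the dictionary comes out in term order and every postings list is already sorted by docID.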
Questions
• The posting list in an inverted index is sorted by
• A. Term frequency
• B. Document frequency
• C. Term ID
• D. Document ID
Questions
• Stemming is a technique used for
• A. Tokenization
• B. Normalization
• C. Document ranking
• D. Case folding
Questions
• Why is the dictionary in an inverted index sorted?
• A. Because it looks good
• B. Because it makes linear search easy
• C. Because it makes binary search easy
Questions
• Normalization helps in:
• A. reducing dictionary size
• B. making search fast
• C. reducing posting list size
Draw inverted index
• Draw the inverted index that would be built for the following
document collection.
Later in this course
Simple conjunctive query (two terms)
Intersecting two posting lists
answer = <>
docID(p1) = 1
docID(p2) = 2
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
2 8 Brutus AND Caesar
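The intersection of the two posting lists above can be computed with a single linear merge, because both lists are sorted by docID. The following is a minimal Python sketch of that merge; it uses Python lists and integer indices in place of the linked-list pointers p1 and p2, and the function name `intersect` is chosen for illustration.

```python
def intersect(p1, p2):
    """Intersect two posting lists sorted by increasing docID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID occurs in both lists: keep it, advance both.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Smaller docID cannot occur in the other list: skip it.
            i += 1
        else:
            j += 1
    return answer


brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(brutus, caesar))  # [2, 8]
```

Each step advances at least one index, so the running time is linear in the combined length of the two lists.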
Query processing: Exercise
Boolean queries
The Boolean retrieval model can answer any query that is a
Boolean expression.
Boolean queries are queries that use AND, OR and NOT to join
query terms.
The model views each document as a set of terms.
It is precise: a document either matches the condition or it does not.
It was the primary commercial retrieval tool for 3 decades.
Many professional searchers (e.g., lawyers) still like Boolean
queries.
You know exactly what you are getting.
Many search systems you use are also Boolean: Spotlight,
email, intranet, etc.
Commercially successful Boolean retrieval: Westlaw
Largest commercial legal search service in terms of the number of
paying subscribers
Over half a million subscribers performing millions of searches a day
over tens of terabytes of text data
The service was started in 1975.
In 2005, Boolean search (called “Terms and Connectors” by Westlaw)
was still the default, and used by a large percentage of users . . .
. . . although ranked retrieval has been available since 1992.
OR two posting lists
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
1 2 3 4 5 8 13 16 21 32 34 64 128 Brutus OR Caesar
OR two posting lists
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
1 2 3 4 5 8 13 16 21 32 34 64 128 Brutus OR Caesar
• OR(p1, p2)
  answer ← <>
  while p1 ≠ NIL and p2 ≠ NIL
  do if docID(p1) = docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
             p2 ← next(p2)
     else if docID(p1) < docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
     else ADD(answer, docID(p2))
          p2 ← next(p2)
Problem: the while loop breaks as soon as p2 becomes NIL, but 64 and 128
should also be part of the answer!
• OR(p1, p2)
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
  answer ← <>
  while p1 ≠ NIL or p2 ≠ NIL
  do if docID(p1) = docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
             p2 ← next(p2)
     else if docID(p1) < docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
     else ADD(answer, docID(p2))
          p2 ← next(p2)
  return answer
Dry run this version to check whether changing the while condition is
enough.
It is not: once one list is exhausted the loop keeps running, and the
comparisons then evaluate docID on NIL.
• OR(p1, p2)
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
  answer ← <>
  while p1 ≠ NIL or p2 ≠ NIL
  do if p1 = NIL
        then ADD(answer, docID(p2))
             p2 ← next(p2)
     else if p2 = NIL
        then ADD(answer, docID(p1))
             p1 ← next(p1)
     else if docID(p1) = docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
             p2 ← next(p2)
     else if docID(p1) < docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
     else ADD(answer, docID(p2))
          p2 ← next(p2)
  return answer
The two NIL checks come first, so docID is never evaluated on an
exhausted list.
Now dry run and verify!
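A runnable counterpart to the OR pseudocode can make the dry run easier to check. The sketch below is one possible Python version, not a transcription of the slide: it merges while both lists still have postings and then appends whatever remains of either list, which avoids ever reading a docID past the end of a list. Python lists and indices stand in for the pointers p1 and p2, and the name `union` is illustrative.

```python
def union(p1, p2):
    """OR of two posting lists sorted by increasing docID."""
    answer = []
    i = j = 0
    # Merge while both lists still have postings.
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # shared docID is added only once
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            answer.append(p1[i])
            i += 1
        else:
            answer.append(p2[j])
            j += 1
    # At most one of these slices is non-empty: the leftover tail.
    answer.extend(p1[i:])
    answer.extend(p2[j:])
    return answer


brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(union(brutus, caesar))
# [1, 2, 3, 4, 5, 8, 13, 16, 21, 32, 34, 64, 128]
```

The output includes 64 and 128, the two docIDs that the first attempt on the slides dropped.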
What is the disadvantage of the Boolean retrieval model?
• a) Easy to implement
An inverted index is a database index that ____.
• a) stores, for each term t, the list of all documents that contain term t
Boolean queries often result in:
• A. Too many or too few results
• B. None of the above.
• C. Too few results
• D. Too many results.
Term-document incidence matrix is:
• A. Sparse
• B. Depends upon the data
• C. Dense
• D. Cannot predict
Postings list should be sorted by:
• A. Document Frequency
• B. DocID
• C. TermID
• D. Term frequency
A large repository of documents in IR is called a:
• A. Corpus
• B. Database
• C. Dictionary
• D. Collection
Basic Terminologies
NOT Brutus
[Figure: document collection with docIDs 1, 2, ..., 10]
A better idea than building a term-document matrix is
______, where we record only the things that do occur
and their links
• A. Incidence matrix
• B. Adjacency matrix
• C. Index
• D. Inverted index
Calculate the posting list for PARIS AND LEAR
• A. 15
• B. 12
• C. 6
• D. 10
Calculate the posting list for PARIS OR LEAR
Calculate the posting list for PARIS AND NOT LEAR
AND NOT two posting lists
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
AndNot(p1, p2)
  answer ← <>
  while p1 ≠ NIL
  do if p2 = NIL or docID(p1) < docID(p2)
        then ADD(answer, docID(p1))
             p1 ← next(p1)
     else if docID(p1) = docID(p2)
        then p1 ← next(p1)
             p2 ← next(p2)
     else p2 ← next(p2)
  return answer
The p2 = NIL check is tested first, so docID(p2) is never evaluated on
an exhausted list.
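The AND NOT merge can be sketched in Python in the same style. As before, Python lists and indices replace the pointers p1 and p2, and the function name `and_not` is an illustrative choice; the check for an exhausted second list is placed first so that its docID is never read out of range.

```python
def and_not(p1, p2):
    """Return the docIDs in p1 that do not occur in p2.

    Both posting lists must be sorted by increasing docID.
    """
    answer = []
    i = j = 0
    while i < len(p1):
        if j == len(p2) or p1[i] < p2[j]:
            # p2 is exhausted, or p1[i] is smaller than everything
            # left in p2: p1[i] cannot occur in p2, so keep it.
            answer.append(p1[i])
            i += 1
        elif p1[i] == p2[j]:
            # docID occurs in both lists: exclude it.
            i += 1
            j += 1
        else:
            # p2[j] is smaller: advance p2 to catch up.
            j += 1
    return answer


brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(and_not(brutus, caesar))  # [4, 16, 32, 64, 128]
print(and_not(caesar, brutus))  # [1, 3, 5, 13, 21, 34]
```

Unlike AND and OR, the operation is not symmetric, which is why the two calls give different answers.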
Westlaw: Example queries
Information need: Information on the legal theories involved in
preventing the disclosure of trade secrets by employees formerly
employed by a competing company
Query: “trade secret” /s disclos! /s prevent /s employe!
Information need: Requirements
Westlaw: Comments
Proximity operators: /3 = within 3 words, /s = within a
sentence, /p = within a paragraph
Space is disjunction, not conjunction! (This was the default
in search pre-Google.)
Long, precise queries: incrementally developed, not like
web search
Why professional searchers often like Boolean search:
precision, transparency, control
When are Boolean queries the best way of searching?
Depends on: information need, searcher, document
collection, . . .
Query optimization
Optimized intersection algorithm for conjunctive queries
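The body of this slide did not survive extraction, so the following is only a sketch of one standard optimization and is assumed rather than taken from the slide: for a conjunctive query with several terms, intersect the posting lists in order of increasing length, so that the intermediate result stays as small as possible. The function names and the CALPURNIA posting list are hypothetical example values.

```python
def intersect(p1, p2):
    """Linear merge of two posting lists sorted by docID."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_many(posting_lists):
    """AND together any number of posting lists, shortest first."""
    if not posting_lists:
        return []
    # Shortest list first: the running result can never grow
    # beyond the length of the shortest list.
    ordered = sorted(posting_lists, key=len)
    result = list(ordered[0])
    for plist in ordered[1:]:
        if not result:
            break  # empty result stays empty, stop early
        result = intersect(result, plist)
    return result


brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
calpurnia = [2, 31, 54, 101]  # hypothetical posting list
print(intersect_many([brutus, caesar, calpurnia]))  # [2]
```

Sorting by list length uses the document frequency that the dictionary already stores for each term, so no posting list has to be read just to decide the order.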
More general optimization