IR ch4 - Inverted-Index

The document discusses inverted indexes, which are a common data structure used in information retrieval systems like search engines. An inverted index stores a list of documents that contain each word in the vocabulary. It discusses how inverted indexes are constructed by parsing documents, building term-document matrices, sorting the postings lists, merging entries, and writing the results to a dictionary file and postings file. It also compares different implementations of inverted indexes using arrays, linked lists, B-trees, hash tables, and discusses issues like dynamic indexing when documents are frequently added, deleted or updated.


CS444: Information Retrieval

and Web Search

Fall 2021

CHAPTER 4:
INVERTED INDEX
Abstraction of search engine architecture
[Figure: search engine architecture. Components shown: Crawler, Indexed corpus, Doc Analyzer, Doc Representation, Indexer, Index, (Query), Query Rep, Ranker, Ranking procedure, results, User, Feedback, Evaluation.]

CS444 INFORMATION RETRIVAL & WEB SEARCH ENGIN BY ZAINAB AHMED MOHAMMED 2
INVERTED INDEX
Inverted Files: Main Concepts
In practice, document vectors are not stored directly; an inverted organization is much more efficient.
Inverted index: a word-oriented mechanism for indexing a text collection to speed up searching.
The inverted index structure is composed of two elements: the vocabulary and the occurrences.
The vocabulary is the set of all distinct words in the text.
For each word in the vocabulary, the index stores the documents that contain that word (hence "inverted" index).
The keyword-to-document index can be implemented as:
◦ a sorted array, a tree-based data structure (trie, B-tree), or a hash table
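As a minimal sketch of the two elements described above, the vocabulary and the occurrences can be held in a Python dictionary (a hash table). The function name `build_inverted_index`, the whitespace tokenizer, and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of IDs of the documents containing it."""
    occurrences = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():   # naive tokenizer (assumption)
            occurrences[word].add(doc_id)
    # The keys form the vocabulary; the values are the occurrences.
    return {word: sorted(ids) for word, ids in occurrences.items()}

docs = {1: "new home sales", 2: "home sales rise", 3: "new sales forecasts"}
index = build_inverted_index(docs)
print(sorted(index))    # the vocabulary
print(index["home"])    # documents that contain "home"
```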

Inverted Files: Main Concepts
Term-document matrix: the simplest way to represent the documents that contain each word of
the vocabulary

Inverted Files: Main Concepts
The main problem of this simple solution is that it requires too much space
As this is a sparse matrix, the solution is to associate a list of documents with
each word
The set of all those lists is called the occurrences
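The space argument can be illustrated with a small sketch that builds the dense term-document matrix and then keeps only its non-zero cells as per-word lists. The three sample documents and the variable names are assumptions for illustration.

```python
docs = {1: "new home sales", 2: "home sales rise", 3: "new sales forecasts"}
vocabulary = sorted({w for text in docs.values() for w in text.split()})
doc_ids = sorted(docs)

# Dense term-document matrix: one row per word, one column per document.
matrix = [[1 if w in docs[d].split() else 0 for d in doc_ids]
          for w in vocabulary]

# Sparse alternative: for each word, keep only the documents whose cell is 1.
occurrences = {w: [d for d, bit in zip(doc_ids, row) if bit]
               for w, row in zip(vocabulary, matrix)}

cells = len(vocabulary) * len(doc_ids)                    # size of the matrix
stored = sum(len(lst) for lst in occurrences.values())    # entries in the lists
print(cells, stored)
```

On a realistic collection the gap is far larger than in this toy example, because most words occur in only a small fraction of the documents.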

Inverted Index Construction: Steps
For each term T, we must store a list of all documents that contain T.
• Do we use an array or a list for this?
[Figure: keyword-to-documents relation, linking each keyword to the documents that contain it.]

Inverted Index Construction: Steps
Linked lists are generally preferred to arrays:
• Dynamic space allocation
• Easy insertion of new documents into a term's postings list
• Drawback: space overhead of pointers

Inverted Index Construction: Steps
• Parse the documents into a sequence of (modified token, document ID) pairs.

Inverted Index Construction: Sorting
Sort the pairs by term (and then by document ID within each term).

Inverted Index Construction: Merge
Multiple entries for the same term in a single document are merged.
• Frequency information is added.

Inverted Index Construction: Steps
• The result is split into a Dictionary file and a Postings file.

[Figure: the Dictionary file and the Postings file.]
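The construction steps (pairs, sort, merge with frequencies, split) can be sketched in a few lines. This is an in-memory illustration under stated assumptions: lowercasing stands in for token modification, the dictionary stores an (offset, document frequency) pair per term, and the names `build_index`, `dictionary` and `postings` are hypothetical.

```python
from itertools import groupby

def build_index(docs):
    # Step 1: parse into (term, docID) pairs.
    pairs = [(tok.lower(), doc_id)
             for doc_id, text in docs.items() for tok in text.split()]
    # Step 2: sort by term, then by docID.
    pairs.sort()
    # Step 3: merge repeated (term, docID) entries and record the frequency.
    merged = [(term, doc_id, len(list(group)))
              for (term, doc_id), group in groupby(pairs)]
    # Step 4: split into a dictionary and a postings list.
    dictionary, postings = {}, []
    for term, group in groupby(merged, key=lambda entry: entry[0]):
        entries = [(doc_id, tf) for _, doc_id, tf in group]
        dictionary[term] = (len(postings), len(entries))  # (offset, doc frequency)
        postings.extend(entries)
    return dictionary, postings

docs = {1: "Caesar was killed", 2: "Brutus killed Caesar Caesar"}
dictionary, postings = build_index(docs)
offset, df = dictionary["caesar"]
print(postings[offset:offset + df])   # (docID, term frequency) pairs for "caesar"
```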

Inverted Index Construction: Sorting issues
As we build the index, we parse documents one at a time.
▶ The final postings for any term are incomplete until the end.
▶ Can we keep all postings in memory and then do the sort in memory at the end?
▶ No, not for large collections.
▶ At 10–12 bytes per postings entry, large collections need more space than typically fits in main memory.
▶ In-memory index construction does not scale for large collections.
▶ Thus: we need to store intermediate results on disk.
▶ Running the same in-memory sorting algorithm directly on disk is too slow, because it requires too many disk seeks.
▶ We need an external sorting algorithm (one that uses few disk seeks).

“External” sorting algorithm
To reduce space requirements, a technique called block addressing is used:
the documents are divided into blocks, and the occurrences point to the blocks where the word appears.
During index construction the postings are processed one block at a time, with the block size chosen so that one block's postings easily fit into memory.
Basic idea of the algorithm:
For each block:
accumulate postings,
sort in memory,
write to disk

Then merge the blocks into one long sorted order.
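A minimal sketch of the block-wise sort and the final merge is given below. To stay self-contained it keeps each sorted run in a Python list, which stands in for a run file written to disk; the function names and the sample pairs are assumptions. `heapq.merge` from the standard library reads its inputs lazily, which is the property an external merge relies on.

```python
import heapq

def sorted_runs(pairs, block_size):
    """Cut the posting stream into blocks and sort each block in memory.
    Each returned run stands in for a file written to disk."""
    runs = []
    for start in range(0, len(pairs), block_size):
        block = pairs[start:start + block_size]   # accumulate postings
        block.sort()                              # sort in memory
        runs.append(block)                        # "write to disk"
    return runs

def merge_runs(runs):
    """Merge the sorted runs into one long sorted sequence."""
    return list(heapq.merge(*runs))

pairs = [("caesar", 1), ("brutus", 1), ("caesar", 2),
         ("noble", 2), ("brutus", 3), ("ambitious", 3)]
runs = sorted_runs(pairs, block_size=2)
merged = merge_runs(runs)
print(merged)
```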

“External” sorting algorithm

“External” sorting algorithm PROBLEM
Our assumption was: we can keep the dictionary in memory.
We need the dictionary (which grows dynamically) in order to implement a term to termID
mapping.
Actually, we could work with term,docID postings instead of termID,docID postings . . .
. . . but then intermediate files become very large. (We would end up with a scalable, but very
slow index construction method.)

“External” sorting algorithm SOLUTION
Key idea 1: Generate separate dictionaries for each block – no need to maintain term-termID
mapping across blocks.
Key idea 2: Don’t sort. Accumulate postings in postings lists as they occur.
With these two ideas we can generate a complete inverted index for each block.
These separate indexes can then be merged into one big index.
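The two key ideas correspond to what is commonly called single-pass in-memory indexing (SPIMI). A rough sketch follows, assuming that docIDs arrive in increasing order so that postings lists stay sorted without an explicit sort; the function names and sample blocks are hypothetical.

```python
from collections import defaultdict

def index_block(block):
    """Build a complete inverted index for one block of (term, docID) pairs."""
    index = defaultdict(list)           # per-block dictionary, no termIDs needed
    for term, doc_id in block:
        postings = index[term]
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)     # accumulate as postings occur; no sort
    return dict(index)

def merge_indexes(block_indexes):
    """Merge per-block indexes (given in docID order) into one big index."""
    merged = defaultdict(list)
    for index in block_indexes:
        for term, postings in index.items():
            for doc_id in postings:
                if not merged[term] or merged[term][-1] != doc_id:
                    merged[term].append(doc_id)
    return {term: merged[term] for term in sorted(merged)}

blocks = [[("caesar", 1), ("brutus", 1), ("caesar", 2)],
          [("noble", 2), ("brutus", 3), ("caesar", 3)]]
final_index = merge_indexes(index_block(b) for b in blocks)
print(final_index)
```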

Other index implementations
Vocabulary and therefore dimensionality of vectors can be very large, ~10^4.
However, most documents and queries do not contain most words, so vectors are sparse (i.e.
most entries are 0).
Need efficient methods for storing and computing with sparse vectors.
We showed sparse vectors as Linked Lists (sorted arrays)
Store vectors as linked lists:
◦ Space proportional to number of unique tokens (n) in document.
◦ Requires linear search of the list to find (or change) a specific token.
◦ Requires quadratic time in worst case to compute vector for a document: O(n^2)

Inverted Index as B-Trees
Index tokens in a document in a balanced search tree, with weights stored with tokens at the leaves.

B+ Tree
A B+ tree stores data pointers only at the leaf nodes of the tree.
◦ The leaf nodes must therefore store all the key values along with their corresponding data pointers to the disk file blocks, in order to access them.
◦ Moreover, the leaf nodes are linked to provide ordered access to the records.
◦ The leaf nodes therefore form the first level of the index, with the internal nodes forming the other levels of a multilevel index.
◦ Some of the key values of the leaf nodes also appear in the internal nodes, simply to guide the search for a record.

B tree
Space overhead for tree structure: ~2n nodes.
O(log n) time to find or update weight of a specific token.
O(n log n) time to construct vector.
Need software package to support such data structures.

Inverted Index as HashTables
Store tokens in hashtable, with token string as key and weight as value.
◦ Storage overhead for hashtable ~1.5n.
◦ Table must fit in main memory.
◦ Constant time to find or update weight of a specific token.
◦ O(n) time to construct vector
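A minimal sketch of the hash table representation, using Python's `collections.Counter` (a dictionary subclass). Using the raw term frequency as the weight is an assumption made only to keep the example short.

```python
from collections import Counter

def document_vector(tokens):
    """Build a sparse vector as a hash table: token string -> weight.
    Each token costs one constant-time update, so construction is O(n)."""
    return Counter(tokens)

vector = document_vector("to be or not to be".split())
vector["be"] += 1              # constant-time update of one token's weight
print(vector["to"], vector["be"], vector["missing"])
```

Tokens that never occurred simply have no entry, which is what keeps the vector sparse; `Counter` reports a weight of 0 for them without storing anything.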

Hash index

B trees VS Hash index
You can find the differences in the previous slides!

Dynamic indexing
Thus far, we have assumed that the document collection is static.
But most collections are modified frequently (documents are added, deleted, and updated).
This means that new terms need to be added to the dictionary, and postings lists need to be updated
for existing terms.

The simplest way to achieve this is to periodically reconstruct the index from scratch.
This is a good solution if the number of changes over time is small and a delay in making new documents searchable is acceptable.
If new documents must become searchable quickly, one solution is to maintain two indexes: a large main index and a small auxiliary index that stores new documents.
The auxiliary index is kept in memory.
Searches are run across both indexes and results merged.
Deletions are stored in an invalidation bit vector.
We can then filter out deleted documents before returning the search result.
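The main-plus-auxiliary scheme can be sketched as follows. The class name `DynamicIndex`, its methods, and the use of a Python integer as the invalidation bit vector are assumptions for illustration; a real system would also merge the auxiliary index into the main index when it grows too large.

```python
class DynamicIndex:
    def __init__(self, main_index):
        self.main = main_index   # large main index (on disk in practice)
        self.aux = {}            # small auxiliary index, kept in memory
        self.invalid = 0         # invalidation bit vector: bit d set => doc d deleted

    def add(self, doc_id, text):
        """New documents go into the auxiliary index only."""
        for term in set(text.lower().split()):
            self.aux.setdefault(term, []).append(doc_id)

    def delete(self, doc_id):
        """Deletion only flips a bit; the postings themselves stay in place."""
        self.invalid |= 1 << doc_id

    def search(self, term):
        """Search both indexes, merge, then filter out deleted documents."""
        merged = set(self.main.get(term, [])) | set(self.aux.get(term, []))
        return sorted(d for d in merged if not (self.invalid >> d) & 1)

dyn = DynamicIndex({"caesar": [1, 2], "brutus": [2]})
dyn.add(3, "Caesar returns")
dyn.delete(1)
print(dyn.search("caesar"))
```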
Distributed indexing
Collections are often so large that we cannot perform index construction efficiently on a single
machine.
This is particularly true of the World Wide Web for which we need large computer clusters to
construct any reasonably sized web index.
Web search engines, therefore, use distributed indexing algorithms for index construction.
The result of the construction process is a distributed index that is partitioned across several
machines - either according to term or according to document.
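The two partitioning options can be contrasted with a small sketch. The routing rules used here (docID modulo the number of machines, and a CRC32 hash of the term) are assumptions chosen for simplicity, not a description of how any particular search engine assigns shards.

```python
import zlib

def partition_by_document(index, n_machines):
    """Each machine holds a complete index for its own subset of documents."""
    shards = [dict() for _ in range(n_machines)]
    for term, postings in index.items():
        for doc_id in postings:
            shards[doc_id % n_machines].setdefault(term, []).append(doc_id)
    return shards

def partition_by_term(index, n_machines):
    """Each machine holds the full postings lists for its own subset of terms."""
    shards = [dict() for _ in range(n_machines)]
    for term, postings in index.items():
        shard = zlib.crc32(term.encode()) % n_machines   # stable hash of the term
        shards[shard][term] = postings
    return shards

index = {"brutus": [1, 2, 4], "caesar": [1, 2, 3, 4], "noble": [3]}
doc_shards = partition_by_document(index, 2)
term_shards = partition_by_term(index, 2)
print(doc_shards)
```

With document partitioning every query must be sent to all machines and the partial results merged; with term partitioning a query only touches the machines that own its terms.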

Signature Files
Signature files are word-oriented index structures based on hashing
They impose a low space overhead, at the cost of forcing a sequential search over the index
Since their search complexity is linear, they are suitable only for texts that are not very large
Nevertheless, inverted indexes outperform signature files for most applications

Signature Files Structure
A signature file divides the text into blocks of b words each, and a hash function maps each word to a bit mask (its signature) of B bits
Each text block is assigned a mask of B bits, obtained by bitwise ORing the signatures of all the words in the text block

Signature Files Structure
If a word is present in a text block, then its signature is also set in the bit mask of the text block
Hence, if a query signature is not in the mask of the text block, then the word is not present in
the text block
However, it is possible that all the corresponding bits are set even though the word is not there
This is called a false drop

A delicate part of the design of a signature file is:

to ensure the probability of a false drop is low, and
to keep the signature file as short as possible

The hash function is forced to deliver bit masks which have at least ℓ bits set
A good model assumes that ℓ bits are randomly set in the mask (with possible repetition)
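A rough sketch of a signature file follows. The mask width `B`, the constant `BITS_PER_WORD`, and the use of MD5 to pick bit positions are all assumptions for illustration. The sketch follows the simplified model in which a fixed number of positions is chosen with possible repetition, so a word may end up with fewer distinct bits set.

```python
import hashlib

B = 16              # bits per mask (assumed value)
BITS_PER_WORD = 3   # bit positions drawn per word (assumed value)

def word_signature(word):
    """Hash a word to a B-bit mask; repeated positions may collide."""
    mask = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.md5(f"{word}:{i}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % B)
    return mask

def block_signatures(words, b):
    """Split the text into blocks of b words and OR the word signatures."""
    sigs = []
    for start in range(0, len(words), b):
        mask = 0
        for w in words[start:start + b]:
            mask |= word_signature(w)
        sigs.append(mask)
    return sigs

def candidate_blocks(word, sigs):
    """Blocks whose mask contains every bit of the query signature.
    A block listed here may still be a false drop and must be verified."""
    q = word_signature(word)
    return [i for i, mask in enumerate(sigs) if (mask & q) == q]

words = "this is a text a text has many words words are made from letters".split()
sigs = block_signatures(words, 4)
print(candidate_blocks("text", sigs))
```

The search is a sequential scan over all block masks, and every candidate block still has to be checked against the text to rule out false drops.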

Search within inverted index
Query processing
Procedures
◦ Apply to the input query the same processing steps that were applied to the documents
◦ Tokenization -> normalization -> stemming -> stopword removal
Look up each query term in the dictionary
◦ Retrieve the postings lists
Operation
◦ AND: intersect the postings lists
◦ OR: union the postings lists
◦ NOT: take the difference of the postings lists
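The OR and NOT operations can be sketched as linear walks over postings lists, assuming the lists are sorted by docID and contain no duplicates. The function names `union` and `difference` are hypothetical; `difference(p1, p2)` corresponds to "p1 AND NOT p2".

```python
def union(p1, p2):
    """OR: docIDs that appear in either sorted postings list."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            result.append(p1[i])
            i += 1
        else:
            result.append(p2[j])
            j += 1
    return result + p1[i:] + p2[j:]   # append whatever is left over

def difference(p1, p2):
    """AND NOT: docIDs in p1 that do not appear in p2."""
    result, j = [], 0
    for doc_id in p1:
        while j < len(p2) and p2[j] < doc_id:
            j += 1
        if j == len(p2) or p2[j] != doc_id:
            result.append(doc_id)
    return result

print(union([1, 3, 5], [2, 3, 6]))
print(difference([1, 3, 5], [3, 4]))
```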

Single Word Queries:
The simplest type of search is that for the occurrences of a single word
The vocabulary search can be carried out using any suitable data structure
Ex: hashing, tries, or B-trees
We note that the vocabulary is in most cases sufficiently small so as to stay in
main memory
The occurrence lists, on the other hand, are usually fetched from disk
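A minimal sketch of a single-word lookup with the vocabulary in memory and the occurrence lists in a separate file. The binary layout (little-endian 32-bit docIDs written back to back) and the names `write_postings` and `lookup` are assumptions; `io.BytesIO` stands in for a file on disk so the example is self-contained.

```python
import io
import struct

def write_postings(index, f):
    """Write postings lists back to back; return term -> (byte offset, count)."""
    dictionary = {}
    for term in sorted(index):
        postings = index[term]
        dictionary[term] = (f.tell(), len(postings))
        f.write(struct.pack(f"<{len(postings)}I", *postings))
    return dictionary

def lookup(term, dictionary, f):
    """Find the term in the in-memory dictionary, then fetch its list from the file."""
    if term not in dictionary:
        return []
    offset, count = dictionary[term]
    f.seek(offset)                               # one seek per query term
    return list(struct.unpack(f"<{count}I", f.read(4 * count)))

postings_file = io.BytesIO()                     # stands in for a file on disk
dictionary = write_postings({"brutus": [2, 4, 8], "caesar": [1, 2, 3, 5]},
                            postings_file)
print(lookup("caesar", dictionary, postings_file))
```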

Multiple Word Queries:
If the query has more than one word, we have to consider three cases:
conjunctive (AND operator) queries
disjunctive (OR operator) queries
negation (NOT operator) queries
Conjunctive queries require searching for all the words in the query, obtaining one inverted list for each word
Next, we intersect all the inverted lists to obtain the documents that contain all these words
For disjunctive queries the lists must be merged
The first case is popular in the Web due to the size of the document collection

The Boolean Model
The user describes their information need using Boolean constraints (e.g., AND, OR, and AND
NOT)
• Unranked Boolean Retrieval Model: retrieves documents that satisfy the constraints in no
particular order
• Ranked Boolean Retrieval Model: retrieves documents that satisfy the constraints and ranks
them based on the number of ways they satisfy the constraints
• Also known as ‘exact-match’ retrieval models
• Advantages and disadvantages?

The Boolean Model
• Advantages:
‣ Easy for the system to implement
‣ Users get transparency: it is easy to understand why a document was or was not retrieved
‣ Users get control: it is easy to determine whether the query is too specific (few results) or too broad (many results)
• Disadvantages:
‣ The burden is on the user to formulate a good Boolean query

Query processing
Consider processing the query:
Brutus AND Caesar
• Locate Brutus in the Dictionary; Retrieve its postings.
• Locate Caesar in the Dictionary; Retrieve its postings.
• “Merge” the two postings:

If the list lengths are x and y, the merge takes O(x+y) operations.
This requires the postings to be sorted by docID.

Intersecting Algorithm
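A minimal sketch of the standard two-pointer intersection, assuming both postings lists are sorted by docID. The function name `intersect` and the sample lists are illustrative. Each step advances at least one pointer, which gives the O(x+y) bound.

```python
def intersect(p1, p2):
    """AND: docIDs present in both sorted postings lists."""
    result, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])   # docID occurs in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1                 # advance the pointer at the smaller docID
        else:
            j += 1
    return result

brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(brutus, caesar))   # [2, 8]
```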

Exercise

Phrase Queries
Phrase queries are more difficult to solve with inverted indexes
The lists of all the query words must be traversed to find places where all the words appear in sequence (for a phrase)
This algorithm is similar to a list intersection algorithm
Another solution for phrase queries is based on indexing two-word phrases and using similar algorithms over pairs of words
However, the index will be much larger, as the number of word pairs is not linear in the number of words
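The two-word-phrase approach can be sketched as an index keyed by consecutive word pairs. The function names and sample documents are assumptions. For phrases longer than two words the result is only a candidate set: a document can contain every pair without containing the full phrase.

```python
from collections import defaultdict

def build_biword_index(docs):
    """Index every pair of consecutive words: (w1, w2) -> set of docIDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for pair in zip(words, words[1:]):
            index[pair].add(doc_id)
    return index

def phrase_candidates(index, phrase):
    """Documents that contain every consecutive word pair of the phrase."""
    words = phrase.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return []
    return sorted(set.intersection(*(index.get(p, set()) for p in pairs)))

docs = {1: "stanford university palo alto",
        2: "university of stanford palo",
        3: "palo alto stanford university"}
biwords = build_biword_index(docs)
print(phrase_candidates(biwords, "stanford university"))
```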

Positional indexes
For each term in the vocabulary, we store postings of the form
docID: <position1, position2, ...>,
where each position is a token index in the document.
Each posting will also usually record the term frequency.

Positional indexes
Postings lists in a positional index: each posting is a docID and a list of positions
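A small sketch of a positional index and a phrase query over it. The nested-dictionary layout (term -> docID -> positions), the whitespace tokenizer, and the function names are assumptions for illustration. The term frequency is not stored separately here because it equals the length of the position list.

```python
from collections import defaultdict

def build_positional_index(docs):
    """term -> {docID: [position1, position2, ...]}"""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, phrase):
    """Documents in which the phrase words occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    common = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc_id in sorted(common):
        # Match if term k occurs at position start + k for some start.
        if any(all(start + k in index[t][doc_id] for k, t in enumerate(terms))
               for start in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits

docs = {1: "to be or not to be", 2: "be not to"}
positional = build_positional_index(docs)
print(positional["to"])                    # {1: [0, 4], 2: [2]}
print(phrase_query(positional, "to be"))   # [1]
```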

Exercise
