Information Retrieval - 3
UNIT – 3
Prepared by : SRIDHAR U
Outline
Distributed Indexing
Index Compression
Dictionary compression
Postings compression
Distributed indexing
Inverters
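The slide gives only the title, so the following is a minimal sketch under an assumption: an inverter is taken to be the component that collects (term, docID) pairs for its share of the vocabulary, sorts them, and writes out postings lists. The function name `invert` and the sample pairs are hypothetical.

```python
from collections import defaultdict


def invert(pairs):
    """Group (term, doc_id) pairs into sorted, duplicate-free postings lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    # Sort the terms, and sort the docIDs within each postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical pairs, as a parser might emit them for one term partition.
pairs = [("caesar", 2), ("brutus", 1), ("caesar", 1), ("brutus", 4), ("caesar", 2)]
index = invert(pairs)
print(index)
```

In a distributed setting each inverter would handle only one term partition (for example a-f, g-p, q-z), so several can run in parallel.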
….systilesyzygeticsyzygialsyzygyszaibelyiteszczecinszomo….
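The fragment above appears to show dictionary terms concatenated into one long string. A minimal sketch of that idea follows, assuming each term is located by an offset pointer into the string and that a term ends where the next one begins. The helper names `build_string_dictionary` and `term_at` are hypothetical.

```python
def build_string_dictionary(terms):
    """Concatenate sorted terms and record where each one starts."""
    terms = sorted(terms)
    offsets = []
    pos = 0
    for term in terms:
        offsets.append(pos)
        pos += len(term)
    return "".join(terms), offsets


def term_at(blob, offsets, i):
    """Recover the i-th term: it runs up to the next term's offset."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
    return blob[offsets[i]:end]


blob, offsets = build_string_dictionary(["syzygy", "systile", "syzygial", "syzygetic"])
print(blob)                       # systilesyzygeticsyzygialsyzygy
print(offsets)                    # [0, 7, 16, 24]
print(term_at(blob, offsets, 2))  # syzygial
```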
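Front-coding:
Sorted words commonly have a long common prefix, so store the differences only
(for the last k-1 terms in a block of k).
Without front-coding: 8automata8automate9automatic10automation
With front-coding: 8automat*a1e2ic3ion
Here "automat*" encodes the shared prefix automat, and each number that follows (1, 2, 3) is the extra length beyond automat.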
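The front-coded string above can be reproduced with a short sketch. It assumes terms contain no digits and no "*" character, so that lengths and the prefix marker can be parsed unambiguously. The function names `front_code` and `front_decode` are hypothetical.

```python
import os


def front_code(block):
    """Front-code one block of sorted terms into a single string."""
    prefix = os.path.commonprefix(block)
    first = block[0]
    # First term: full length, shared prefix, '*', then the rest of the term.
    out = f"{len(first)}{prefix}*{first[len(prefix):]}"
    # Remaining terms: only the extra length and the suffix beyond the prefix.
    for term in block[1:]:
        suffix = term[len(prefix):]
        out += f"{len(suffix)}{suffix}"
    return out


def front_decode(encoded):
    """Invert front_code (assumes digit-free, '*'-free terms)."""
    i = 0

    def read_int():
        nonlocal i
        j = i
        while j < len(encoded) and encoded[j].isdigit():
            j += 1
        value = int(encoded[i:j])
        i = j
        return value

    first_len = read_int()
    star = encoded.index("*", i)
    prefix = encoded[i:star]
    i = star + 1
    rest = first_len - len(prefix)
    terms = [prefix + encoded[i:i + rest]]
    i += rest
    while i < len(encoded):
        k = read_int()
        terms.append(prefix + encoded[i:i + k])
        i += k
    return terms


block = ["automata", "automate", "automatic", "automation"]
encoded = front_code(block)
print(encoded)                # 8automat*a1e2ic3ion
print(front_decode(encoded))  # ['automata', 'automate', 'automatic', 'automation']
```

The saving comes from writing the prefix automat once per block instead of once per term.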
Technique Size in MB
Aim:
For arachnocentric, we will use ~20 bits/gap entry.
For the, we will use ~1 bit/gap entry.
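To see where figures like these come from, the sketch below assumes a gap is the difference between consecutive docIDs in a sorted postings list, and counts the minimum number of bits needed to write each gap in binary. The docID lists are hypothetical, and the function names `to_gaps` and `min_bits` are illustrative only. No particular variable-length code is implied.

```python
def to_gaps(doc_ids):
    """Turn a sorted docID list into gaps; the first entry is kept as-is."""
    if not doc_ids:
        return []
    gaps = [doc_ids[0]]
    for prev, cur in zip(doc_ids, doc_ids[1:]):
        gaps.append(cur - prev)
    return gaps


def min_bits(gap):
    """Smallest number of bits that can represent the gap in binary."""
    return max(1, gap.bit_length())


# Hypothetical postings: a very frequent term occurs in nearly every document,
# while a rare term occurs in documents that are far apart.
frequent = [1, 2, 3, 4, 5, 6]
rare = [101_000, 905_000]

print([min_bits(g) for g in to_gaps(frequent)])  # [1, 1, 1, 1, 1, 1]
print([min_bits(g) for g in to_gaps(rare)])      # [17, 20]
```

A fixed-width entry would spend the same number of bits on both lists, which is why a variable-length encoding of gaps is the aim.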