Unit - I - IR
Crawls and Feeds - Deciding what to search, Crawling the Web, Crawling
Documents and Email, Document Feeds, The Conversion Problem, Storing the
Documents, Detecting Duplicates
Processing Text - From Words to Terms, Text Statistics, Document Parsing,
Document Structure and Markup, Link Analysis, Information Extraction,
Internationalization
Unit III – 08 hours: Ranking with Indexes
Overview, Abstract Model of Ranking, Inverted indexes, Compression, Auxiliary
Structures, Index Construction, Query Processing