Chap5 Query Processing

The document discusses the architecture and processes involved in information retrieval, particularly focusing on search engine indexing and query processing techniques. It outlines two main approaches for scoring documents: Document-at-a-Time and Term-at-a-Time, along with various optimization techniques to enhance performance. Additionally, it covers threshold methods for optimizing query processing by determining the minimum score required for documents to be displayed to users.

Uploaded by

hihifi1326

Search Engines

Information Retrieval in Practice

All slides ©Addison Wesley, 2008

With changes by Crista Lopes
Simple Inverted Index
Index Construction
• Simple in-memory indexer
[Figure: indexer pseudocode. Surviving fragments: List<Posting>(), It.append(Posting(n)), write to a file]
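The surviving pseudocode fragments suggest an indexer that keeps a list of postings per term, appends a posting for each document number n, and finally writes the result to a file. A minimal Python sketch of that idea follows. The names `Posting`, `build_index` and `write_index`, the whitespace tokenizer, and the tab-separated file layout are all assumptions made for illustration, not taken from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Posting:
    doc_id: int  # document number n, as in Posting(n)


def build_index(docs):
    """Build term -> list of Posting, with documents numbered from 0."""
    index = defaultdict(list)
    for n, text in enumerate(docs):
        # set() ensures each document appears at most once per term list.
        for token in set(text.lower().split()):
            index[token].append(Posting(n))
    return index


def write_index(index, path):
    """Write one line per term: term, a tab, then space-separated doc numbers."""
    with open(path, "w", encoding="utf-8") as f:
        for token in sorted(index):
            doc_ids = " ".join(str(p.doc_id) for p in index[token])
            f.write(token + "\t" + doc_ids + "\n")


docs = ["tropical fish tank", "fish food", "tropical weather"]
index = build_index(docs)
fish_docs = [p.doc_id for p in index["fish"]]
tropical_docs = [p.doc_id for p in index["tropical"]]
```

Because documents are processed in increasing order of n, each posting list ends up sorted by document number, which the query processing methods later in the deck rely on.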
Architecture
[Figure: search engine architecture. Labels: Web Pages, Local, Text Acquisition, Text Transformation, Preprocessing Steps, Index Creation, Document Store, Index, Querying Process, Ranking, UI, Evaluation, Log]
Query Processing
• Document-at-a-time
– Calculates complete scores for documents by
processing all term lists, one document at a time
• Term-at-a-time
– Accumulates scores for documents by processing
term lists one at a time
• Both approaches have optimization
techniques that significantly reduce time
required to generate scores
Document-At-A-Time
Pseudocode Function Descriptions
• getCurrentDocument()
– Returns the document number of the current posting of the inverted
list.
• skipForwardToDocument(d)
– Moves forward in the inverted list until getCurrentDocument() >= d.
This function may read to the end of the list.
• movePastDocument(d)
– Moves forward in the inverted list until getCurrentDocument() > d.
• moveToNextDocument()
– Moves to the next document in the list. Equivalent to
movePastDocument(getCurrentDocument()).
• getNextAccumulator(d)
– Returns the first document number d' >= d that already has an
accumulator.
• removeAccumulatorsBetween(a, b)
– Removes all accumulators for document numbers between a and b.
The accumulator A_d is removed iff a < d < b.
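The list-navigation functions can be illustrated with a small cursor over a sorted list of document numbers. This is only a sketch: the class name `InvertedListCursor` and the in-memory Python list are assumptions, and it takes skipping forward to stop at the first document number >= d and moving past to stop at the first document number > d.

```python
class InvertedListCursor:
    """Cursor over one inverted list, stored here as sorted document numbers."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = 0

    def finished(self):
        return self.pos >= len(self.doc_ids)

    def get_current_document(self):
        # None signals that the cursor has run off the end of the list.
        return None if self.finished() else self.doc_ids[self.pos]

    def skip_forward_to_document(self, d):
        # Stop at the first posting whose document number is >= d.
        while not self.finished() and self.doc_ids[self.pos] < d:
            self.pos += 1

    def move_past_document(self, d):
        # Stop at the first posting whose document number is > d.
        while not self.finished() and self.doc_ids[self.pos] <= d:
            self.pos += 1

    def move_to_next_document(self):
        if not self.finished():
            self.move_past_document(self.get_current_document())


cursor = InvertedListCursor([2, 5, 9, 14])
cursor.skip_forward_to_document(6)
at_nine = cursor.get_current_document()      # first document number >= 6
cursor.move_to_next_document()
at_fourteen = cursor.get_current_document()
cursor.move_past_document(14)
at_end = cursor.get_current_document()       # list exhausted
```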
Document-At-A-Time
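Document-at-a-time scoring can be sketched as follows. All term lists are walked in parallel, each document's score is completed before moving to the next document, and only the k best scores are kept. The function name, the `(doc_id, term_count)` posting layout, and the use of the raw term count as the scoring feature are assumptions chosen to keep the example small.

```python
import heapq


def document_at_a_time(query, index, k):
    """index maps term -> list of (doc_id, term_count), sorted by doc_id."""
    lists = [index.get(term, []) for term in query]
    positions = [0] * len(lists)
    top = []  # min-heap of (score, doc_id), never larger than k
    while True:
        # The next document to score is the smallest unprocessed doc_id.
        current = [l[p][0] for l, p in zip(lists, positions) if p < len(l)]
        if not current:
            break
        d = min(current)
        score = 0
        for i, l in enumerate(lists):
            p = positions[i]
            if p < len(l) and l[p][0] == d:
                score += l[p][1]      # hypothetical feature: raw term count
                positions[i] += 1     # this list is done with document d
        heapq.heappush(top, (score, d))
        if len(top) > k:
            heapq.heappop(top)        # drop the lowest-scoring document
    return sorted(top, key=lambda sd: (-sd[0], sd[1]))


index = {
    "tropical": [(1, 2), (3, 1)],
    "fish": [(1, 1), (2, 4), (3, 1)],
}
daat_top2 = document_at_a_time(["tropical", "fish"], index, 2)
```

Memory use stays proportional to k, since no per-document accumulator table is needed.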
Term-At-A-Time
Term-At-A-Time
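Term-at-a-time scoring can be sketched in the same setting. Each term list is read completely before the next one is opened, and partial scores are added to an accumulator table keyed by document number. As before, the function name, the `(doc_id, term_count)` posting layout, and the raw-count feature are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def term_at_a_time(query, index, k):
    """index maps term -> list of (doc_id, term_count)."""
    accumulators = defaultdict(int)   # doc_id -> partial score so far
    for term in query:
        # One sequential pass over this term's inverted list.
        for doc_id, count in index.get(term, []):
            accumulators[doc_id] += count
    # Scores are final only after every list has been processed.
    return heapq.nlargest(k, ((s, d) for d, s in accumulators.items()))


index = {
    "tropical": [(1, 2), (3, 1)],
    "fish": [(1, 1), (2, 4), (3, 1)],
}
taat_top2 = term_at_a_time(["tropical", "fish"], index, 2)
```

The accumulator table can grow to one entry per matching document, which is the memory cost that term-at-a-time pays for reading each list sequentially.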
Optimization Techniques
• Term-at-a-time uses more memory for
accumulators, but accesses disk more
efficiently
• Two classes of optimization
– Read less data from inverted lists
• e.g., skip lists
• better for simple feature functions
– Calculate scores for fewer documents
• e.g., conjunctive processing
• better for complex feature functions
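The first class of optimization, reading less data, can be illustrated with skip pointers over a posting list. This sketch assumes one skip entry roughly every sqrt(n) postings, which is a common rule of thumb and not something the slides specify. The function names are hypothetical.

```python
import math


def build_skips(doc_ids):
    """Return (position, doc_id) skip entries about every sqrt(n) postings."""
    step = max(1, math.isqrt(len(doc_ids)))
    return [(i, doc_ids[i]) for i in range(0, len(doc_ids), step)]


def skip_forward(doc_ids, skips, pos, d):
    """Move from pos to the first posting with doc id >= d.

    Returns (new_pos, stepped), where stepped counts the postings
    that had to be passed one at a time after using the skip entries.
    """
    # Jump to the furthest skip entry that does not overshoot d.
    for skip_pos, skip_doc in skips:
        if skip_pos > pos and skip_doc <= d:
            pos = skip_pos
    stepped = 0
    while pos < len(doc_ids) and doc_ids[pos] < d:
        pos += 1
        stepped += 1
    return pos, stepped


doc_ids = list(range(0, 200, 2))          # 100 postings: 0, 2, ..., 198
skips = build_skips(doc_ids)              # entries at positions 0, 10, ..., 90
new_pos, stepped = skip_forward(doc_ids, skips, 0, 150)
```

In this example the target is reached after stepping over 5 postings instead of 75. A production index would store the skip entries alongside the compressed list and search them more cleverly than this linear scan.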
Conjunctive Term-at-a-Time
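A simplified sketch of conjunctive term-at-a-time processing: the first list creates the accumulators, and every later list can only update or eliminate them, so the table shrinks as terms are processed. The dictionary lookup used here replaces the accumulator-navigation functions of the original pseudocode, and processing the shortest list first is an added assumption. The function name is hypothetical.

```python
def conjunctive_taat(query, index):
    """index maps term -> list of (doc_id, term_count).

    Returns doc_id -> score for documents containing every query term.
    """
    accumulators = None
    # Shortest list first keeps the accumulator table as small as possible.
    for term in sorted(query, key=lambda t: len(index.get(t, []))):
        postings = dict(index.get(term, []))
        if accumulators is None:
            accumulators = postings.copy()
        else:
            # Keep only documents that also appear in this term's list.
            accumulators = {
                d: s + postings[d]
                for d, s in accumulators.items()
                if d in postings
            }
    return accumulators or {}


index = {
    "tropical": [(1, 2), (3, 1)],
    "fish": [(1, 1), (2, 4), (3, 1)],
}
conj_scores = conjunctive_taat(["fish", "tropical"], index)
```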
Conjunctive Document-at-a-Time
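Conjunctive document-at-a-time processing amounts to intersecting the term lists: the largest current document number becomes the candidate, and every other list skips forward to it. The sketch below works on plain sorted lists of document numbers and returns only the matching documents. Scoring is left out, and the function name is an assumption.

```python
def conjunctive_daat(lists):
    """lists: sorted doc-id lists. Returns doc ids present in every list."""
    if not lists or any(not l for l in lists):
        return []
    positions = [0] * len(lists)
    result = []
    while all(p < len(l) for l, p in zip(lists, positions)):
        # Candidate: the largest document number any list currently points at.
        d = max(l[p] for l, p in zip(lists, positions))
        # Skip every list forward to the first document number >= d.
        for i, l in enumerate(lists):
            while positions[i] < len(l) and l[positions[i]] < d:
                positions[i] += 1
        if all(p < len(l) and l[p] == d for l, p in zip(lists, positions)):
            result.append(d)                       # d occurs in every list
            positions = [p + 1 for p in positions]
    return result


matches = conjunctive_daat([[1, 3, 5, 7, 9], [3, 4, 7, 10], [2, 3, 7, 8]])
```

Documents missing from any one list are never scored, which is where the saving for complex feature functions comes from.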
Threshold Methods
• Threshold methods use number of top-ranked
documents needed (k) to optimize query
processing
– for most applications, k is small
• For any query, there is a minimum score that each
document needs to reach before it can be shown
to the user
– score of the kth-highest scoring document
– gives threshold τ
– optimization methods estimate τ′ to ignore
documents
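The threshold idea can be sketched with a min-heap that holds the k best scores seen so far. The smallest score in the heap serves as the running estimate τ′, and any document scoring at or below it is ignored. The function name and the `(doc_id, score)` input stream are assumptions for illustration.

```python
import heapq


def top_k_with_threshold(scored_docs, k):
    """scored_docs: iterable of (doc_id, score). Returns (top-k, final τ′)."""
    heap = []                      # min-heap of (score, doc_id)
    tau_prime = float("-inf")      # no threshold until k documents are seen
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > tau_prime:
            # Replace the current k-th best document with the better one.
            heapq.heapreplace(heap, (score, doc_id))
        if len(heap) == k:
            tau_prime = heap[0][0]  # score of the lowest-ranked kept document
    return sorted(heap, reverse=True), tau_prime


scored = [(1, 3.0), (2, 4.0), (3, 2.0), (4, 5.0), (5, 1.0)]
top3, tau_prime = top_k_with_threshold(scored, 3)
```

τ′ can only rise as more documents are seen, so it approaches the true threshold τ from below.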
Threshold Methods
• For document-at-a-time processing, use score
of lowest-ranked document so far for τ′
– for term-at-a-time, have to use kth-largest score in
the accumulator table
• MaxScore method compares the maximum
score that remaining documents could have to
τ′
– safe optimization in that ranking will be the same
without optimization
MaxScore Example

• Indexer computes μ_tree
– maximum score for any document containing just “tree”
• Assume k = 3, τ′ is lowest score after first three docs
• Likely that τ′ > μ_tree
– τ′ is the score of a document that contains both query
terms
• Can safely skip over all gray postings
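A simplified two-term simulation of the MaxScore idea: once k documents are held and τ′ exceeds the maximum score the common term alone can contribute, documents that contain only the common term are skipped, because they cannot enter the top k. The function name, the dictionary layout, and the integer partial scores are assumptions. A real implementation would skip within the inverted list rather than iterate over every document number.

```python
import heapq


def maxscore_two_terms(rare, common, k):
    """rare, common: dicts doc_id -> partial score for each query term.

    Returns (top-k list of (score, doc_id), number of documents skipped).
    """
    mu_common = max(common.values(), default=0)   # e.g. μ_tree
    heap = []                                     # min-heap of (score, doc_id)
    skipped = 0
    for d in sorted(set(rare) | set(common)):
        tau_prime = heap[0][0] if len(heap) == k else float("-inf")
        if d not in rare and tau_prime > mu_common:
            # d holds only the common term, so its score is at most
            # mu_common, which is below τ′: it cannot reach the top k.
            skipped += 1
            continue
        score = rare.get(d, 0) + common.get(d, 0)
        heapq.heappush(heap, (score, d))
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted(heap, reverse=True), skipped


rare = {1: 30, 2: 25, 4: 35}                      # hypothetical rare-term scores
common = {1: 8, 3: 12, 4: 7, 5: 9, 6: 11, 8: 10}  # hypothetical "tree" scores
top3, skipped = maxscore_two_terms(rare, common, 3)
```

The result is identical to scoring every document, which is what makes the optimization safe.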
Other Approaches
• Early termination of query processing
– ignore high-frequency word lists in term-at-a-time
– ignore documents at end of lists in doc-at-a-time
– unsafe optimization
• List ordering
– order inverted lists by quality metric (e.g.,
PageRank) or by partial score
– makes unsafe (and fast) optimizations more likely
to produce good documents
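The combination of list ordering and early termination can be sketched as follows. Each inverted list is assumed to be sorted by partial score, highest first, and only the first `budget` postings of each list are read. The function name, the `budget` parameter, and the sample scores are hypothetical.

```python
import heapq
from collections import defaultdict


def truncated_taat(query, index, k, budget):
    """index maps term -> postings (doc_id, partial_score), best score first."""
    accumulators = defaultdict(int)
    for term in query:
        # Early termination: ignore everything after the first `budget` postings.
        for doc_id, score in index.get(term, [])[:budget]:
            accumulators[doc_id] += score
    return heapq.nlargest(k, ((s, d) for d, s in accumulators.items()))


index = {
    "tropical": [(3, 9), (1, 4), (7, 1)],
    "fish": [(1, 8), (3, 2), (5, 1)],
}
exact = truncated_taat(["tropical", "fish"], index, 2, budget=3)
approx = truncated_taat(["tropical", "fish"], index, 2, budget=1)
```

With the full lists, document 1 ranks first. With a budget of one posting per list, the same two documents are returned but in the opposite order, which shows why this optimization is unsafe even when the ordering makes good documents likely to appear.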
