
Introduction to Information Retrieval

Information Retrieval and Data Mining
Dr. Abdul Majid, DCIS

Lecture 7b: Efficient scoring


Introduction to Information Retrieval

Today’s focus
 Retrieval – get docs matching the query from the inverted index
 Scoring + ranking
 Assign a score to each doc
 Pick the K highest-scoring docs
 Our emphasis today is on doing this efficiently, rather than on the quality of the ranking

2
Introduction to Information Retrieval

Background
 Score computation is a large fraction (tens of percent) of the CPU work on a query
 Generally, we have a tight budget on latency (say, 250 ms)
 CPU provisioning doesn’t permit exhaustively scoring every document on every query
 Today we’ll look at ways of cutting CPU usage for scoring, without compromising the quality of results (much)
 Basic idea: avoid scoring docs that won’t make it into the top K
3
Introduction to Information Retrieval Ch. 6

Recap: Queries as vectors


 Vector space scoring
 We have a weight for each term in each doc
 Represent queries as vectors in the same space
 Rank documents according to their cosine similarity to the query in this space
 Or something more complex: BM25, proximity, …
 Vector space scoring is
 Entirely query dependent
 Additive on term contributions – no conditionals etc. (see the sketch below)
 Context insensitive (no interactions between query terms)
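To make the additivity concrete, here is a minimal sketch (the toy index, weights, and function names are mine, and length normalization is omitted): each query term independently adds w(t,q) · w(t,d) to a document's score, with no interaction between terms.

```python
from collections import defaultdict

# Hypothetical toy index: term -> list of (docID, w_td) postings.
index = {
    "catcher": [(3, 1.2), (7, 0.4)],
    "rye":     [(3, 0.9), (11, 1.1)],
}

def score_all(query_weights):
    """Accumulate per-term contributions; purely additive, no term interactions."""
    scores = defaultdict(float)
    for term, w_tq in query_weights.items():
        for doc_id, w_td in index.get(term, []):
            scores[doc_id] += w_tq * w_td   # each term adds independently
    return dict(scores)

print(score_all({"catcher": 1.0, "rye": 0.5}))   # {3: 1.65, 7: 0.4, 11: 0.55}
```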
Introduction to Information Retrieval

TAAT vs DAAT techniques


 TAAT = “Term At A Time”
 Scores for all docs are computed concurrently, one query term at a time
 DAAT = “Document At A Time”
 The total score for each doc (including all query terms) is computed before proceeding to the next
 Each has implications for how the retrieval index is structured and stored (both loops are sketched below)

5
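As a rough illustration (toy postings, with query weights assumed already folded into the stored doc weights), the two strategies differ mainly in loop order: TAAT keeps a live accumulator for every doc, while DAAT merges the docID-sorted postings and finalizes each doc's score before moving on.

```python
from collections import defaultdict
import heapq

# postings: term -> list of (docID, w_td), sorted by docID within each term.

def taat(postings, query_terms):
    """TAAT: one term at a time; every doc's partial score stays live."""
    acc = defaultdict(float)                  # accumulator per doc
    for t in query_terms:
        for doc_id, w in postings[t]:
            acc[doc_id] += w
    return dict(acc)

def daat(postings, query_terms):
    """DAAT: finish each doc's total score before moving to the next docID."""
    merged = heapq.merge(*(postings[t] for t in query_terms))  # multiway merge
    scores, cur_doc, cur_score = {}, None, 0.0
    for doc_id, w in merged:
        if doc_id != cur_doc:
            if cur_doc is not None:
                scores[cur_doc] = cur_score   # cur_doc's score is now final
            cur_doc, cur_score = doc_id, 0.0
        cur_score += w
    if cur_doc is not None:
        scores[cur_doc] = cur_score
    return scores
```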
Introduction to Information Retrieval Sec. 7.1

Efficient cosine ranking


 Find the K docs in the collection “nearest” to the query ⇒ the K largest query-doc cosines
 Efficient ranking:
 Choosing the K largest cosine values efficiently
 Can we do this without computing all N cosines?
Introduction to Information Retrieval

Safe vs non-safe ranking


 The terminology “safe ranking” is used for methods that guarantee that the K docs returned are the K absolute highest-scoring documents
 (Not necessarily just under cosine similarity)
 Is it OK to be non-safe?
 If it is – how do we ensure we don’t stray too far from the safe solution?
 How do we measure how far we are?

7
Introduction to Information Retrieval

Non-safe ranking
 Non-safe ranking may be okay
 The ranking function is only a proxy for user happiness
 Documents close to the top K may be just fine
 Index elimination
 Only consider high-idf query terms
 Only consider docs containing many query terms
 Champion lists (a minimal sketch follows this slide)
 High/low lists, tiered indexes
 Order postings by g(d), a query-independent quality score

8
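A minimal sketch of champion lists, assuming the index maps each term to (docID, weight) postings; the cutoff r is illustrative. This is non-safe: a true top-K doc may appear in none of the query terms' champion lists.

```python
def build_champion_lists(index, r=50):
    """index: term -> list of (docID, w_td). Keep each term's r highest-weight docs."""
    return {
        t: sorted(postings, key=lambda p: p[1], reverse=True)[:r]
        for t, postings in index.items()
    }

# At query time, score only docs drawn from the query terms' champion lists.
```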
Introduction to Information Retrieval

Safe ranking
 When we output the top K docs, we have a proof that these are indeed the top K
 Does this imply we always have to compute all N cosines?
 We’ll look at pruning methods
 So that we only fully score some J documents
 Do we have to sort those J cosine scores?

9
Introduction to Information Retrieval Sec. 7.1

Computing the K largest cosines: selection vs. sorting
 Typically we want to retrieve the top K docs (in the cosine ranking for the query)
 not to totally order all docs in the collection
 Can we pick off the docs with the K highest cosines?
 Let J = number of docs with nonzero cosines
 We seek the K best of these J
Introduction to Information Retrieval Sec. 7.1

Use heap for selecting top K


 A binary tree in which each node’s value > its children’s values
 Takes 2J operations to construct; then each of the K “winners” is read off in O(log J) steps (sketched below)
 For J = 1M and K = 100, this is about 10% of the cost of sorting
10
[Figure: example binary max-heap of cosine scores, with .9 at the root and .3, .8, .1, .2, .1 below]
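A minimal sketch of this selection step. Python's heapq is a min-heap, so we negate the scores to simulate the max-heap on the slide; heapify is the linear-time build behind the slide's "2J operations", and each pop costs O(log J).

```python
import heapq

def top_k(doc_scores, k):
    """doc_scores: list of (cosine, docID) pairs for the J candidate docs."""
    heap = [(-score, doc_id) for score, doc_id in doc_scores]
    heapq.heapify(heap)        # linear-time construction of the (negated) max-heap
    return [(-s, d) for s, d in (heapq.heappop(heap) for _ in range(min(k, len(heap))))]

scores = [(0.9, 1), (0.3, 4), (0.3, 6), (0.8, 2), (0.1, 9), (0.2, 5), (0.1, 7)]
print(top_k(scores, 2))        # [(0.9, 1), (0.8, 2)]
```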
Introduction to Information Retrieval

WAND scoring
 An instance of DAAT scoring
 The basic idea is reminiscent of branch and bound
 We maintain a running threshold score – e.g., the Kth highest score computed so far
 We prune away all docs whose cosine scores are guaranteed to be below the threshold
 We compute exact cosine scores only for the un-pruned docs

Broder et al., “Efficient Query Evaluation using a Two-Level Retrieval Process”
12
Introduction to Information Retrieval

Index structure for WAND


 Postings are ordered by docID
 Assume a special iterator on the postings of the form “go to the first docID greater than or equal to X” (a toy version is sketched below)
 Typical state: we have a “finger” at some docID in the postings of each query term
 Each finger moves only to the right, to larger docIDs
 Invariant – all docIDs lower than any finger have already been processed, meaning
 these docIDs have either been pruned away, or
 their cosine scores have been computed
13
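A toy version of such an iterator (the class and method names are mine; a real index would use skip pointers in compressed postings rather than binary search over a list):

```python
import bisect

class PostingsIterator:
    """A finger over one term's docID-sorted postings."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids   # sorted list of docIDs for this term
        self.pos = 0             # the finger; it only ever moves right

    def current(self):
        """docID under the finger, or None if the postings are exhausted."""
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else None

    def seek(self, x):
        """Go to the first docID >= x; never moves the finger left."""
        self.pos = bisect.bisect_left(self.doc_ids, x, self.pos)
        return self.current()

it = PostingsIterator([3, 7, 11, 17, 29, 38, 57, 79])
print(it.seek(30))   # 38 – everything to its left counts as processed
```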
Introduction to Information Retrieval

Upper bounds
 At all times, for each query term t, we maintain an upper bound UBt on the score contribution of any doc to the right of the finger
 UBt = max (over docs remaining in t’s postings) of wt(doc) – a toy computation is sketched below

postings of t:  3  7  11  17  29  [38]  57  79      finger at 38 ⇒ UBt = wt(38)
As the finger moves right, UBt drops


14
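A minimal sketch of the bound. In practice, UBt is usually precomputed once as the maximum weight over the whole postings list and stored with the term (a finger-dependent bound can only be tighter); here it is recomputed from the finger purely for illustration, and the weights are made up.

```python
def upper_bound(term_weights, finger_pos):
    """Max wt(doc) over docs at or to the right of the finger."""
    return max(term_weights[finger_pos:], default=0.0)

# Illustrative weights aligned with t's postings [3, 7, 11, 17, 29, 38, 57, 79]:
w_t = [0.6, 0.4, 0.9, 0.3, 0.5, 0.5, 0.2, 0.1]
print(upper_bound(w_t, 5))   # 0.5 – and it can only drop as the finger moves right
```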
Introduction to Information Retrieval

Pivoting
 Query: catcher in the rye
 Let’s say the current finger positions and upper bounds are as below (threshold = 6.8)

term      finger    UB
catcher   273       UBcatcher = 2.3
rye       304       UBrye     = 1.8
in        589       UBin      = 3.3   ← pivot
the       762       UBthe     = 4.3

 With terms sorted by finger position, the running UB sums are 2.3, 4.1, 7.4, … – the pivot is the first finger (here “in”, at 589) where the running sum reaches the threshold: no doc before it can possibly score 6.8 (sketched below)
15
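A minimal sketch of pivot selection, using the slide's numbers (the tuple layout and names are mine): walk the terms in finger order, summing UBs until the running total could beat the threshold.

```python
def find_pivot(fingers, threshold):
    """fingers: list of (term, docID, ub), sorted by docID. Returns the pivot."""
    running_ub = 0.0
    for term, doc_id, ub in fingers:
        running_ub += ub
        if running_ub >= threshold:
            return term, doc_id   # no doc before this finger can reach the threshold
    return None                   # nothing left can make the top K

fingers = [("catcher", 273, 2.3), ("rye", 304, 1.8), ("in", 589, 3.3), ("the", 762, 4.3)]
print(find_pivot(fingers, 6.8))   # ('in', 589): 2.3 + 1.8 + 3.3 = 7.4 >= 6.8
```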
Introduction to Information Retrieval

Prune docs that have no hope


 Terms are sorted in order of finger position
 Move the catcher and rye fingers to 589 or to its right – their docs before 589 are hopeless: even with every query term contributing its UB, such docs cannot reach the threshold
Threshold = 6.8

term      finger        UB
catcher   273 → 589+    UBcatcher = 2.3   (docs before 589: hopeless)
rye       304 → 589+    UBrye     = 1.8   (docs before 589: hopeless)
in        589           UBin      = 3.3   ← pivot
the       762           UBthe     = 4.3

Update the UBs as the fingers move
16
Introduction to Information Retrieval

Compute 589’s score if need be


 If 589 is present in enough postings (here: catcher, rye, and in), compute its full cosine score – else move some fingers to the right of 589
 Pivot again … (one full step is sketched below)

catcher   589
rye       589
in        589
the       762
17
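Tying the pieces together, a simplified sketch of one WAND round. It reuses the PostingsIterator above; score_doc stands for the full cosine computation, and for simplicity every finger left of the pivot is advanced, whereas the original algorithm picks just one term to move.

```python
def wand_step(iters, ubs, threshold, score_doc):
    """One pivot round. iters: {term: PostingsIterator}; ubs: {term: UBt}."""
    fingers = sorted(
        (it.current(), t) for t, it in iters.items() if it.current() is not None
    )
    running_ub, pivot = 0.0, None
    for doc_id, term in fingers:
        running_ub += ubs[term]
        if running_ub >= threshold:
            pivot = doc_id
            break
    if pivot is None:
        return "done"                  # no remaining doc can beat the threshold
    if fingers[0][0] == pivot:
        return score_doc(pivot)        # all preceding fingers sit on the pivot doc
    for doc_id, term in fingers:
        if doc_id >= pivot:
            break
        iters[term].seek(pivot)        # skip the hopeless docs
    return "advanced"
```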
Introduction to Information Retrieval

WAND summary
 In tests, WAND leads to a 90+% reduction in score computation
 Better gains on longer queries
 Nothing we did was specific to cosine ranking
 We only need scoring to be additive by term
 WAND and its variants give us safe ranking
 It is possible to devise “careless” variants that are a bit faster but not safe (see the summary in Ding & Suel 2011)
 These ideas combine some of the non-safe scoring techniques we considered earlier

18
