Lecture 5-6
Indexes
• Indexes are data structures designed to make
search faster
• Text search has unique requirements, which
leads to unique data structures
• Most common data structure is inverted index
– general name for a class of structures
– “inverted” because documents are associated
with words, rather than words with documents
• similar to a concordance
Indexes and Ranking
• Indexes are designed to support search
– faster response time, supports updates
• Text search engines use a particular form of
search: ranking
– documents are retrieved in sorted order according to
a score computed using the document
representation, the query, and a ranking algorithm
• What is a reasonable abstract model for ranking?
– enables discussion of indexes without details of
retrieval model
Abstract Model of Ranking
Inverted Index
• Each index term is associated with an inverted
list
– Contains lists of documents, or lists of word
occurrences in documents, and other information
– Each entry is called a posting
– The part of the posting that refers to a specific
document or location is called a pointer
– Each document in the collection is given a unique
number
– Lists are usually document-ordered (sorted by
document number)
Example “Collection”
Proximity Matches
• Matching phrases or words within a window
– e.g., “tropical fish”, or “find tropical within
5 words of fish”
• Word positions in inverted lists make these
types of query features efficient
– e.g., using the word positions stored in each posting (see the sketch below)
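A hedged sketch of a proximity operator built on the stored positions, reusing the toy `index` from the earlier sketch (the function name is illustrative):

```python
def within_window(index, term1, term2, k):
    """Doc numbers where term1 and term2 occur within k word positions of each other."""
    post1 = dict(index.get(term1, []))                # doc -> positions of term1
    post2 = dict(index.get(term2, []))                # doc -> positions of term2
    hits = []
    for doc in post1.keys() & post2.keys():           # documents containing both terms
        if any(abs(p1 - p2) <= k for p1 in post1[doc] for p2 in post2[doc]):
            hits.append(doc)
    return sorted(hits)

# "find tropical within 5 words of fish"
within_window(index, "tropical", "fish", 5)           # -> [1]
```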
Fields and Extents
• Document structure is useful in search
– field restrictions
• e.g., date, from:, etc.
– some fields more important
• e.g., title
• Options:
– separate inverted lists for each field type
– add information about fields to postings
– use extent lists
Extent Lists
• An extent is a contiguous region of a
document
– represent extents using word positions
– inverted list records all extents for a given field
type
– e.g., an extent list for the “title” field (a sketch follows below)
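As an illustrative sketch only (the field name, positions, and helper are hypothetical), an extent list can restrict matches from the earlier toy index to a field:

```python
# Hypothetical extent list for a "title" field: per document, (start, end) word
# positions, end exclusive.
title_extents = {1: [(0, 2)], 2: [(0, 1)]}

def in_field(doc, position, extents):
    """True if a word occurrence at `position` falls inside any extent of the field."""
    return any(start <= position < end for start, end in extents.get(doc, []))

# Restrict the postings of "fish" (from the earlier toy index) to title-field matches.
title_hits = [(doc, [p for p in pos_list if in_field(doc, p, title_extents)])
              for doc, pos_list in index["fish"]]
# -> [(1, [1]), (2, [0])]
```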
Other Issues
• Precomputed scores in inverted list
– e.g., list for “fish” [(1:3.6), (3:2.2)], where 3.6 is
total feature value for document 1
– improves speed but reduces flexibility
• Score-ordered lists
– query processing engine can focus only on the top
part of each inverted list, where the highest-
scoring documents are recorded
– very efficient for single-word queries
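A minimal sketch of both ideas, using the slide’s example scores for “fish” (the variable and function names are ours):

```python
# Precomputed-score list for "fish", following the slide's example:
# (document number, total feature value) pairs.
fish_scores = [(1, 3.6), (3, 2.2)]

# Score-ordered variant: sort once at index time so a single-word query only
# needs to read the head of the list.
fish_by_score = sorted(fish_scores, key=lambda posting: posting[1], reverse=True)

def top_k(score_ordered_list, k):
    return score_ordered_list[:k]     # list already starts with the highest scores

top_k(fish_by_score, 1)               # -> [(1, 3.6)]
```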
Compression
• Inverted lists are very large
– e.g., 25-50% of the collection size for TREC collections
using the Indri search engine
– Much higher if n-grams are indexed
• Compression of indexes saves disk and/or memory
space
– Typically have to decompress lists to use them
– Best compression techniques have good compression
ratios and are easy to decompress
• Lossless compression – no information lost
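As an illustrative sketch of two common lossless techniques, delta (gap) encoding of document numbers combined with a variable-byte code, not tied to any particular engine:

```python
def delta_encode(doc_nums):
    """Replace absolute document numbers with gaps, which are smaller and compress better."""
    return [doc_nums[0]] + [b - a for a, b in zip(doc_nums, doc_nums[1:])]

def vbyte_encode(n):
    """Variable-byte code: 7 data bits per byte; the high bit marks the final byte."""
    out = []
    while n >= 128:
        out.append(n % 128)
        n //= 128
    out.append(n + 128)
    return out                         # least-significant 7-bit group first

def vbyte_decode(byte_stream):
    numbers, n, shift = [], 0, 0
    for b in byte_stream:
        if b < 128:                    # continuation byte
            n += b << shift
            shift += 7
        else:                          # final byte of this number
            numbers.append(n + ((b - 128) << shift))
            n, shift = 0, 0
    return numbers

doc_nums = [33, 37, 254, 1024]
gaps = delta_encode(doc_nums)          # [33, 4, 217, 770]
encoded = [byte for g in gaps for byte in vbyte_encode(g)]
decoded_gaps = vbyte_decode(encoded)   # [33, 4, 217, 770]
```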
Distributed Indexing
• Distributed processing driven by need to index
and analyze huge amounts of data (i.e., the
Web)
• Large numbers of inexpensive servers used
rather than larger, more expensive machines
• MapReduce is a distributed programming tool
designed for indexing and analysis tasks
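A toy map/reduce pair for building an inverted list per term; the driver simulates the shuffle phase in memory and is not a real MapReduce framework (all names here are illustrative):

```python
from collections import defaultdict

def map_fn(doc_num, text):
    """Emit (term, doc_num) pairs for one document."""
    for term in text.split():
        yield term, doc_num

def reduce_fn(term, doc_nums):
    """Merge all document numbers for a term into one sorted inverted list."""
    return term, sorted(set(doc_nums))

def run_indexing_job(docs):
    grouped = defaultdict(list)                       # simulated shuffle/group-by-key
    for doc_num, text in docs.items():
        for term, d in map_fn(doc_num, text):
            grouped[term].append(d)
    return dict(reduce_fn(term, ds) for term, ds in grouped.items())

run_indexing_job({1: "tropical fish", 2: "fish tank"})
# -> {'tropical': [1], 'fish': [1, 2], 'tank': [2]}
```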
Caching
• Query distributions similar to Zipf
– About half of the queries each day are unique, but some
are very popular
• Caching can significantly improve efficiency
– Cache popular query results
– Cache common inverted lists
• Inverted list caching can help with unique
queries
• Cache must be refreshed to prevent stale data
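A minimal sketch of a result cache, assuming a hypothetical `evaluate_query` backend (the cache size and names are illustrative, not a prescribed design):

```python
from functools import lru_cache

def evaluate_query(query):
    """Stand-in for the real query-processing engine (hypothetical)."""
    return ("doc:" + query,)                # placeholder result list

@lru_cache(maxsize=100_000)                 # popular (head-of-the-distribution) queries stay cached
def cached_search(query):
    return evaluate_query(query)

# Unique (tail) queries miss this result cache; a separate cache of frequently
# used inverted lists can still speed them up. Both caches need periodic
# refreshing so results do not go stale as the index is updated.
cached_search("tropical fish")              # computed once
cached_search("tropical fish")              # served from the cache
```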
Performance Evaluation
of Information Retrieval Systems
Why System Evaluation?
• There are many retrieval models, algorithms, and
systems; which one is the best?
• What is the best component for:
– Ranking function (dot-product, cosine, …)
– Term selection (stopword removal, stemming…)
– Term weighting
• How far down the ranked list will a user need
to look to find some/all relevant documents?
Difficulties in Evaluating IR Systems
• Effectiveness is related to the relevancy of retrieved
items.
• Relevancy is not typically binary but continuous.
• Even if relevancy is binary, it can be a difficult
judgment to make.
• Relevancy, from a human standpoint, is:
– Subjective: Depends upon a specific user’s judgment.
– Situational: Relates to user’s current needs.
– Cognitive: Depends on human perception and behavior.
– Dynamic: Changes over time.
Precision and Recall
[Diagram: the entire document collection divided into retrieved vs. not retrieved and relevant vs. irrelevant documents; the retrieved set contains both relevant and irrelevant documents, and some relevant documents are not retrieved.]
Precision and Recall
• Precision
– The ability to retrieve top-ranked documents that are mostly relevant.
• Recall
– The ability of the search to find all of the relevant items in the corpus.
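In set terms, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A short sketch (doc id 999 stands in for a relevant document that is never retrieved and is hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Top 4 documents of Example 1 below; 999 is a hypothetical id for the sixth
# relevant document, which never appears in the ranking.
retrieved = {588, 589, 576, 590}
relevant = {588, 589, 590, 592, 772, 999}
precision_recall(retrieved, relevant)   # -> (0.75, 0.5)
```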
Determining Recall is Difficult
• Total number of relevant items is sometimes
not available:
– Sample across the database and perform
relevance judgment on these items.
– Apply different retrieval algorithms to the same
database for the same query. The aggregate of
relevant items is taken as the total relevant set.
Trade-off between Recall and Precision
[Plot: precision (y-axis, 0 to 1) versus recall (x-axis, 0 to 1). The ideal is the upper-right corner. One extreme returns only relevant documents but misses many useful ones (high precision, low recall); the other returns most relevant documents but includes lots of junk (high recall, low precision).]
Computing Recall/Precision Points
• For a given query, produce the ranked list of
retrievals.
• Adjusting a threshold on this ranked list produces
different sets of retrieved documents, and therefore
different recall/precision measures.
• Mark each document in the ranked list that is
relevant according to the gold standard.
• Compute a recall/precision pair for each position in
the ranked list that contains a relevant document.
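A short sketch of this procedure (the function name is ours), applied to the ranked list and relevant documents of Example 1 below:

```python
def recall_precision_points(ranked, relevant, total_relevant):
    """Recall/precision pair at each rank where a relevant document appears."""
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Example 1's ranking; the sixth relevant document never appears in it, so
# recall never reaches 1.0.
ranked_1 = [588, 589, 576, 590, 986, 592, 984, 988, 578, 985, 103, 591, 772, 990]
relevant_1 = {588, 589, 590, 592, 772}
recall_precision_points(ranked_1, relevant_1, total_relevant=6)
# -> approximately [(0.167, 1.0), (0.333, 1.0), (0.5, 0.75), (0.667, 0.667), (0.833, 0.385)]
```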
Computing Recall/Precision Points:
Example 1
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant   recall/precision at this rank
 1   588     x          R = 1/6 = 0.167; P = 1/1 = 1
 2   589     x          R = 2/6 = 0.333; P = 2/2 = 1
 3   576
 4   590     x          R = 3/6 = 0.5;   P = 3/4 = 0.75
 5   986
 6   592     x          R = 4/6 = 0.667; P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x          R = 5/6 = 0.833; P = 5/13 = 0.38
14   990

One relevant document is missing from the ranking, so 100% recall is never reached.
Computing Recall/Precision Points:
Example 2
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant   recall/precision at this rank
 1   588     x          R = 1/6 = 0.167; P = 1/1 = 1
 2   576
 3   589     x          R = 2/6 = 0.333; P = 2/3 = 0.667
 4   342
 5   590     x          R = 3/6 = 0.5;   P = 3/5 = 0.6
 6   717
 7   984
 8   772     x          R = 4/6 = 0.667; P = 4/8 = 0.5
 9   321     x          R = 5/6 = 0.833; P = 5/9 = 0.556
10   498
11   113
12   628
13   772
14   592     x          R = 6/6 = 1.0;   P = 6/14 = 0.429
Average Recall/Precision Curve
• Typically average performance over a large set
of queries.
• Compute average precision at each standard
recall level across all queries.
• Plot average precision/recall curves to
evaluate overall system performance on a
document/query corpus.
Compare Two or More Systems
• The curve closest to the upper right-hand
corner of the graph indicates the best
performance
[Plot: average precision-recall curves for two systems, “NoStem” and “Stem”; precision on the y-axis (0 to 1), recall on the x-axis (0.1 to 1).]
F-Measure
• One measure of performance that takes into
account both recall and precision.
• Harmonic mean of recall and precision:
F = 2PR / (P + R) = 2 / (1/R + 1/P)
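For example, at the final point of Example 2 above, P = 6/14 ≈ 0.429 and R = 1.0, so F = 2(0.429)(1.0) / (0.429 + 1.0) ≈ 0.60.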
Mean Average Precision
(MAP)
• Average Precision: Average of the precision
values at the points at which each relevant
document is retrieved.
– Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633
– Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625
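A short sketch (the function name and the choice to count each relevant document only the first time it appears are ours) that reproduces both averages from the ranked lists in Examples 1 and 2:

```python
def average_precision(ranked, relevant, total_relevant):
    """Average of the precision values at the ranks where relevant documents are
    retrieved; relevant documents that are never retrieved contribute 0."""
    hits, precisions, seen = 0, [], set()
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant and doc not in seen:       # count each relevant doc once
            seen.add(doc)
            hits += 1
            precisions.append(hits / rank)
    precisions += [0.0] * (total_relevant - hits)     # missed relevant documents
    return sum(precisions) / total_relevant

ranked_1 = [588, 589, 576, 590, 986, 592, 984, 988, 578, 985, 103, 591, 772, 990]
ranked_2 = [588, 576, 589, 342, 590, 717, 984, 772, 321, 498, 113, 628, 772, 592]
average_precision(ranked_1, {588, 589, 590, 592, 772}, 6)        # -> ~0.633 (Ex1)
average_precision(ranked_2, {588, 589, 590, 772, 321, 592}, 6)   # -> ~0.625 (Ex2)
```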
Other Factors to Consider
• User effort: Work required from the user in
formulating queries, conducting the search, and
screening the output.
• Response time: Time interval between receipt of a
user query and the presentation of system responses.
• Form of presentation: Influence of search output
format on the user’s ability to utilize the retrieved
materials.
• Collection coverage: Extent to which any/all relevant
items are included in the document corpus.
Experimental Setup for Benchmarking
• Analytical performance evaluation is difficult for
document retrieval systems because many
characteristics such as relevance, distribution of
words, etc., are difficult to describe with
mathematical precision.
• Performance is measured by benchmarking. That is,
the retrieval effectiveness of a system is evaluated on
a given set of documents, queries, and relevance
judgments.
• Performance data is valid only for the environment
under which the system is evaluated.