Chapter 4: Query Languages: Baeza-Yates, 1999 Modern Information Retrieval
Outline
Keyword-Based Querying
Pattern Matching
Structural Queries
Query Protocols
Trends and Research Issues
Keyword-Based Querying
A query is the formulation of a user information need
Keyword-based queries are popular
Fuzzy Boolean
Retrieve documents appearing in some of the operands (an AND
may require a document to appear in more operands than an OR)
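As a rough illustration of the relaxed operators, the sketch below counts how many operands occur in each document and applies a threshold. The function names, the toy documents, and the default AND threshold (all operands but one) are assumptions made for this example, not definitions from the chapter.

```python
def matched_operands(doc_words, operands):
    """Count how many query operands (keywords) occur in the document."""
    words = set(doc_words)
    return sum(1 for term in operands if term in words)


def fuzzy_or(docs, operands, min_match=1):
    """Relaxed OR: keep documents containing at least min_match operands."""
    return [name for name, words in docs.items()
            if matched_operands(words, operands) >= min_match]


def fuzzy_and(docs, operands, min_match=None):
    """Relaxed AND: same test as fuzzy_or, but with a higher threshold.

    The default (all operands but one) is an assumption for illustration.
    """
    if min_match is None:
        min_match = max(1, len(operands) - 1)
    return fuzzy_or(docs, operands, min_match)


# Hypothetical toy collection.
docs = {
    "d1": "query languages for text retrieval".split(),
    "d2": "query processing".split(),
    "d3": "cooking with garlic".split(),
}
terms = ["boolean", "query", "retrieval"]

print(fuzzy_or(docs, terms))   # ['d1', 'd2']
print(fuzzy_and(docs, terms))  # ['d1']
```

A strict AND would return nothing here, since no document contains all three terms; the relaxed AND still returns d1.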
Natural Language
Generalization of “fuzzy Boolean”
A query is an enumeration of words and context
queries
All the documents matching a portion of the user
query are retrieved
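One simple way to picture this behaviour is to retrieve every document that shares at least one word with the query and to order the results by the fraction of query words matched. The scoring rule and the names below are assumptions chosen for illustration; the slides do not prescribe a ranking formula.

```python
def rank_by_overlap(docs, query):
    """Return (name, score) pairs for documents matching part of the query.

    score = fraction of distinct query words found in the document.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, text in docs.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:  # any partial match is retrieved
            scored.append((name, overlap / len(query_words)))
    # Highest score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection.
docs = {
    "a": "text retrieval systems",
    "b": "fast cars",
    "c": "gardening tips",
}
ranking = rank_by_overlap(docs, "fast text retrieval")
print([name for name, _ in ranking])  # ['a', 'b']
```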
Pattern Matching
Data retrieval
A pattern is a set of syntactic features that must
occur in a text segment
Types
Words
Prefixes
e.g. ‘comput’ -> ‘computer’, ‘computation’, ‘computing’, etc.
Suffixes
e.g. ‘ters’ -> ‘computers’, ‘testers’, ‘painters’, etc.
Substrings
e.g. ‘tal’ -> ‘coastal’, ‘talk’, ‘metallic’, etc.
Ranges
e.g. between ‘held’ and ‘hold’ -> ‘hissing’ and ‘hoax’
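The four pattern types above can be tried out on a small vocabulary. This is a minimal sketch using plain string operations; the vocabulary and function names are made up for the example, and a real system would normally search an index rather than scan a list.

```python
# Hypothetical vocabulary, kept in lexicographic order.
vocabulary = sorted([
    "coastal", "computation", "computer", "computers", "computing",
    "hissing", "hoax", "metallic", "painters", "talk", "testers",
])


def prefix_matches(vocab, prefix):
    """Words that start with the given prefix."""
    return [w for w in vocab if w.startswith(prefix)]


def suffix_matches(vocab, suffix):
    """Words that end with the given suffix."""
    return [w for w in vocab if w.endswith(suffix)]


def substring_matches(vocab, substring):
    """Words that contain the given substring anywhere."""
    return [w for w in vocab if substring in w]


def range_matches(vocab, low, high):
    """Words lying between low and high in lexicographic order."""
    return [w for w in vocab if low <= w <= high]


print(prefix_matches(vocabulary, "comput"))
print(suffix_matches(vocabulary, "ters"))
print(substring_matches(vocabulary, "tal"))   # ['coastal', 'metallic', 'talk']
print(range_matches(vocabulary, "held", "hold"))  # ['hissing', 'hoax']
```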
Allowing errors
Retrieve all text words which are ‘similar’ to the
given word
edit distance:
the minimum number of character insertions,
deletions, and replacements needed to make two
strings equal, e.g. ‘flower’ and ‘flo wer’ (distance 1)
maximum allowed edit distance:
query specifies the maximum number of allowed
errors for a word to match the pattern
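The edit distance defined above can be computed with the standard dynamic-programming recurrence. The sketch below is one common formulation, not necessarily the algorithm the book has in mind; `matches_with_errors` and its vocabulary are hypothetical helpers showing how a maximum allowed distance filters candidate words.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and replacements
    needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca by cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def matches_with_errors(vocab, word, max_errors):
    """Vocabulary words within max_errors edits of the query word."""
    return [w for w in vocab if edit_distance(w, word) <= max_errors]


print(edit_distance("flower", "flo wer"))  # 1 (one inserted space)
print(matches_with_errors(["flower", "flowers", "lower", "floor"], "flower", 1))
# ['flower', 'flowers', 'lower']
```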
Regular expressions
union: if e1 and e2 are regular expressions, then (e1|e2)
matches what e1 or e2 matches
concatenation: if e1 and e2 are regular expressions, the
occurrences of (e1e2) are formed by the occurrences of e1
immediately followed by those of e2
repetition: if e is a regular expression, then (e*)
matches a sequence of zero or more contiguous
occurrences of e
e.g. ‘pro(blem|tein)(s|ε)(0|1|2)*’ matches ‘problem2’ and
‘proteins’ (ε denotes the empty string)
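The example pattern can be checked with Python's `re` module. The only adaptation, which is an assumption about notation rather than part of the slide, is that the empty-string alternative is written as an empty branch `(s|)`.

```python
import re

# union: (blem|tein), (s|), (0|1|2); repetition: (0|1|2)*;
# concatenation: the parts simply follow one another.
PATTERN = re.compile(r"pro(blem|tein)(s|)(0|1|2)*")


def matches(word):
    """True if the whole word is an occurrence of the pattern."""
    return PATTERN.fullmatch(word) is not None


print(matches("problem2"))   # True
print(matches("proteins"))   # True
print(matches("protein3"))   # False: 3 is not among the allowed digits
```

`fullmatch` is used so that the entire word must match, rather than just a prefix of it.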
Structural Queries
Mixing contents and structure in queries
- contents: words, phrases, or patterns
- structural constraints: containment, proximity,
or other restrictions on structural elements
Three main structures
- Fixed structure
- Hypertext structure
- Hierarchical structure
Fixed Structure
Document: a fixed set of fields
Ex.: a mail has a sender, a receiver, a date, a subject and a body field
Search for the mails sent to a given person with “football” in the
Subject field
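The mail query above can be sketched as a filter over records with a fixed set of fields. The `Mail` class, the `search` function, and the sample messages are hypothetical and serve only to show how a content condition is tied to one specific field.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    """A document with a fixed set of fields."""
    sender: str
    receiver: str
    date: str
    subject: str
    body: str


def search(mails, receiver, subject_word):
    """Mails sent to `receiver` whose subject field contains `subject_word`."""
    word = subject_word.lower()
    return [m for m in mails
            if m.receiver == receiver and word in m.subject.lower().split()]


# Hypothetical sample data.
mails = [
    Mail("ann", "bob", "1999-05-01", "Football tickets", "Two seats left."),
    Mail("ann", "bob", "1999-05-02", "Lunch", "Shall we watch football later?"),
    Mail("bob", "carl", "1999-05-03", "Football practice", "Moved to Friday."),
]

hits = search(mails, "bob", "football")
print([m.subject for m in hits])  # ['Football tickets']
```

The second mail mentions football only in its body, so it is not returned: the query constrains the subject field alone.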
Hypertext
A hypertext is a directed graph where nodes hold some
text (text contents)
The links represent connections between nodes or
between positions inside nodes (structural connectivity)
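A small sketch can make the graph view concrete: node text is kept in one dictionary and directed links in another, and a query combines content (a word) with connectivity (reachability from a start node). The data, the function names, and the choice of reachability as the structural condition are assumptions for illustration.

```python
from collections import deque

# Hypothetical hypertext: node name -> text content.
nodes = {
    "home": "information retrieval course",
    "ch4": "query languages",
    "ch5": "query operations",
    "misc": "campus map",
}
# Directed links: source node -> list of target nodes.
links = {
    "home": ["ch4", "ch5"],
    "ch4": ["ch5"],
    "ch5": [],
    "misc": ["home"],
}


def reachable(links, start):
    """All nodes reachable from `start` by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def search_from(nodes, links, start, word):
    """Nodes reachable from `start` whose text contains `word`."""
    return sorted(n for n in reachable(links, start)
                  if word in nodes[n].split())


print(search_from(nodes, links, "home", "query"))  # ['ch4', 'ch5']
```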
Hypertext: WebGlimpse
[Figure: Z39.50 client, Z39.50 server, repository (digital library)]
WAIS (Wide Area Information Service)