0% found this document useful (0 votes)
35 views6 pages

4.6 Algorithms For Select and Join Operations

Uploaded by

Medha Harini
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views6 pages

4.6 Algorithms For Select and Join Operations

Uploaded by

Medha Harini
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 6

II IV

CS8492
DATABASE MANAGEMENT SYSTEMS
(Common to CSE & IT)

UNIT NO. 4

4.6 QUERY PROCESSING OVERVIEW,


ALGORITHMS FOR SELECT AND JOIN
OPERATIONS

Version: 1.0
CS8492

DATABASE MANAGEMENT SYSTEMS


(Common to CSE & IT)

Algorithms for SELECT and JOIN Operations

Implementing the SELECT Operation

There are many algorithms for executing a SELECT operation, which is basically a search
operation to locate the records in a disk file that satisfy a certain condition. Some of the search
algorithms depend on the file having specific access paths, and they may apply only to certain types
of selection conditions. We discuss some of the algorithms for implementing SELECT in this
section. We will use the following operations, specified on the relational database in Figure 3.5, to
illustrate our dis-cussion:

OP1: σSsn = ‘123456789’ (EMPLOYEE)


OP2: σDnumber > 5 (DEPARTMENT)
OP3: σDno = 5 (EMPLOYEE)
OP4: σDno = 5 AND Salary > 30000 AND Sex = ‘F’ (EMPLOYEE)
OP5: σEssn=‘123456789’ AND Pno =10(WORKS_ON)

Search Methods for Simple Selection. A number of search algorithms are pos-sible for
selecting records from a file. These are also known as file scans, because they scan the records of a
file to search for and retrieve records that satisfy a selec-tion condition.4 If the search algorithm
involves the use of an index, the index search is called an index scan. The following search methods
(S1 through S6) are examples of some of the search algorithms that can be used to implement a
select operation:

S1—Linear search (brute force algorithm). Retrieve every record in the file, and test whether
its attribute values satisfy the selection condition. Since the records are grouped into disk blocks,
each disk block is read into a main memory buffer, and then a search through the records within the
disk block is conducted in main memory.
S2—Binary search. If the selection condition involves an equality compari-son on a key
attribute on which the file is ordered, binary search—which is

more efficient than linear search—can be used. An example is OP1 if Ssn is the ordering attribute for
the EMPLOYEE file.5
S3a—Using a primary index. If the selection condition involves an equality comparison on a
key attribute with a primary index—for example, Ssn = ‘123456789’ in OP1—use the primary
index to retrieve the record. Note that this condition retrieves a single record (at most).

S3b—Using a hash key. If the selection condition involves an equality com-parison on a key
attribute with a hash key—for example, Ssn = ‘123456789’ in OP1—use the hash key to retrieve
the record. Note that this condition retrieves a single record (at most).

S4—Using a primary index to retrieve multiple records. If the comparison condition is >,
>=, <, or <= on a key field with a primary index—for exam-ple, Dnumber > 5 in OP2—use the index
CS8492

DATABASE MANAGEMENT SYSTEMS


(Common to CSE & IT)

to find the record satisfying the cor-responding equality condition (Dnumber = 5), then retrieve all
subsequent records in the (ordered) file. For the condition Dnumber < 5, retrieve all the preceding
records.

S5—Using a clustering index to retrieve multiple records. If the selection condition involves
an equality comparison on a nonkey attribute with a clustering index—for example, Dno = 5 in
OP3—use the index to retrieve all the records satisfying the condition.

S6—Using a secondary (B+-tree) index on an equality comparison. This search method can
be used to retrieve a single record if the indexing field is a key (has unique values) or to retrieve
multiple records if the indexing field is not a key. This can also be used for comparisons involving
>, >=, <, or <=.

In Section 19.8, we discuss how to develop formulas that estimate the access cost of these
search methods in terms of the number of block accesses and access time. Method S1 (linear
search) applies to any file, but all the other methods depend on having the appropriate access path on
the attribute used in the selection condition. Method S2 (binary search) requires the file to be sorted
on the search attribute. The methods that use an index (S3a, S4, S5, and S6) are generally referred to
as index searches, and they require the appropriate index to exist on the search attribute. Methods S4
and S6 can be used to retrieve records in a certain range—for example, 30000 <= Salary <= 35000.
Queries involving such conditions are called range queries.

Search Methods for Complex Selection. If a condition of a SELECT operation is a conjunctive


condition—that is, if it is made up of several simple conditions connected with the AND logical
connective such as OP4 above—the DBMS can use the following additional methods to implement
the operation:

S7—Conjunctive selection using an individual index. If an attribute involved in any single


simple condition in the conjunctive select condition has an access path that permits the use of one of
the methods S2 to S6, use that condition to retrieve the records and then check whether each
retrieved record satisfies the remaining simple conditions in the conjunctive select condition.

S8—Conjunctive selection using a composite index. If two or more attrib-utes are involved in
equality conditions in the conjunctive select condition and a composite index (or hash structure)
exists on the combined fields— for example, if an index has been created on the composite key
(Essn, Pno) of the WORKS_ON file for OP5—we can use the index directly.

S9—Conjunctive selection by intersection of record pointers.6 If second-ary indexes (or


other access paths) are available on more than one of the fields involved in simple conditions in the
conjunctive select condition, and if the indexes include record pointers (rather than block pointers),
then each index can be used to retrieve the set of record pointers that satisfy the indi-vidual
condition. The intersection of these sets of record pointers gives the record pointers that satisfy the
conjunctive select condition, which are then used to retrieve those records directly. If only some of
the conditions have secondary indexes, each retrieved record is further tested to determine whether it
CS8492

DATABASE MANAGEMENT SYSTEMS


(Common to CSE & IT)

satisfies the remaining conditions.7 In general, method S9 assumes that each of the indexes is on a
nonkey field of the file, because if one of the conditions is an equality condition on a key field, only
one record will satisfy the whole condition.

Whenever a single condition specifies the selection—such as OP1, OP2, or OP3— the
DBMS can only check whether or not an access path exists on the attribute involved in that
condition. If an access path (such as index or hash key or sorted file) exists, the method
corresponding to that access path is used; otherwise, the brute force, linear search approach of
method S1 can be used. Query optimization for a SELECT operation is needed mostly for
conjunctive select conditions whenever more than one of the attributes involved in the conditions
have an access path. The optimizer should choose the access path that retrieves the fewest records in
the most efficient way by estimating the different costs (see Section 19.8) and choosing the method
with the least estimated cost.

Selectivity of a Condition. When the optimizer is choosing between multiple simple


conditions in a conjunctive select condition, it typically considers the selectivity of each condition.
The selectivity (sl) is defined as the ratio of the num-ber of records (tuples) that satisfy the condition
to the total number of records (tuples) in the file (relation), and thus is a number between zero and
one. Zero selec-tivity means none of the records in the file satisfies the selection condition, and a
selectivity of one means that all the records in the file satisfy the condition. In gen-eral, the
selectivity will not be either of these two extremes, but will be a fraction that estimates the
percentage of file records that will be retrieved.

Although exact selectivities of all conditions may not be available, estimates of selectivities
are often kept in the DBMS catalog and are used by the optimizer. For example, for an equality
condition on a key attribute of relation r(R), s = 1/|r(R)|, where |r(R)| is the number of tuples in
relation r(R). For an equality condition on a nonkey attribute with i distinct values, s can be
estimated by (|r(R)|/i)/|r(R)| or 1/i, assuming that the records are evenly or uniformly distributed
among the distinct values. Under this assumption, |r(R)|/i records will satisfy an equality condition
on this attribute. In general, the number of records satisfying a selection condition with selectivity sl
is estimated to be |r(R)| * sl. The smaller this estimate is, the higher the desirability of using that
condition first to retrieve records. In certain cases, the actual distribution of records among the
various distinct values of the attribute is kept by the DBMS in the form of a histogram, in order to
get more accurate esti-mates of the number of records that satisfy a particular condition.

Disjunctive Selection Conditions. Compared to a conjunctive selection condition, a


disjunctive condition (where simple conditions are connected by the OR logical connective rather
than by AND) is much harder to process and optimize. For example, consider OP4 :

OP4 : σDno=5 OR Salary > 30000 OR Sex=‘F’ (EMPLOYEE)

With such a condition, little optimization can be done, because the records satisfying the
disjunctive condition are the union of the records satisfying the individual conditions. Hence, if any
one of the conditions does not have an access path, we are compelled to use the brute force, linear
CS8492

DATABASE MANAGEMENT SYSTEMS


(Common to CSE & IT)

search approach. Only if an access path exists on every simple condition in the disjunction can we
optimize the selection by retrieving the records satisfying each condition—or their record ids—and
then applying the union operation to eliminate duplicates.

A DBMS will have available many of the methods discussed above, and typically many
additional methods. The query optimizer must choose the appropriate one for executing each
SELECT operation in a query. This optimization uses formulas that estimate the costs for each
available access method, as we will discuss in Section 19.8. The optimizer chooses the access
method with the lowest estimated cost.
CS8492

DATABASE MANAGEMENT SYSTEMS


(Common to CSE & IT)

You might also like