Query Processing in DBMS
Query Processing is the activity performed in extracting data from the database. Query processing takes several steps for fetching the data from the database:
1. Parsing and translation
2. Optimization
3. Evaluation
The user's queries are first expressed in a high-level database language such as SQL. They are then translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the query and a variety of query-optimizing transformations take place.
SQL is the best suitable choice for humans, but it is not perfectly suitable as the internal representation of the query within the system. Relational algebra is well suited for the internal representation of a query. The translation process in query processing is handled by the parser of a query. When a user executes a query, the parser checks the syntax of the query, verifies the names of the relations in the database, the tuples, and finally the required attribute values in order to generate the internal form of the query. The parser creates a tree of the query, known as the parse tree, and then translates it into relational algebra; during this translation, any views used in the query are replaced by their definitions.
There are various methods of extracting the data from the database. Suppose, in SQL, a user wants to fetch the records of the employees whose salary is greater than or equal to 10000. The user writes this request as an SQL query, but the system cannot work on the SQL text directly. Thus, to make the system understand the user query, it needs to be translated into the form of relational algebra. Assuming the relation is named Employee and carries a salary attribute, we can bring this query into the relational algebra form as:
σ salary ≥ 10000 (Employee)
After translating the given query, we can execute each relational algebra operation by using different algorithms. This is how query processing begins its work.
Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the translated relational algebra expression with the instructions used for specifying and evaluating each operation. Thus, after translating the user query, the system constructs a query evaluation plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for particular operations or to the specific indexes involved. Relational algebra annotated in this way is known as evaluation primitives. The evaluation primitives carry the instructions needed for evaluating each operation.
o A query evaluation plan therefore defines a sequence of primitive operations used for evaluating a query. The query evaluation plan is also referred to as the query execution plan.
o A query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and finally produces the output for the user query.
o The cost of the query evaluation can vary for different types of queries. Although the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently: the system itself constructs a query evaluation plan which minimizes its cost. This task, performed by the database system, is known as query optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of each operation, since the overall cost depends on factors such as memory allocation and the execution cost of the individual operations.
Finally, after selecting an evaluation plan, the system evaluates the query and produces its output.
Selection Operation
The selection operation is performed using a file scan. File scans are the search algorithms used for locating and accessing the data; a file scan is the lowest-level operator used in query processing.
In an RDBMS, or relational database system, the file scan reads a relation only if the whole relation is stored in a single file. When the selection operation is performed on a relation whose tuples are stored in one file, the following algorithms are used:
o Linear Search: In a linear search, the system scans each record to test whether it satisfies the given selection condition. Accessing the first block of a file needs an initial seek; if the blocks of the file are not stored in contiguous order, some extra seeks are needed. Linear search is the slowest search algorithm, but it is applicable in all cases: it does not care about the nature of the selection, the availability of indices, or the ordering of the file, whereas the other algorithms are not applicable in all cases. A minimal sketch of such a scan is given below.
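The following is a minimal, illustrative sketch of a linear-search selection, assuming the relation is simply an iterable of dictionaries (one dictionary per record); the names linear_search and predicate are placeholders, not part of any real DBMS API.

def linear_search(relation, predicate):
    """Scan every record and keep those satisfying the selection condition."""
    result = []
    for record in relation:          # one pass over all blocks of the file
        if predicate(record):        # test the selection condition on each record
            result.append(record)
    return result

# Example: select employees whose salary is greater than or equal to 10000.
employees = [
    {"name": "A", "salary": 9000},
    {"name": "B", "salary": 12000},
]
print(linear_search(employees, lambda r: r["salary"] >= 10000))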
The index-based search algorithms are known as index scans. The index structures they rely on are known as access paths, since these paths allow locating and accessing the data in the file. The following algorithms use an index in query processing:
o Primary index, equality on a key: We use the index to retrieve the single record that satisfies the equality condition of the selection. The equality comparison is performed on the key attribute on which the primary index exists.
o Primary index, equality on nonkey: The difference between equality on a key and on a nonkey is that here we can fetch multiple records. We can fetch multiple records through a primary index when the selection criterion specifies an equality comparison on a nonkey attribute.
o Secondary index, equality on key or nonkey: A selection that specifies an equality condition can use a secondary index. Using the secondary-index strategy, we retrieve a single record when the equality is on a key, or multiple records when the equality condition is on a nonkey. When retrieving a single record, the time cost is equal to that of the primary index. In the case of multiple records, the records may reside on different blocks; this results in one I/O operation per fetched record, and each I/O operation requires a seek and a block transfer (a small cost sketch is given below).
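As a rough illustration of the last point, the sketch below estimates the I/O cost of a secondary-index lookup that returns n matching records, under the simplifying assumptions that the index has h levels and that each index level and each matching record costs one seek plus one block transfer; all names and numbers here are hypothetical.

def secondary_index_equality_cost(h, n, t_seek, t_transfer):
    """Estimated time: traverse h index levels, then fetch n records
    that may reside on n different blocks."""
    index_traversal = h * (t_seek + t_transfer)
    record_fetches = n * (t_seek + t_transfer)
    return index_traversal + record_fetches

# Example with assumed figures: height 3, 50 matches, 4 ms seek, 0.1 ms transfer.
print(secondary_index_equality_cost(3, 50, 0.004, 0.0001), "seconds")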
Conjunction: A conjunctive selection is a selection having the form:
σ θ1∧θ2∧…∧θn (r)
A conjunction is the intersection of all records that satisfy the individual selection conditions θi.
Disjunction: A disjunctive selection is a selection having the form:
σ θ1∨θ2∨…∨θn (r)
A disjunction is the union of all records that satisfy at least one of the given selection conditions θi.
Negation: The result of a selection σ¬θ(r) is the set of tuples of the given relation r for which the selection condition θ evaluates to false. Provided nulls are not present, this set is simply the set of tuples in r that are not in σθ(r).
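A tiny sketch of these three composite selections, treating the sets of matching record ids as Python sets; this is only to make the intersection/union/complement reading concrete, not how a real engine implements them, and all ids are made up.

sat_theta1 = {1, 2, 3, 5}          # ids of records in r satisfying θ1 (assumed)
sat_theta2 = {2, 3, 7}             # ids of records in r satisfying θ2 (assumed)
all_ids = {1, 2, 3, 4, 5, 6, 7}    # all record ids of relation r

conjunction = sat_theta1 & sat_theta2   # σ θ1∧θ2 (r): intersection
disjunction = sat_theta1 | sat_theta2   # σ θ1∨θ2 (r): union
negation = all_ids - sat_theta1         # σ ¬θ1 (r): complement, assuming no nulls

print(conjunction, disjunction, negation)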
Hash Join Algorithm
The hash join algorithm is used to perform natural join or equi-join operations. The idea behind the hash join algorithm is to partition the tuples of each given relation into sets. The partitioning is done on the basis of the same hash value on the join attributes, where the hash value is provided by a hash function. The main goal of using the hash function in the algorithm is to reduce the number of comparisons and thereby improve the efficiency of the join operation.
For example, suppose there are two tuples a and b that satisfy the join condition, which means they have the same value for the join attributes. Both a and b then hash to the same value i, so tuple a is placed in partition ai and tuple b in partition bi. Thus, we only need to compare the a tuples in ai with the b tuples in bi; we never need to compare them with the b tuples of any other partition. This is how the hash join operation works.
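The following is a minimal in-memory sketch of this idea, assuming both relations are lists of dictionaries and the join is an equi-join on a single attribute; names such as build, probe and join_attr are illustrative only.

from collections import defaultdict

def hash_join(build, probe, join_attr):
    """Partition the build relation by the hash of the join attribute,
    then probe each partition only with the matching probe tuples."""
    partitions = defaultdict(list)
    for b in build:                       # build phase: hash the smaller relation
        partitions[hash(b[join_attr])].append(b)

    result = []
    for p in probe:                       # probe phase: look only in the matching partition
        for b in partitions.get(hash(p[join_attr]), []):
            if b[join_attr] == p[join_attr]:
                result.append({**b, **p})
    return result

dept = [{"dept_id": 1, "dept": "HR"}, {"dept_id": 2, "dept": "IT"}]
emp = [{"emp": "A", "dept_id": 2}, {"emp": "B", "dept_id": 1}]
print(hash_join(dept, emp, "dept_id"))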
Recursive Partitioning in Hash Join
Recursive partitioning is partitioning in which the system repeats the partitioning of the input until each partition of the build input fits into memory. It is needed when the number of partitions nh is greater than or equal to the number of memory blocks: in that case there are not enough buffer blocks to split the relation in one pass, so it is better to split the relation in repeated passes. In one pass, we can split the input into at most as many partitions as there are blocks available to be used as output buffers. Each bucket built by a pass is read separately and further partitioned in the next pass so as to create smaller partitions, and the hash functions used in different passes are different. Recursive partitioning is therefore the right way to handle such cases.
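A toy sketch of recursive partitioning follows, assuming tuples are plain Python values and a partition "fits in memory" when its length is at most max_in_memory; the per-pass change of hash function is simulated by mixing the pass number into the hash. All names and limits are illustrative.

def recursive_partition(tuples, fanout, max_in_memory, depth=0):
    """Split the input into `fanout` partitions; re-partition any partition
    that is still too large, using a different hash function in each pass."""
    if len(tuples) <= max_in_memory or depth > 10:   # stop when it fits (or give up)
        return [tuples]
    buckets = [[] for _ in range(fanout)]
    for t in tuples:
        # mixing the pass number into the key simulates a different hash function per pass
        buckets[hash((depth, t)) % fanout].append(t)
    partitions = []
    for b in buckets:
        partitions.extend(recursive_partition(b, fanout, max_in_memory, depth + 1))
    return partitions

parts = recursive_partition(list(range(1000)), fanout=4, max_in_memory=50)
print(len(parts), max(len(p) for p in parts))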
The overflow condition in the hash table occurs in a partition i of the build relation s due to the following cases:
Case 1: When the hash index on si is larger than the main memory, the overflow condition occurs.
Case 2: When there are many tuples in the build relation with the same value for the join attributes.
Case 3: When the hash function does not hold randomness and uniformity
characteristics.
Case 4: When some of the partitions have more tuples than the average and others
have fewer tuples, then such type of partitioning is known as skewed.
Handling the Overflows
We can handle a small amount of skew by increasing the number of partitions using a fudge factor. The fudge factor is a small value by which the number of partitions is increased; it helps reduce the expected size of each partition, including its hash index, to less than the memory size. Even with this conservative estimate of the partition sizes, overflows are still possible. The fudge factor is thus suitable for handling small overflows, but it is not sufficient for handling large overflows in the hash table.
1. Overflow Resolution
The overflow resolution method is applied during the build phase when a hash
index overflow is detected. The overflow resolution works in the following way:
If any build partition si is found to be larger than the memory size, it is repartitioned into smaller partitions using a different hash function. Similarly, the corresponding probe partition ri is repartitioned using the new hash function, and only tuples in matching partitions are joined. This is a less careful, reactive approach: the method waits for the overflow to occur and then takes the necessary actions to resolve the problem.
2. Overflow Avoidance
The overflow avoidance method uses a careful approach while partitioning in order
to avoid the occurrence of overflow in the build phase. The overflow avoidance
works in the following way:
It initially partitions the build relation s into many small partitions and then combines some of them in such a way that each combined partition fits in memory. The probe relation r is then partitioned in the same way as the combined partitions of s, though the size of each ri does not matter in this method.
Both overflow resolution and overflow avoidance methods may fail on some
partitions if a large number of tuples in s have the same value for the join
attributes. In such a case, it is better to use block nested-loop join rather than
applying the hash join technique for completing the join operation on those
partitions.
For analyzing the cost of a hash join, we assume that no overflow occurs. We consider two cases: the case where recursive partitioning is not required, and the case where it is.
If recursive partitioning is not required: We need to read and write relations r and s completely in order to partition them, which requires a total of 2(br + bs) block transfers, where br and bs are the number of blocks holding the records of relations r and s. The build and probe phases then read each partition once, for another br + bs block transfers. However, the partitions may occupy slightly more than br + bs blocks because of partially filled blocks; accessing such partially filled blocks can add an overhead of approximately 2nh block transfers for each relation. Thus, the hash join cost estimate needs:
3(br + bs) + 4nh block transfers
Here, we can neglect the overhead value of 4nh, since it is much smaller than br + bs.
The number of disk seeks is estimated as 2(⌈br / bb⌉ + ⌈bs / bb⌉) + 2nh. Here, we have assumed that each input buffer is allocated bb blocks, and that the build as well as the probe phase needs only one seek for each of the nh partitions of a relation, since each partition can be read sequentially.
If recursive partitioning is required: In this case, each pass reduces the size of each partition by an expected factor of M − 1, and passes are repeated until each partition is at most M blocks in size. Therefore, the number of passes needed for partitioning the relation s is ⌈logM−1(bs) − 1⌉.
The number of passes required for partitioning the build and the probe relations is the same. Since in each pass every block of s is read and written out, splitting relation s needs a total of 2bs⌈logM−1(bs) − 1⌉ block transfers. Thus, the hash join cost estimate needs:
2(br + bs)⌈logM−1(bs) − 1⌉ + br + bs block transfers
As a result, the hash join algorithm can be further improved if the size of the main
memory increases or is large.
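Below is a small calculator for the two block-transfer estimates given above; it simply evaluates the formulas from this section under the stated assumption of no overflow, and makes no attempt to model seeks. The example figures are made up.

import math

def hash_join_transfers(b_r, b_s, n_h, M):
    """Estimated block transfers for a hash join of r and s
    (b_r, b_s blocks; n_h partitions; M memory blocks)."""
    if n_h < M:                     # no recursive partitioning needed
        return 3 * (b_r + b_s) + 4 * n_h
    passes = math.ceil(math.log(b_s, M - 1) - 1)   # passes over the build relation
    return 2 * (b_r + b_s) * passes + b_r + b_s

print(hash_join_transfers(b_r=2000, b_s=1000, n_h=40, M=50))    # single-pass case
print(hash_join_transfers(b_r=2000, b_s=1000, n_h=100, M=50))   # recursive case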
Hybrid Hash Join
The hybrid hash join is a variant of hash join that is useful when the memory size is relatively large but the build relation still does not fit completely in memory. The hybrid hash join algorithm thus resolves this drawback of the basic hash join algorithm.
Merge Join Algorithm
Merge joins are used for performing natural joins and equi-joins on two given relations r and s. The algorithm used to perform the merge join is known as the merge join algorithm; it is also called the sort-merge-join algorithm.
Nested-Loop Join Algorithm
In the previous section, we learned about joins and various types of joins. In this section, we will learn about the nested-loop join algorithm.
A nested-loop join is a join that is computed with a pair of nested for loops. To perform the theta join of two relations r and s, written
r ⋈θ s,
we use an algorithm known as the nested-loop join algorithm, in which r is the outer relation and s is the inner relation of the join (a sketch of the nested loops is given after the list below).
o Like a linear file scan, the nested-loop join does not require any index for accessing the data.
o The nested-loop join does not care about the given join condition; it works for any join condition.
o The nested-loop join algorithm is expensive, because it computes and examines every pair of tuples from the two given relations.
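A minimal sketch of the pair of nested loops, assuming the relations are lists of dictionaries and theta is an arbitrary join predicate; all names are illustrative.

def nested_loop_join(r, s, theta):
    """For each tuple of the outer relation r, scan the whole inner relation s
    and emit the concatenated pair whenever the join condition theta holds."""
    result = []
    for tr in r:                # outer loop over r
        for ts in s:            # inner loop over s
            if theta(tr, ts):
                result.append({**tr, **ts})
    return result

r = [{"id": 1, "a": 10}, {"id": 2, "a": 20}]
s = [{"sid": 1, "b": 5}, {"sid": 3, "b": 7}]
print(nested_loop_join(r, s, lambda tr, ts: tr["id"] == ts["sid"]))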
Block Nested-Loop Join is a variant of the nested-loop join in which each block of the inner relation is paired with each block of the outer relation. The block nested-loop join saves a significant number of block accesses in situations where the buffer is too small to hold an entire relation in memory, because it processes the relations on a per-block basis rather than on a per-tuple basis. Within each pair of blocks, the block nested-loop join pairs every tuple of one block with every tuple of the other block to produce all pairs of tuples; only those pairs that satisfy the given join condition are added to the result. A sketch is given below.
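A sketch of the per-block variant, assuming each relation is already available as a list of blocks (each block being a list of tuple dictionaries); block formation and buffer management are not modeled.

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Pair each block of the outer relation with each block of the inner relation,
    then compare tuples only within the current pair of blocks."""
    result = []
    for r_block in r_blocks:            # outer relation, one block at a time
        for s_block in s_blocks:        # inner relation read once per outer block
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append({**tr, **ts})
    return result

r_blocks = [[{"id": 1}], [{"id": 2}]]
s_blocks = [[{"sid": 2, "x": "m"}], [{"sid": 1, "x": "n"}]]
print(block_nested_loop_join(r_blocks, s_blocks, lambda a, b: a["id"] == b["sid"]))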
Cost Estimation
Here, the overall cost of the algorithm is obtained by adding the cost of the individual index scans and the cost of fetching the records in the intersection of the retrieved lists of pointers. We can minimize this cost by sorting the list of pointers and fetching the records in sorted order. This gives the following two points for cost estimation:
o We can fetch all the selected records of a block using a single I/O operation, because after sorting, the pointers to records in the same block appear together.
o The disk-arm movement is minimized because blocks are read in sorted order.
To estimate the cost of a query evaluation plan, we use the number of blocks transferred from disk and the number of disk seeks. Suppose the disk has an average seek (block-access) time of tS seconds and takes an average of tT seconds to transfer one block of data; the block-access time is the sum of the disk seek time and the rotational latency. Then a plan that transfers b blocks and performs S seeks takes b*tT + S*tS seconds. For example, if tT = 0.1 ms, tS = 4 ms, the block size is 4 KB, and the transfer rate is 40 MB per second, we can easily calculate the estimated cost of a given query evaluation plan, as in the sketch below.
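A tiny worked example with the figures quoted above; the plan size (number of block transfers and seeks) is made up purely for illustration.

t_T = 0.1 / 1000      # transfer time per 4 KB block: 0.1 ms (4 KB at 40 MB/s)
t_S = 4.0 / 1000      # seek (block-access) time: 4 ms

b = 10000             # assumed number of block transfers in the plan
S = 200               # assumed number of disk seeks in the plan

cost_seconds = b * t_T + S * t_S
print(cost_seconds)   # 10000*0.0001 + 200*0.004 = 1.8 seconds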
Generally, for estimating the cost, we consider the worst case: we assume that the data is initially read from disk only. In practice, there is a good chance that some of the required data is already present in main memory. This effect is usually ignored, and because of it the actual cost of execution often comes out lower than the estimated value.
The response time, i.e., the time required to execute the plan, could be used for
estimating the cost of the query evaluation plan. But due to the following reasons,
it becomes difficult to calculate the response time without actually executing the
query evaluation plan:
o When the query begins its execution, the response time depends on the contents of the buffer, which are hard to know at the time the query is optimized and may not be available at all.
o On a system with multiple disks, the response time depends on how accesses are distributed among the disks, which is difficult to estimate without detailed knowledge of the data layout on the disks.
o Consequently, instead of minimizing the response time for a query evaluation plan, optimizers find it better to reduce the total resource consumption of the query plan. Thus, to estimate the cost of a query evaluation plan, it is good to minimize the resources used for accessing the disk and the use of other resources.
Cost-Based Optimization
Cost-based optimization chooses a plan based on the cost of the query. The query can use different paths based on indexes, constraints, sorting methods, and so on. This method mainly uses statistics such as the record size, the number of records, the number of records per block, the number of blocks, the table size, whether the whole table fits in a block, the organization of the tables, the uniqueness of column values, the size of columns, and so on.
Suppose we have a query joining six tables:
T1 ⋈ T2 ⋈ T3 ⋈ T4 ⋈ T5 ⋈ T6
For the above query we can have any order of evaluation: we can start by taking any two tables in any order and evaluate the query from there. In general, we can have join combinations in (2(n−1))! / (n−1)! ways.
For example, suppose we have 5 tables involved in a join; then we can have 8! / 4! = 1680 combinations. But when the query optimizer runs, it does not evaluate all of these orders. It uses dynamic programming, in which the cost of the join orders for the combinations of tables is calculated and generated only once. The least cost for each combination of tables is then stored and reused: that is, given a set of tables T = {T1, T2, T3, …, Tn}, it generates the least-cost combination for all the tables and stores it.
Dynamic Programming: As we learned above, the least cost for the joins of any combination of tables is generated here. These values are stored, and when those tables are used in a query, this combination is selected for evaluating the query.
While generating the costs, it follows the steps below. Suppose we have a set of tables T = {T1, T2, T3, …, Tn} in a database. It picks the first table and computes the cost of joining it with each of the remaining tables in T, then chooses the best cost. It continues doing the same with the rest of the tables in T. In total it generates 2^n − 1 cases, selects the lowest cost for each, and stores it. When a query uses those tables, it looks up the costs here and uses that combination to evaluate the query. This is called dynamic programming.
In this method, the time required to find the optimized query is on the order of 3^n, where n is the number of tables. Suppose we have 5 tables; then the time required is proportional to 3^5 = 243, which is less than examining all combinations of tables and then deciding on the best one (1680). The space required for computing and storing the costs is on the order of 2^n; in the above example, it is 2^5 = 32.
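Below is a compact sketch of this bottom-up dynamic programming over subsets of tables, in the spirit of a Selinger-style optimizer. The cost model is deliberately naive: joining two sub-plans is assumed to cost the product of their estimated sizes, and the table names and cardinalities are made up.

from itertools import combinations
from math import prod

def best_join_order(table_sizes):
    """Dynamic programming over subsets. table_sizes maps table name -> row count.
    Returns (cost, plan) for joining all tables under a toy cost model."""
    tables = list(table_sizes)
    # size of a subset: cross-product upper bound, used only as a toy estimate
    size = {frozenset(sub): prod(table_sizes[t] for t in sub)
            for k in range(1, len(tables) + 1)
            for sub in combinations(tables, k)}
    best = {frozenset([t]): (0, t) for t in tables}   # single tables cost nothing

    for k in range(2, len(tables) + 1):
        for sub in combinations(tables, k):
            s = frozenset(sub)
            # try every way of splitting the subset into two non-empty halves
            for r in range(1, k):
                for left in combinations(sub, r):
                    l = frozenset(left)
                    rest = s - l
                    cost = best[l][0] + best[rest][0] + size[l] * size[rest]
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[l][1], best[rest][1]))
    return best[frozenset(tables)]

print(best_join_order({"T1": 100, "T2": 10, "T3": 1000}))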
Left Deep Trees: This is another method of determining the cost of the joins. Here, the tables and joins are represented in the form of a tree. A join always forms the root of the tree, with a table kept as its right child; the left child of the root points to the next join. The tree therefore grows deeper and deeper on the left-hand side, which is why it is called a left-deep tree.
Here, instead of calculating the best join cost for every set of tables, the best cost of joining with each individual table is calculated. In this method, the time required to find the optimized query is on the order of n·2^n, where n is the number of tables. Suppose we have 5 tables; then the time required is proportional to 5·2^5 = 160, which is less than with dynamic programming. The space required for computing and storing the costs is again on the order of 2^n; in the above example, it is 2^5 = 32, the same as for dynamic programming.
Suppose we have a query to retrieve the students who are 18 years old and studying in class DESIGN_01. We can get all the student details from the STUDENT table and the class details from the CLASS table. We can write this query in two different ways, as illustrated below.
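The two forms can be sketched in relational algebra as follows (the attribute names age and class_name, and the join attribute, are assumptions made for illustration):
Form 1 (join first, then filter): σ age=18 ∧ class_name='DESIGN_01' (STUDENT ⋈ CLASS)
Form 2 (filter first, then join): σ age=18 (STUDENT) ⋈ σ class_name='DESIGN_01' (CLASS)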
Both queries return the same result. But when we observe them closely, we can see that the first query joins the two tables first and then applies the filters. That means it traverses the whole tables to perform the join, so the number of records involved is larger. The second query applies the filters on each table first; this reduces the number of records from each table (in the CLASS table, the number of records reduces to one in this case!) and only then joins these intermediate tables. Hence the cost in this case is comparatively less.
Rather than working on the query text itself, the optimizer creates the relational algebra expression and the operator tree for the above case.
Another heuristic is to perform all the projections as early as possible in the query. This is similar to pushing selections down, but it reduces the number of columns carried through the query.
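For instance, if only the student name is finally needed, projection can be pushed below the join (again assuming illustrative attribute names):
π name (σ age=18 (STUDENT) ⋈ σ class_name='DESIGN_01' (CLASS))
can be rewritten so that each input carries only the columns actually needed:
π name (π name, class_id (σ age=18 (STUDENT)) ⋈ π class_id (σ class_name='DESIGN_01' (CLASS)))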