Chapter 6 - Query Processing and Optimization Algorithm

The document discusses query processing and optimization algorithms in database management systems, focusing on SQL queries and relational algebra. It outlines operations such as SELECT, PROJECT, and JOIN, and describes the phases of query processing including decomposition, optimization, code generation, and execution. Additionally, it covers approaches to query optimization, cost components, and techniques like pipelining for efficient query execution.

Uploaded by

diro bayisa

Exit Exam Tutorial

Part 2: Fundamental Database Management Systems


Episode 6: Query Processing and Optimization Algorithm
2.6 Query Processing and Optimization Algorithm
2.6.1 SQL Queries and Relational Algebra
- Relational algebra enables a user to specify basic retrieval
requests. A sequence of relational algebra operations forms a
relational algebra expression.
- It provides a formal foundation for relational model
operations.
- It is used as a basis for implementing and optimizing queries
in relational database management systems. Its operations
can be divided into two groups:
1. Set operations from mathematical set theory: UNION,
INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT.
2. Operations developed specifically for relational databases:
SELECT, PROJECT, and JOIN.
1. SELECT operation
- A SELECT operation can be visualized as a horizontal
partitioning of the relation into two sets of tuples: those
that satisfy the selection condition and those that do not.
- Example: σDno=4(Employee) selects the employees whose
department number is 4.
- The general form is σ<Selection condition>(R).
- σ (sigma) denotes the select operator, and the selection
condition is a Boolean expression over the attributes of R.
- The select operation is commutative.
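The SELECT operation can be sketched in Python as a filter over a list of tuples. This is only an illustration under stated assumptions, not how a DBMS implements selection: the `select` helper, the dictionary representation of tuples, and the sample employee data are all hypothetical.

```python
from typing import Callable

Row = dict  # one tuple of a relation, keyed by attribute name (assumed representation)


def select(relation: list[Row], condition: Callable[[Row], bool]) -> list[Row]:
    """sigma<condition>(relation): keep only the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


# Hypothetical sample data for the Employee relation.
employee = [
    {"Fname": "Abebe", "Lname": "Kebede", "Dno": 4, "Salary": 6000},
    {"Fname": "Sara", "Lname": "Tesfaye", "Dno": 5, "Salary": 4000},
    {"Fname": "Dawit", "Lname": "Haile", "Dno": 4, "Salary": 3000},
]


def in_dept_4(row: Row) -> bool:
    return row["Dno"] == 4


def well_paid(row: Row) -> bool:
    return row["Salary"] > 3500


# sigma Dno=4 (Employee)
dept4 = select(employee, in_dept_4)

# Commutativity: applying the two selections in either order gives the same tuples.
first_order = select(select(employee, in_dept_4), well_paid)
second_order = select(select(employee, well_paid), in_dept_4)
```

Both orders of application return the same single tuple here, which is what commutativity of SELECT predicts.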
2. PROJECT operation
- If we are interested only in certain attributes, we use the
project operation to project the relation over these
attributes only. It can be visualized as a vertical partitioning
of the relation.
- Example: to list each employee's first name, last name, and
salary, we can use the project operation
πLname, Fname, Salary(Employee)
- The general form of the project operation is
π<Attribute list>(R), where π (pi) denotes the project operation.
- If the attribute list includes only non-key attributes of R,
duplicate tuples are likely to occur. However, the project
operation removes any duplicate tuples, so the result is still
a valid relation.
- The project operation is not commutative.
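A minimal Python sketch of the PROJECT operation, including the duplicate elimination described above. The `project` helper and the sample data are assumptions made for illustration only.

```python
def project(relation: list[dict], attributes: list[str]) -> list[dict]:
    """pi<attributes>(relation): keep the listed attributes, drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        projected = tuple(row[a] for a in attributes)
        if projected not in seen:  # duplicate elimination
            seen.add(projected)
            result.append(dict(zip(attributes, projected)))
    return result


# Hypothetical sample data for the Employee relation.
employee = [
    {"Fname": "Abebe", "Lname": "Kebede", "Dno": 4, "Salary": 6000},
    {"Fname": "Sara", "Lname": "Tesfaye", "Dno": 5, "Salary": 4000},
    {"Fname": "Dawit", "Lname": "Haile", "Dno": 4, "Salary": 3000},
]

# pi Lname, Fname, Salary (Employee): three distinct tuples remain.
names_and_salaries = project(employee, ["Lname", "Fname", "Salary"])

# pi Dno (Employee): Dno is a non-key attribute, so the duplicate value 4 is removed.
departments = project(employee, ["Dno"])
```

Projecting on the non-key attribute `Dno` alone leaves two tuples instead of three, because the repeated department number is collapsed.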
- We can apply several relational algebra operations one
after the other.
- Example: πLname,Fname,Salary(σDno=5(Employee))
3. JOIN operation
- The JOIN operation, denoted by ⋈, is used to combine
related tuples from two relations into single tuples.
- It allows us to process relationships among relations. For
example, to get the name of the manager of each
department, we need to combine each department tuple with
the employee tuple whose SSN value matches the mgrSSN
value in the department tuple.
- The general form is R ⋈<join condition> S. It can be
combined with other operations:
πDname,Lname,Fname(Department ⋈<mgrSSN=SSN> Employee)
- There are different kinds of joins:
- INNER JOIN: used to combine data from multiple relations
so that related information can be presented in a single
table. Only matching records are kept in the result.
- OUTER JOIN: used when, given two relations R and S, we want
to keep all the tuples in R, all those in S, or all those in both
relations, regardless of whether or not they have matching
tuples in the other relation. It can be a left outer join, a
right outer join, or a full outer join.
- LEFT OUTER JOIN: keeps every tuple in the first (left)
relation R. If no matching tuple is found in S, the attributes
of S are filled with Null values in the result.
- Example: to retrieve a list of all employee names and the
names of the departments they manage; for employees who do
not manage a department, the department name is Null.
- RIGHT OUTER JOIN: keeps every tuple in the second (right)
relation. If no matching tuple is found, the attributes of the
left relation are filled with Null values.
- FULL OUTER JOIN: keeps all tuples in both the left and right
relations. Where no matching tuple is found, the missing
attributes are filled with Null values.
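The difference between an inner join and a left outer join can be sketched with nested loops in Python. The helper names, the use of `None` to stand for Null, and the sample data are assumptions for illustration; a real DBMS would use more efficient join algorithms.

```python
from typing import Callable


def inner_join(r: list[dict], s: list[dict],
               condition: Callable[[dict, dict], bool]) -> list[dict]:
    """R join<condition> S: keep only the combined tuples that satisfy the condition."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]


def left_outer_join(r: list[dict], s: list[dict],
                    condition: Callable[[dict, dict], bool],
                    s_attributes: list[str]) -> list[dict]:
    """Keep every tuple of R; pad the attributes of S with None when nothing matches."""
    result = []
    for a in r:
        matches = [{**a, **b} for b in s if condition(a, b)]
        if matches:
            result.extend(matches)
        else:
            result.append({**a, **{attr: None for attr in s_attributes}})
    return result


# Hypothetical sample data.
employee = [
    {"SSN": 1, "Lname": "Kebede"},
    {"SSN": 2, "Lname": "Tesfaye"},
    {"SSN": 3, "Lname": "Haile"},
]
department = [
    {"Dname": "Admin", "mgrSSN": 1},
    {"Dname": "Research", "mgrSSN": 3},
]


def manages(e: dict, d: dict) -> bool:
    return e["SSN"] == d["mgrSSN"]


inner = inner_join(employee, department, manages)
left = left_outer_join(employee, department, manages, ["Dname", "mgrSSN"])
managed = [(t["Lname"], t["Dname"]) for t in left]
```

The inner join drops the employee who manages no department, while the left outer join keeps that employee with `None` in the department attributes.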
- Since relational algebra is lower level than SQL, it gives us
a mathematical foundation for analyzing and optimizing SQL
queries.
- SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > 5000
corresponds to πLname,Fname(σSalary>5000(Employee))
- SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5
corresponds to ℱMAX Salary(σDno=5(Employee)), where ℱ denotes
the aggregate function operation.
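The correspondence between the two SQL queries above and a select-then-project evaluation can be checked with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout and sample rows are hypothetical, and the list comprehensions only mimic the relational algebra steps.

```python
import sqlite3

# Hypothetical sample rows: (FNAME, LNAME, SALARY, DNO).
rows = [
    ("Abebe", "Kebede", 6000, 4),
    ("Sara", "Tesfaye", 4000, 5),
    ("Dawit", "Haile", 7000, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SALARY INTEGER, DNO INTEGER)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)

# Query 1 in SQL.
sql_names = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > 5000"
).fetchall()

# The same query as selection (Salary > 5000) followed by projection (Lname, Fname).
selected = [r for r in rows if r[2] > 5000]
algebra_names = [(lname, fname) for (fname, lname, _salary, _dno) in selected]

# Query 2 in SQL.
sql_max = conn.execute(
    "SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5"
).fetchone()[0]

# The same query as selection (Dno = 5) followed by the MAX aggregate.
algebra_max = max(salary for (_f, _l, salary, dno) in rows if dno == 5)

conn.close()
```

One difference worth keeping in mind: SQL keeps duplicate rows unless DISTINCT is used, whereas the relational algebra PROJECT removes them. The sample data has no duplicates, so both forms agree here.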
2.6.2 Query Processing and Optimization
- The aim of query processing is to find information in one or
more databases and deliver it to the user quickly and
efficiently.
- Traditional techniques work well for databases with
standard, single-site relational structures, but databases
containing more complex and diverse types of data demand
new query processing and optimization techniques.
- Query Processing can be divided into four main phases:
1. Decomposition
2. Optimization
3. Code generation, and
4. Execution.
- Query decomposition is the process of transforming a
high-level query into a relational algebra query and checking
that the query is syntactically and semantically correct.
- Query decomposition consists of parsing and validation.
- Typical stages in query decomposition are:
1. Analysis: lexical and syntactical analysis of the query
(correctness). A query tree is built for the query, with a
leaf node for each base relation, one or more non-leaf
nodes for the relations produced by relational algebra
operations, and a root node for the result of the query.
Operations are executed from the leaves to the root.
2. Normalization: convert the query into a normalized form.
The WHERE predicate is converted to conjunctive normal form
(a conjunction, ∧, of disjunctions) or disjunctive normal form
(a disjunction, ∨, of conjunctions).
3. Semantic analysis: reject normalized queries that are
incorrectly formulated or contradictory.
- A query is incorrectly formulated if some of its components
do not contribute to generating the result.
- A query is contradictory if its predicate cannot be satisfied
by any tuple.
- Algorithms: relation connection graph and normalized
attribute connection graph.
4. Simplification: detect redundant qualifications,
eliminate common sub-expressions, and transform the
query into a semantically equivalent form that is computed
more easily and efficiently.
5. Query restructuring: more than one translation of the query
is possible, so transformation rules are used to restructure it.
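As an illustration of the semantic analysis stage, the sketch below checks whether a purely conjunctive predicate is contradictory. It is a simplified stand-in, not the relation connection graph or normalized attribute connection graph algorithms named above. It assumes each conjunct compares one attribute with a constant using `=`, `<`, or `>`, and it treats values as real numbers. The function name and tuple encoding are hypothetical.

```python
import math


def is_contradictory(conjuncts: list[tuple[str, str, float]]) -> bool:
    """Return True if no value assignment can satisfy all conjuncts at once.

    Each conjunct is (attribute, operator, constant) and all conjuncts are ANDed.
    """
    lows: dict[str, float] = {}     # tightest exclusive lower bound per attribute
    highs: dict[str, float] = {}    # tightest exclusive upper bound per attribute
    equals: dict[str, set] = {}     # constants the attribute must equal

    for attr, op, value in conjuncts:
        if op == ">":
            lows[attr] = max(lows.get(attr, -math.inf), value)
        elif op == "<":
            highs[attr] = min(highs.get(attr, math.inf), value)
        elif op == "=":
            equals.setdefault(attr, set()).add(value)
        else:
            raise ValueError(f"unsupported operator: {op}")

    for attr in set(lows) | set(highs) | set(equals):
        low = lows.get(attr, -math.inf)
        high = highs.get(attr, math.inf)
        required = equals.get(attr, set())
        if len(required) > 1:
            return True  # e.g. Dno = 4 AND Dno = 5
        if required:
            value = next(iter(required))
            if not (low < value < high):
                return True  # the required value lies outside the allowed range
        elif not (low < high):
            return True  # e.g. Salary > 5000 AND Salary < 3000
    return False


two_departments = is_contradictory([("Dno", "=", 4), ("Dno", "=", 5)])
empty_range = is_contradictory([("Salary", ">", 5000), ("Salary", "<", 3000)])
satisfiable = is_contradictory([("Salary", ">", 5000), ("Dno", "=", 5)])
```

A query whose WHERE clause is flagged this way can be rejected, or answered with an empty result, without touching the data.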
- Most real-world data is not well structured.
- Today's databases typically contain much non-structured
data such as text, images, video, and audio, often
distributed across computer networks.
- Query processing: execute transactions on behalf of the query
and return the result.
- Query optimizers are one of the main means by which modern
database systems achieve their performance advantages.
- Given a request for data manipulation or retrieval, an
optimizer chooses the cheapest plan it can find for evaluating
the request from among the many alternative strategies, i.e.
there are many ways (access paths) of accessing the desired
file or record.
- The optimizer tries to select the most efficient (cheapest)
access path for accessing the data.
- The DBMS is responsible for picking the best execution
strategy based on various considerations.
- Query optimizers are among the largest and most
complex modules of database systems.
2.6.3 Approaches to Query Optimization
1. Heuristic Approach: the heuristic approach uses
knowledge of the characteristics of the relational algebra
operations and the relationships between the operators to
optimize the query.
- Thus the heuristic approach makes use of the properties of
individual operators and the associations between
operators.
2. Query Tree: a graphical representation of the operators,
relations, attributes, and predicates of a query, and of the
processing sequence during query processing.
- A query tree is composed of three main parts:
i. The leaves: the base relations used for processing the query
and extracting the required information.
ii. The root: the final result (relation) produced by the
operations on the relations used for query processing.
iii. The nodes: intermediate results or relations produced
before reaching the final result.
- Execution of the operations in a query tree starts
at the leaves, continues through the intermediate nodes,
and ends at the root.
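A query tree and its leaves-to-root evaluation order can be sketched with a small recursive data structure. The `Node` class, its field names, and the sample data are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """One node of a query tree: a leaf holds a base relation, other nodes hold an operation."""
    name: str
    op: Optional[Callable[..., list]] = None      # None for a leaf
    children: list["Node"] = field(default_factory=list)
    relation: Optional[list] = None                # base relation, leaves only

    def evaluate(self, trace: list[str]) -> list:
        """Evaluate children first, then this node, recording the order in trace."""
        if self.op is None:
            trace.append(self.name)
            return self.relation or []
        inputs = [child.evaluate(trace) for child in self.children]
        trace.append(self.name)
        return self.op(*inputs)


# Hypothetical sample data for the Employee relation.
employee = [
    {"Lname": "Kebede", "Dno": 4, "Salary": 6000},
    {"Lname": "Tesfaye", "Dno": 5, "Salary": 4000},
    {"Lname": "Haile", "Dno": 5, "Salary": 7000},
]

# Tree for: pi Lname,Salary ( sigma Dno=5 ( Employee ) )
leaf = Node("Employee", relation=employee)
selection = Node(
    "σ Dno=5",
    op=lambda rel: [t for t in rel if t["Dno"] == 5],
    children=[leaf],
)
root = Node(
    "π Lname,Salary",
    op=lambda rel: [(t["Lname"], t["Salary"]) for t in rel],
    children=[selection],
)

trace: list[str] = []
result = root.evaluate(trace)
```

After evaluation, `trace` lists the leaf first, then the intermediate selection node, and the root last, matching the execution order described above.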
- The properties of each operation and the associations
between operators are analyzed using a set of rules called
TRANSFORMATION RULES.
- Applying the transformation rules transforms the query into
a relatively good execution strategy.
2.6.4 Transformation Rules for Relational Algebra
1. Cascade of SELECTION: a conjunctive SELECTION
operation can be broken up into a cascade of individual
SELECTION operations, and vice versa.
σc1∧c2∧c3(R) = σc1(σc2(σc3(R))), where each ci is a predicate.
2. Commutativity of SELECTION operations
σc1(σc2(R)) = σc2(σc1(R)), where each ci is a predicate.
3. Cascade of PROJECTION: in a sequence of PROJECTION
operations, only the last (outermost) one is required.
πL1(πL2(πL3(πL4(R)))) = πL1(R), provided L1 ⊆ L2 ⊆ L3 ⊆ L4.
4. Commutativity of SELECTION with PROJECTION, and vice
versa
a. If the predicate c1 involves only the attributes in the
projection list L1, then the selection and projection
operations commute.
πL1(σc1(R)) = σc1(πL1(R))
5. Commutativity of THETA JOIN and CARTESIAN PRODUCT
R × S is equivalent to S × R.
This also holds for equi-join and natural join:
R ⋈c1 S = S ⋈c1 R
6. Commutativity of SELECTION with THETA JOIN
a. If the predicate c1 involves only attributes of one of the
relations (R) being joined, then the selection and join operations
commute.
σc1(R ⋈c S) = (σc1(R)) ⋈c S
b. If the predicate has the form c1 ∧ c2, where c1 involves only
attributes of R and c2 involves only attributes of S, then the
selection and theta join operations commute.
σc1∧c2(R ⋈c S) = (σc1(R)) ⋈c (σc2(S))
7. Commutativity of PROJECTION and THETA JOIN
If the projection list has the form L1, L2, where L1 involves only
attributes of R and L2 involves only attributes of S, and the join
predicate c involves only attributes in the projection list,
then the PROJECTION and JOIN operations commute.
πL1,L2(R ⋈c S) = (πL1(R)) ⋈c (πL2(S))
8. Commutativity of the set operations: UNION and
INTERSECTION are commutative, but SET DIFFERENCE is not.
R ∩ S = S ∩ R and R ∪ S = S ∪ R
9. Associativity of THETA JOIN, CARTESIAN PRODUCT,
UNION, and INTERSECTION
(R θ S) θ T = R θ (S θ T), where θ is one of these operations.
10. Commuting SELECTION with set operations
σc(R θ S) = σc(R) θ σc(S), where θ is one of UNION,
INTERSECTION, or SET DIFFERENCE.
11. Commuting PROJECTION with UNION
πL1(S ∪ R) = πL1(S) ∪ πL1(R)
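Two of the rules above, the cascade of SELECTION (rule 1) and pushing a SELECTION below a THETA JOIN (rule 6a), can be checked on sample data. The sketch below is an illustration under assumptions: the helpers, the attribute names, and the data are hypothetical, and agreement on one data set illustrates the rules rather than proving them.

```python
def select(relation, condition):
    """sigma<condition>(relation)."""
    return [t for t in relation if condition(t)]


def theta_join(r, s, condition):
    """R join<condition> S, evaluated with nested loops."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]


def as_set(relation):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in relation}


# Hypothetical sample data.
employee = [
    {"SSN": 1, "Lname": "Kebede", "Dno": 4, "Salary": 6000},
    {"SSN": 2, "Lname": "Tesfaye", "Dno": 5, "Salary": 4000},
    {"SSN": 3, "Lname": "Haile", "Dno": 5, "Salary": 7000},
]
department = [
    {"Dnumber": 4, "Dname": "Admin"},
    {"Dnumber": 5, "Dname": "Research"},
]


def c1(t):
    return t["Salary"] > 5000   # involves only Employee attributes


def c2(t):
    return t["Dno"] == 5


def join_condition(e, d):
    return e["Dno"] == d["Dnumber"]


# Rule 1: one conjunctive selection versus a cascade of selections.
combined = select(employee, lambda t: c1(t) and c2(t))
cascaded = select(select(employee, c2), c1)

# Rule 6a: select after the join versus select before the join.
select_late = select(theta_join(employee, department, join_condition), c1)
select_early = theta_join(select(employee, c1), department, join_condition)

# Size of the relation fed into the join in each plan.
join_input_late = len(employee)
join_input_early = len(select(employee, c1))
```

The two plans for rule 6a return the same tuples, but the early selection feeds fewer tuples into the join, which is why heuristic optimizers prefer to apply selections as early as possible.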
2.6.5 Cost Components for Query Optimization
- The cost of query execution can be calculated for the
following major components of query processing.
1. Access Cost of Secondary Storage
Data is accessed from secondary storage, because a
query needs some part of the data stored in the
database. The disk access cost can be analyzed in terms
of searching for, reading, and writing the data blocks that
store some portion of a relation.
- The disk access cost varies depending on the file
organization used and the access method implemented for
that file organization.
- In addition to the file organization, the data allocation
scheme (whether the data is stored contiguously or in a
scattered manner) affects the disk access cost.
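To make the dependence on file organization concrete, the sketch below compares two common textbook estimates of block accesses for finding one record by a unique key. The formulas are stated assumptions, not part of these notes: a linear search of an unordered file reads about b/2 blocks on average, and a binary search of a file ordered on the key reads about ceil(log2 b) blocks, where b is the number of blocks. The function names and file sizes are hypothetical.

```python
import math


def blocks_in_file(num_records: int, records_per_block: int) -> int:
    """Number of disk blocks b needed to store the file."""
    return math.ceil(num_records / records_per_block)


def linear_search_cost(b: int) -> float:
    """Average block accesses to find one record in an unordered file (assumed b/2)."""
    return b / 2


def binary_search_cost(b: int) -> int:
    """Block accesses to find one record in a file ordered on the search key."""
    return math.ceil(math.log2(b))


# Hypothetical file: 10,000 records with 5 records per block.
b = blocks_in_file(10_000, 5)
unordered_cost = linear_search_cost(b)
ordered_cost = binary_search_cost(b)
```

Under these assumptions the same lookup costs about 1000 block accesses on the unordered file and 11 on the ordered file.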
2. Storage Cost
- Because a query is composed of many database operations,
processing it may produce one or more intermediate results
before the final output is reached.
- These intermediate results have to be stored in primary
memory for further processing.
- The bigger the intermediate relation, the larger the memory
requirement, which puts pressure on the limited
available space.
- This is considered the cost of storage.
3. Computation Cost
- A query is composed of many operations.
- The operations can be database operations such as reading
from and writing to a disk, or mathematical and other
operations such as searching, sorting, merging, and
computation on field values.
4. Communication Cost
- In most database systems the database resides at one
station, and queries originate from different
terminals.
- This affects the performance of the system
and adds to the cost of query processing.
- Thus, the cost of transporting data between the database
site and the terminal where the query originates should
be analyzed.
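One simple way to combine the four cost components is a weighted sum, which an optimizer could use to compare alternative plans. The sketch below is hypothetical throughout: the `PlanCost` fields, the weights, and the plan figures are invented for illustration, and real optimizers use more detailed cost models.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    disk_blocks: int        # access cost: blocks read or written on secondary storage
    memory_blocks: int      # storage cost: blocks of intermediate results held in memory
    cpu_operations: int     # computation cost: comparisons, sorting, merging
    bytes_transferred: int  # communication cost: data sent to the querying terminal


# Hypothetical weights expressing the relative expense of one unit of each component.
WEIGHTS = {
    "disk_blocks": 1.0,
    "memory_blocks": 0.1,
    "cpu_operations": 0.001,
    "bytes_transferred": 0.0001,
}


def total_cost(plan: PlanCost) -> float:
    """Weighted sum of the four cost components."""
    return (
        WEIGHTS["disk_blocks"] * plan.disk_blocks
        + WEIGHTS["memory_blocks"] * plan.memory_blocks
        + WEIGHTS["cpu_operations"] * plan.cpu_operations
        + WEIGHTS["bytes_transferred"] * plan.bytes_transferred
    )


def cheapest(plans: dict[str, PlanCost]) -> str:
    """Name of the plan with the lowest estimated total cost."""
    return min(plans, key=lambda name: total_cost(plans[name]))


# Two hypothetical plans for the same query.
plans = {
    "full_scan": PlanCost(2000, 500, 100_000, 40_000),
    "index_lookup": PlanCost(11, 5, 2_000, 40_000),
}
best = cheapest(plans)
```

Both plans transfer the same amount of data to the terminal, so the choice here is driven by the disk access and storage components.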
2.6.6 Pipelining
- Pipelining is another method used for query optimization.
- It is sometimes referred to as on-the-fly processing of
queries.
- Whereas query optimization in general tries to reduce the
size of the intermediate results, pipelining goes further: it
applies several operations in succession to each tuple as it is
produced, instead of storing every intermediate result.
- Thus the technique reduces the number of
intermediate relations in query execution.
- Pipelining performs multiple operations on a single relation
in a pipeline.
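Python generators give a compact way to sketch pipelining: each operator pulls one tuple at a time from the operator below it, so no intermediate relation is stored. The operator names and sample data are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator


def scan(relation: Iterable[dict]) -> Iterator[dict]:
    """Read the base relation one tuple at a time."""
    for row in relation:
        yield row


def select_stream(rows: Iterable[dict],
                  condition: Callable[[dict], bool]) -> Iterator[dict]:
    """Pass on each tuple that satisfies the condition as soon as it arrives."""
    for row in rows:
        if condition(row):
            yield row


def project_stream(rows: Iterable[dict], attributes: list[str]) -> Iterator[tuple]:
    """Emit only the requested attributes of each incoming tuple."""
    for row in rows:
        yield tuple(row[a] for a in attributes)


# Hypothetical sample data for the Employee relation.
employee = [
    {"Lname": "Kebede", "Dno": 4, "Salary": 6000},
    {"Lname": "Tesfaye", "Dno": 5, "Salary": 4000},
    {"Lname": "Haile", "Dno": 5, "Salary": 7000},
]

# pi Lname,Salary ( sigma Dno=5 ( Employee ) ) as a pipeline.
pipeline = project_stream(
    select_stream(scan(employee), lambda r: r["Dno"] == 5),
    ["Lname", "Salary"],
)

# Tuples flow through all three operators only when the result is consumed.
result = list(pipeline)
```

Note that this streaming projection does not remove duplicates. Doing so would require remembering the tuples already emitted, which is one reason some operations cannot be fully pipelined.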
TOPICS AND THE CONCEPTS:
SQL queries to Relational Algebra
Query Processing
Query Optimization
Query Tree
Using Heuristics for Query Optimization


PRESENTED BY:
Mohammed Nebil

HISTORY OF THE PROGRAMMING:


Boyce Codd

SPECIAL THANKS:
Digital Library of Educations
Federal Democratic Republic of Ethiopia, Ministry of Educations
Ethiopian Education Short Note
