ADB Chapter 1
Query Processing
Query processing activities include parsing queries and
translating them into expressions that can be implemented
at the physical level of the file system,
optimizing the internal form of the query to obtain a suitable
execution strategy, and then
executing the query to get the results.
Query processing: a three-step process that transforms a
high-level query (expressed in relational calculus or SQL) into an
equivalent and more efficient lower-level query.
Basic Steps in Processing an SQL Query
1. Parsing and translation:
The parser checks syntax and validates relations, attributes, and
access permissions.
The query is then translated into an equivalent relational algebra
expression.
2. Optimization:
Find the cheapest execution plan for the query.
Generate an optimal evaluation plan (the one with the lowest
estimated cost) for the relational algebra expression.
3. Evaluation:
The query-execution engine takes an (optimal) evaluation
plan, executes that plan, and returns the answers to the
query.
The objective of query optimization is to minimize the
following cost function:
I/O cost + CPU cost + communication cost.
A query expressed in a high-level query language such as SQL
must first be scanned, parsed, and validated.
The scanner identifies the language tokens, such as SQL
keywords, attribute names, and relation names, in the text of the
query.
The parser checks the query syntax to determine whether
it is formulated according to the syntax rules of the query
language.
The query must also be validated by checking that all attribute
and relation names are valid and semantically meaningful names
in the schema of the particular database being queried.
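The scan, parse, and validate steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how a real DBMS front end is built: the SCHEMA catalog, the token kinds, and the tiny grammar (a single-attribute SELECT with one numeric comparison) are all hypothetical choices made for the example.

```python
import re

# Hypothetical catalog (an assumption): relation name -> attribute names.
SCHEMA = {"EMPLOYEE": {"Ssn", "Lname", "Salary", "Bdate"}}
KEYWORDS = {"SELECT", "FROM", "WHERE"}
COMPARISONS = {"=", "<", ">", "<=", ">=", "<>"}
TOKEN_RE = re.compile(r"\s*(>=|<=|<>|[=<>,;*]|\d+|[A-Za-z_]\w*)")


def scan(text):
    """Scanner: split the query text into (kind, value) tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        value = match.group(1)
        if value.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif value[0].isalpha() or value[0] == "_":
            kind = "NAME"
        elif value.isdigit():
            kind = "NUMBER"
        else:
            kind = "SYMBOL"
        tokens.append((kind, value))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: accept only SELECT a FROM r WHERE a <op> <number> [;]."""
    if tokens and tokens[-1] == ("SYMBOL", ";"):
        tokens = tokens[:-1]
    kinds = [kind for kind, _ in tokens]
    values = [value for _, value in tokens]
    expected = ["KEYWORD", "NAME", "KEYWORD", "NAME",
                "KEYWORD", "NAME", "SYMBOL", "NUMBER"]
    keywords = [v.upper() for v in values[0:5:2]]
    if (kinds != expected or keywords != ["SELECT", "FROM", "WHERE"]
            or values[6] not in COMPARISONS):
        raise SyntaxError("query does not match the supported grammar")
    return {"project": values[1], "relation": values[3],
            "condition": (values[5], values[6], int(values[7]))}


def validate(tree, schema):
    """Validator: every relation and attribute name must exist in the schema."""
    relation = tree["relation"]
    if relation not in schema:
        raise NameError(f"unknown relation {relation}")
    for attribute in (tree["project"], tree["condition"][0]):
        if attribute not in schema[relation]:
            raise NameError(f"unknown attribute {attribute} in {relation}")
    return True
```

For instance, `validate(parse(scan("SELECT Salary FROM EMPLOYEE WHERE Salary >= 5000;")), SCHEMA)` passes all three checks, while a query that names an attribute missing from SCHEMA is syntactically valid but fails validation.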
Translating SQL Queries into Relational Algebra
We need to know about relational algebra to understand query
execution and optimization in a relational DBMS.
Relational algebra: an algebra whose objects are
relations and whose operators transform relations into
other relations.
Basic operators: select (σ), project (π), union (∪), set difference (−),
Cartesian product or cross product (×).
For example, consider the query:
SELECT Salary
FROM EMPLOYEE
WHERE Salary >= 5000;
The possible relational algebra expressions for this query are:
π_Salary(σ_Salary>=5000(EMPLOYEE)) or
σ_Salary>=5000(π_Salary(EMPLOYEE))
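To see that the two expressions are equivalent, the following minimal sketch evaluates both orders on a small relation. The sample tuples and the helper names `select` and `project` are assumptions for illustration. The projection uses set semantics (duplicates removed), as in relational algebra; plain SQL SELECT without DISTINCT would keep duplicates.

```python
# Hypothetical sample data; only the Salary attribute comes from the query.
EMPLOYEE = [
    {"Name": "Abebe", "Salary": 4000},
    {"Name": "Sara", "Salary": 5000},
    {"Name": "Kebede", "Salary": 7500},
]


def select(relation, predicate):
    """Sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Pi: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def at_least_5000(row):
    return row["Salary"] >= 5000


# Project after select, and select after project.
plan_a = project(select(EMPLOYEE, at_least_5000), ["Salary"])
plan_b = select(project(EMPLOYEE, ["Salary"]), at_least_5000)
print(plan_a == plan_b)
```

The second order is valid only because the selection condition refers solely to Salary, which survives the projection. If the condition used an attribute that the projection removes, the selection would have to come first.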
Translating an SQL query into relational algebra
Example schema:
Instructor(ID, Fname, gender, salary, ddno)
Department(Dno, dname, address)
Course(course_id, title, deptname, credits)
Examples of translating SQL queries into relational algebra
Query Optimization
Query optimization is the process of choosing a suitable execution
strategy for processing a query.
It transforms the internal form of the query into a
suitable execution strategy, which is then
executed to get the results.
It is used to find an efficient physical query plan for an SQL
query.
The goal is to minimize the evaluation time of the query,
i.e., to compute the query result as fast as possible.
Steps in query optimization
3. Query plan code generation:
Code generation is the final step in query optimization.
The generated code is the executable form of the query.
Once the query code is generated, the execution manager runs it and
produces the results.
A query tree is used to represent a relational algebra or
extended relational algebra expression, whereas
a query graph is used to represent a relational calculus
expression.
Techniques for Query Optimization
The main techniques for query optimization are:
1. Heuristic rules for ordering the operations in a
query execution strategy.
2. Systematic cost estimation:
– Estimates the cost of different execution strategies and chooses the
execution plan with the lowest estimated cost.
3. Semantic query optimization.
Heuristic Approach
Heuristic rules are used as an optimization technique to modify
the internal representation of a query.
The rules are applied to the query, represented as a query tree or query graph
data structure, to improve its expected performance.
• One of the main heuristic rules is to apply SELECT operations before
applying JOIN or other binary operations.
This is because the size of the file resulting from a binary operation
such as JOIN is usually a multiplicative function of the sizes of the
input files, so reducing the inputs first keeps the result small.
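The effect of this rule on intermediate result sizes can be illustrated with a small sketch. The relations, their sizes, and the attribute values below are made up for the example; only the idea (select before the binary operation) comes from the text.

```python
def cross_product(r, s):
    """Cartesian product: |r| * |s| combined tuples."""
    return [{**a, **b} for a in r for b in s]


# Hypothetical data: 100 employees spread evenly over 5 departments.
EMPLOYEE = [{"Ssn": i, "Dno": i % 5} for i in range(100)]
DEPARTMENT = [{"Dnumber": d, "Dname": f"D{d}"} for d in range(5)]

# Plan A: binary operation first, selection afterwards.
product_a = cross_product(EMPLOYEE, DEPARTMENT)
plan_a = [t for t in product_a
          if t["Dno"] == t["Dnumber"] and t["Dname"] == "D3"]

# Plan B: apply the selection on DEPARTMENT before the binary operation.
selected = [d for d in DEPARTMENT if d["Dname"] == "D3"]
product_b = cross_product(EMPLOYEE, selected)
plan_b = [t for t in product_b if t["Dno"] == t["Dnumber"]]

print(len(product_a), len(product_b), plan_a == plan_b)
```

Both plans return the same tuples, but the intermediate product shrinks from |EMPLOYEE| × |DEPARTMENT| tuples to |EMPLOYEE| × 1, which is the multiplicative effect the rule exploits.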
General Guideline
• Break up a conjunctive selection condition into a
cascade of individual σ operations;
this allows the selections to be moved down different
branches of the tree.
• Rearrange the base relations so that the most restrictive selection
is executed first.
• Combine a Cartesian product (×) with a subsequent selection and replace the pair with a
JOIN.
• Move project operations down the query tree.
• Execute first the select and join operations that are most restrictive,
i.e., that result in the fewest tuples.
Company database schema (slide 18)
Q2. Find the last names of employees born after 1957-12-31 who
work on a project named ‘Aquarius’.
For this question:
Write the SQL query
Write the relational algebra representation
Draw the canonical query tree
Optimize the tree using the heuristic rules (show all the necessary steps)
SQL Query
SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn
AND Bdate > '1957-12-31';
Initial (canonical) query tree for SQL query Q2
Moving select operations down the query tree
Applying the more restrictive select operation first
Replacing Cartesian product and select with join operations
Moving project operations down the query tree
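As a rough check of what these transformations achieve for Q2, the sketch below evaluates the canonical tree (Cartesian products, then one big selection, then projection) and a heuristically optimized tree (selections and projections pushed down, products replaced by joins) on a few made-up tuples. The sample data and helper functions are assumptions; the attribute names are taken from the SQL query above. The optimized tree shown is one plausible outcome of the steps, not necessarily the exact tree drawn on the slides.

```python
# Hypothetical sample tuples for the three relations used by Q2.
EMPLOYEE = [
    {"Ssn": 1, "Lname": "Smith", "Bdate": "1965-01-09"},
    {"Ssn": 2, "Lname": "Wong", "Bdate": "1955-12-08"},
    {"Ssn": 3, "Lname": "Zelaya", "Bdate": "1968-01-19"},
]
WORKS_ON = [
    {"Essn": 1, "Pno": 10}, {"Essn": 2, "Pno": 10},
    {"Essn": 3, "Pno": 20}, {"Essn": 3, "Pno": 10},
]
PROJECT = [
    {"Pnumber": 10, "Pname": "Aquarius"},
    {"Pnumber": 20, "Pname": "Gemini"},
]


def product(r, s):
    return [{**a, **b} for a in r for b in s]


def join(r, s, left, right):
    return [{**a, **b} for a in r for b in s if a[left] == b[right]]


def project(r, attrs):
    seen, out = set(), []
    for row in r:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def canonical_plan():
    """Products first, then the whole WHERE clause, then the projection."""
    big = product(product(EMPLOYEE, WORKS_ON), PROJECT)
    kept = [t for t in big
            if t["Pname"] == "Aquarius" and t["Pnumber"] == t["Pno"]
            and t["Essn"] == t["Ssn"] and t["Bdate"] > "1957-12-31"]
    # Second value: size of the largest binary-operation output.
    return project(kept, ["Lname"]), len(big)


def optimized_plan():
    """Selections and projections pushed down, products replaced by joins."""
    p = project([t for t in PROJECT if t["Pname"] == "Aquarius"], ["Pnumber"])
    first_join = join(p, project(WORKS_ON, ["Essn", "Pno"]), "Pnumber", "Pno")
    w = project(first_join, ["Essn"])
    e = project([t for t in EMPLOYEE if t["Bdate"] > "1957-12-31"],
                ["Ssn", "Lname"])
    second_join = join(w, e, "Essn", "Ssn")
    largest = max(len(first_join), len(second_join))
    return project(second_join, ["Lname"]), largest
```

Calling both functions returns the same last names, while the second element of each result (the largest output of a product or join) shows how much smaller the intermediate results of the optimized tree are, even on this tiny data set.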
Exercise 1
For the following question, use the Company database schema given on slide 18:
Write the SQL query
Write the possible relational algebra representations
Draw the canonical query tree
Optimize the tree using the heuristic rules (show all the necessary steps)
Q1. For every project located in ‘Stafford’, retrieve the project
number, the controlling department number, and the department
manager’s last name, address, and birth date.
Exercise 2
For the following question, use the Company database schema given on slide 18:
Write the SQL query
Write the possible relational algebra representations
Draw the canonical query tree
Optimize the tree using the heuristic rules (show all the necessary steps)
Q2. Retrieve the first name, birth date, and address of each employee
in the Research department.
Systematic Estimation (Cost Estimation)
This approach uses traditional optimization techniques that search the
solution space of a problem for a solution that minimizes an
objective (cost) function.
The cost functions used in query optimization are estimates,
not exact cost functions, so the chosen strategy may not be the true optimum.
Cost estimation for relational algebra expressions:
Estimate the cost of each equivalent relational algebra expression
Choose the expression with the lowest estimated cost
Cost estimation components:
Access cost to secondary storage: the cost of transferring
data blocks between secondary disk storage and main memory
buffers.
Storage cost: the cost of storing intermediate results.
Computation cost: the cost of performing in-memory
operations on the records within the data buffers during query
execution.
Memory usage cost: the cost related to the number of main memory buffers needed
during query execution.
Communication cost: the cost of shipping the query and its
results from the database site to the site or terminal where the query originated.
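A cost-based choice between plans can be sketched as a weighted sum over these components. Everything numeric here is an assumption made for illustration: the weights, the two candidate plans, and their statistics are hypothetical, and real optimizers derive such estimates from catalog statistics rather than fixed constants.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    name: str
    block_transfers: int      # access cost to secondary storage
    intermediate_blocks: int  # storage cost of intermediate results
    cpu_operations: int       # computation cost
    buffers: int              # memory usage
    bytes_shipped: int        # communication cost


# Hypothetical relative weights for each component (an assumption).
WEIGHTS = {"io": 10, "storage": 10, "cpu": 1, "memory": 5, "net": 1}


def estimated_cost(plan, weights=WEIGHTS):
    """Weighted sum of the five cost components."""
    return (weights["io"] * plan.block_transfers
            + weights["storage"] * plan.intermediate_blocks
            + weights["cpu"] * plan.cpu_operations
            + weights["memory"] * plan.buffers
            + weights["net"] * plan.bytes_shipped)


def cheapest(plans):
    """Choose the plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Two made-up candidate plans for the same query.
full_scan = PlanEstimate("full scan", 2000, 0, 10000, 3, 0)
index_lookup = PlanEstimate("index lookup", 30, 5, 500, 4, 0)

best = cheapest([full_scan, index_lookup])
print(best.name, estimated_cost(best))
```

In a centralized database the communication term is typically zero, as in both sample plans; in a distributed setting it can dominate the other terms.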
Semantic Query Optimization
• Semantic information stored in a database as integrity
constraints can be used for query optimization.
Integrity constraints preserve data consistency when changes are made
to the database.
This technique, which may be used in combination with
the techniques discussed previously, uses constraints
specified on the database schema,
such as unique attributes and other more complex
constraints.
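One simple way a constraint can help is by showing that a query cannot return any tuples, so the relation need not be scanned at all. The sketch below assumes a hypothetical range constraint on Salary (for example, one declared as a CHECK constraint) and assumes the stored data satisfies it. The constraint, the data, and the function names are illustrative only.

```python
import operator

# Hypothetical integrity constraints: attribute -> (min, max), inclusive.
CONSTRAINTS = {"Salary": (0, 100_000)}
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}


def contradicts_constraints(attribute, op, value, constraints=CONSTRAINTS):
    """True if `attribute op value` can never hold under the constraints."""
    if attribute not in constraints:
        return False
    low, high = constraints[attribute]
    if op == ">":
        return value >= high
    if op == ">=":
        return value > high
    if op == "<":
        return value <= low
    if op == "<=":
        return value < low
    if op == "=":
        return value < low or value > high
    return False


def run_select(relation, attribute, op, value):
    """Return (matching rows, number of rows scanned)."""
    if contradicts_constraints(attribute, op, value):
        return [], 0  # answered from the constraint alone, no scan needed
    compare = OPS[op]
    rows = [r for r in relation if compare(r[attribute], value)]
    return rows, len(relation)


# Made-up data that satisfies the assumed constraint.
EMPLOYEE = [
    {"Lname": "Smith", "Salary": 30_000},
    {"Lname": "Wong", "Salary": 40_000},
    {"Lname": "Borg", "Salary": 55_000},
]
```

With these assumptions, `run_select(EMPLOYEE, "Salary", ">", 150_000)` is answered without touching the data, while a satisfiable condition such as `Salary >= 40000` still requires a scan.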
Advantages of Query Optimization
Faster query processing
Lower cost per query
Higher system performance
Less stress on the database
Efficient use of the database engine
Lower memory consumption
Reading Assignment
What is System R, and what is the System R approach?