Chapter Two
Query Processing and
Optimization
Department of Computer Science 1
Outline
What does query processing mean?
Steps in query processing
Translating SQL queries into relational algebra
Basic algorithms for executing query operations
Evaluation of expressions
Using heuristic in query optimization
Using selectivity and cost estimates in query
optimization
What Does Query Processing Mean?
Query processing refers to the range of activities
involved in extracting data from a database.
It is the entire process or activity which involves
query translation into low level instructions, query
optimization to save resources, cost estimation or
evaluation of query, and extraction of data from the
database.
The main goal of query processing is to find an
efficient query execution plan for a given SQL
query which would minimize the cost considerably,
especially time.
Query Processing
Query Processing is a procedure of converting a query
written in high-level language (Ex. SQL) into a
correct and efficient execution plan expressed in low-
level language, which is used for data manipulation.
A query expressed in a high-level query language such
as SQL must first be
Scanned
Parsed and
Validated
Scanner identifies the query tokens such as
SQL keywords
Attribute names and
Relation names that appear in the text of the query
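The scanning step can be sketched in a few lines of Python. This is only an illustration for a tiny SQL subset: the keyword list, the token kinds, and the function name scan are assumptions made here, and the sketch does not distinguish attribute names from relation names (both come out as identifiers).

```python
import re

# Hypothetical keyword list for a tiny SQL subset; a real scanner covers far more.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
# One token is a number, a word, or a comparison/punctuation symbol.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|(\w+)|(<=|>=|<>|[=<>(),;*]))")

def scan(query):
    """Split a query string into (kind, text) tokens."""
    tokens = []
    query = query.strip()
    pos = 0
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens

tokens = scan("SELECT Lname FROM EMPLOYEE WHERE Dno = 5")
```

The parser would then take this token list and check it against the grammar of the query language.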
Cont’d…
Parser checks the query syntax to determine whether it is
formulated according to the syntax rules (rules of grammar) of
the query language
Query Processing
The query is then validated by checking that all
attribute and relation names are valid and
semantically meaningful names in the schema of
the particular database being queried
An internal representation of the query is then
created, usually as a tree data structure called a
query tree
It is also possible to represent the query using a
graph data structure called a query graph
o Directed acyclic graph (DAG)
The DBMS must then devise an execution
strategy or query plan for retrieving the
results of the query from the database files.
Query Processing
The query processing involves three basic steps.
Parsing and translation
Optimization
Evaluation
Parsing and translation: Parser checks the syntax and
verifies the user’s privilege to execute the query, the
relations and the attributes which are used in the query.
If written SQL is valid, the translator converts given
SQL query into respective relational algebra
Optimization: uses statistical data stored in the data
dictionary, such as the size of each table, the length of
its records, and the indexes created on it.
Different query execution plans for a given query can
have different costs.
It is the responsibility of the query optimizer to generate
the least costly plan and pass it to the evaluation engine.
Evaluation: takes a query-execution plan, executes
that plan, and returns the answers to the query.
Query Processing
Query optimizer module has the task of producing
a good execution plan
Code generator generates the code to execute that
plan
Runtime database processor has the task of
running (executing) the query code, whether in
compiled or interpreted mode, to produce the query
result
If a runtime error results, an error message is
generated by the runtime database processor
Query Processing
High-level query language such as SQL for
relational DBMSs (RDBMSs) or OQL for object
DBMSs (ODBMSs) is more declarative in nature
because
o It specifies what the intended results of the
query are, rather than identifying the details of
how the result should be obtained.
Query optimization is thus necessary for queries
that are specified in a high-level query language
Relational Algebra Operators
Selection (σ): a unary operator that selects rows from a relation
Syntax: σ<selection_condition>(Relation)
Example: σage>21(Student)
Example: Write an RA expression to find all instructors working in
the Finance department.
Solution: σdept_name='Finance'(instructor), assuming the department
attribute is named dept_name
Projection (π): removes unwanted columns of the given relation
from the resulting relation.
Syntax: π<attribute_list>(Relation)
Example: Write an RA expression to list instructor names
Solution: πname(instructor)
Cross-product (R1 × R2)
concatenates every tuple of relation R1 with every tuple of
relation R2.
Cont’d…
Set-difference (R1 – R2): returns tuples in relation
R1, but not in relation R2. It requires two input
relations which are union compatible
Union (R1 ∪ R2): returns every tuple in relation
R1 and every tuple in relation R2
Intersection (R1 ∩ R2): returns tuples that both relations
R1 and R2 have in common
Cont’d…
Join (⋈): a binary operator. It allows us to combine two
relations.
Condition join: Syntax: R1 ⋈C R2. Sometimes called a
theta-join
Equal-Join: a special case of condition join where the
condition C contains only equalities.
Syntax: R1 ⋈equality condition R2. Result schema similar to cross-
product, but only one copy of fields for which equality is
specified.
Natural Join: Join on all common fields.
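The operators above can be illustrated with a minimal Python sketch that represents a relation as a list of dictionaries. The function names and the sample student/enrolled data are hypothetical and exist only for this illustration; they are not part of any DBMS API.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the listed columns and drop duplicate rows."""
    keys = dict.fromkeys(tuple(row[a] for a in attributes) for row in relation)
    return [dict(zip(attributes, key)) for key in keys]

def cross_product(r1, r2):
    """Cross-product: pair every tuple of r1 with every tuple of r2.
    Assumes the two relations share no attribute names."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2]

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = []
    for t1 in r1:
        for t2 in r2:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result

# Hypothetical sample relations.
student = [{"sid": 1, "name": "Ann", "age": 22},
           {"sid": 2, "name": "Bob", "age": 20}]
enrolled = [{"sid": 1, "course": "DB"}, {"sid": 2, "course": "OS"}]

adults = select(student, lambda row: row["age"] > 21)   # like σage>21(Student)
names = project(student, ["name"])                      # like πname(Student)
joined = natural_join(student, enrolled)                # joins on the common field sid
```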
Translating SQL Queries into
Relational Algebra
SQL is the query language that is used in most
commercial RDBMSs
SQL Query is first translated into an equivalent
extended relational algebra expression-
represented as a query tree data structure that is
then optimized
SQL queries are decomposed into query blocks
The basic unit that can be translated into the algebraic
operators and optimized.
A query block contains a single SELECT-FROM-
WHERE expression, as well as GROUP BY and
HAVING clause if these are part of the block.
Nested queries within a query are identified as
separate query blocks.
Aggregate operators in SQL (MAX, MIN, SUM, COUNT)
must be included in the extended algebra.
Translating SQL Queries into
Relational Algebra
Consider the following SQL query
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );
This query retrieves the names of employees
(from any department in the company) who earn
a salary that is greater than the highest salary in
department 5
The query includes a nested subquery and hence
would be decomposed into two blocks
Translating SQL Queries into
Relational Algebra
The inner block is:
( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 );
This query retrieves the highest salary in
department 5.
The outer query block is:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c
where c represents the result returned from the
inner block
Translating SQL Queries into
Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5);
Each block is translated into an extended relational algebra expression:
Outer query block:
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
πLNAME, FNAME (σSALARY>C (EMPLOYEE))
Inner query block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
ℱMAX SALARY (σDNO=5 (EMPLOYEE))
Cont’d…
The query optimizer would then choose an
execution plan for each query block
NB: In the above example, the inner block needs to
be evaluated only once to produce the
maximum salary of employees in department 5,
which is then used as the constant c by the
outer block
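The two-block evaluation can be sketched in Python as follows. The EMPLOYEE rows are hypothetical sample data, and the function names inner_block and outer_block are chosen here only to mirror the two query blocks.

```python
# Hypothetical sample rows; attribute names follow the EMPLOYEE query above.
EMPLOYEE = [
    {"Lname": "Smith", "Fname": "John", "Salary": 30000, "Dno": 5},
    {"Lname": "Wong", "Fname": "Franklin", "Salary": 40000, "Dno": 5},
    {"Lname": "Borg", "Fname": "James", "Salary": 55000, "Dno": 1},
    {"Lname": "Zelaya", "Fname": "Alicia", "Salary": 25000, "Dno": 4},
]

def inner_block(employee):
    """Inner block: the maximum salary in department 5."""
    return max(row["Salary"] for row in employee if row["Dno"] == 5)

def outer_block(employee, c):
    """Outer block: names of employees whose salary exceeds the constant c."""
    return [(row["Lname"], row["Fname"]) for row in employee if row["Salary"] > c]

c = inner_block(EMPLOYEE)          # evaluated only once
result = outer_block(EMPLOYEE, c)  # reuses c for every outer tuple
```

Because the inner block does not refer to any attribute of the outer block, computing c once and reusing it avoids re-running the subquery for each EMPLOYEE tuple.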
Transformation of relational expressions
Two relational algebra expressions are said to be equivalent
if the two expressions generate the same set of tuples.
Example:
Customer                  Account
CID   ANO   Name          ANO   Balance
C01   A01   Raj           A01   3000
C02   A02   Meet          A02   1000
C03   A03   Jay           A03   2000
C04   A04   Ram           A04   4000
ΠName ( σBalance<2500 (Account) ⋈ Customer )
ΠName ( σBalance<2500 (Account ⋈ Customer) )
Both expressions produce the same result:
Name
Meet
Jay
Cont’d…
A combined selection operation can be
divided into a sequence of individual selections.
This transformation is called cascade of σ.
Example:
Customer
CID   ANO   Name   Balance
C01   1     Raj    3000
C02   2     Meet   1000
C03   3     Jay    2000
C04   4     Ram    4000
σANO<3 ∧ Balance<2000 (Customer) and
σANO<3 (σBalance<2000 (Customer)) both produce:
Output
CID   ANO   Name   Balance
C02   2     Meet   1000
σθ1∧θ2 (E) = σθ1 (σθ2 (E))
Cont’d…
Selection operations are commutative.
Example:
Customer
CID   ANO   Name   Balance
C01   1     Raj    3000
C02   2     Meet   1000
C03   3     Jay    2000
C04   4     Ram    4000
σANO<3 (σBalance<2000 (Customer)) and
σBalance<2000 (σANO<3 (Customer)) both produce:
Output
CID   ANO   Name   Balance
C02   2     Meet   1000
σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
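Both selection rules can be checked on the Customer table from the examples with a short Python sketch. The helper select and the names theta1 and theta2 are introduced here for illustration only.

```python
CUSTOMER = [
    {"CID": "C01", "ANO": 1, "Name": "Raj", "Balance": 3000},
    {"CID": "C02", "ANO": 2, "Name": "Meet", "Balance": 1000},
    {"CID": "C03", "ANO": 3, "Name": "Jay", "Balance": 2000},
    {"CID": "C04", "ANO": 4, "Name": "Ram", "Balance": 4000},
]

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def theta1(row):
    return row["ANO"] < 3

def theta2(row):
    return row["Balance"] < 2000

# One selection with a conjunctive condition.
combined = select(CUSTOMER, lambda row: theta1(row) and theta2(row))
# Cascade of selections.
cascade = select(select(CUSTOMER, theta2), theta1)
# The same cascade with the order of the selections swapped.
swapped = select(select(CUSTOMER, theta1), theta2)
```

All three expressions return the single tuple for C02, which is what the equivalence rules predict.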
Algorithms for External Sorting
Sorting is one of the primary algorithms used in query
processing
For example, whenever an SQL query specifies an
ORDER BY clause, the query result must be sorted
External sorting:
Refers to sorting algorithms that are suitable for large
files of records stored on disk that do not fit entirely in
main memory, such as most database files.
External sorting algorithm uses a sort-merge strategy
Sort-Merge strategy:
Starts by sorting small subfiles called runs of the main
file and then merges the sorted runs, creating larger
sorted subfiles that are merged in turn.
Sorting phase:
In the sorting phase, runs (portions or pieces) of the file
that can fit in the available buffer space are read into
main memory, sorted using an internal sorting algorithm,
and written back to disk as temporary sorted subfiles (or
runs).
External Sort-Merge (Example)
• Blocks=3
Initial relation: 24, 19, 31, 33, 14, 16, 16, 21, 3, 2, 7, 14
Create runs: (19, 24, 31), (14, 16, 33), (3, 16, 21), (2, 7, 14)
Merge pass-1: (14, 16, 19, 24, 31, 33), (2, 3, 7, 14, 16, 21)
Merge pass-2 (sorted output): 2, 3, 7, 14, 14, 16, 16, 19, 21, 24, 31, 33
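The sort-merge strategy can be sketched in Python as below. This is a simplified in-memory simulation under stated assumptions: lists stand in for disk files, buffer_size counts records rather than blocks, and runs are merged two at a time, as in the example. The function names are hypothetical.

```python
import heapq

def create_runs(records, buffer_size):
    """Sorting phase: sort each buffer-sized portion of the file into a run."""
    return [sorted(records[i:i + buffer_size])
            for i in range(0, len(records), buffer_size)]

def merge_pass(runs):
    """One merge pass: merge the runs pairwise into longer sorted runs."""
    return [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]

def external_sort(records, buffer_size):
    """Create the initial runs, then repeat merge passes until one run is left."""
    runs = create_runs(records, buffer_size)
    while len(runs) > 1:
        runs = merge_pass(runs)
    return runs[0] if runs else []

initial = [24, 19, 31, 33, 14, 16, 16, 21, 3, 2, 7, 14]
runs = create_runs(initial, 3)
pass_1 = merge_pass(runs)
output = external_sort(initial, 3)
```

With the twelve values of the example and a buffer of three records, the sketch creates four runs and needs two merge passes.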
Algorithms for SELECT
Operation
There are many algorithms for executing a
SELECT operation, which is basically a search
operation to locate the records in a disk file that
satisfy a certain condition
Examples:
• (OP1): σSsn='123456789' (EMPLOYEE)
• (OP2): σDNUMBER>5(DEPARTMENT)
• (OP3): σDno=5(EMPLOYEE)
• (OP4): σDno=5 AND SALARY>30000 AND SEX='F' (EMPLOYEE)
• (OP5): σESSN='123456789' AND PNO=10 (WORKS_ON)
Implementing the JOIN
Operation
The JOIN operation is one of the most time-
consuming operations in query processing
Two-way join: a join on two files
e.g. R ⋈A=B S
Multi-way joins: joins involving more than two
files.
e.g. R ⋈A=B S ⋈C=D T
In the two-way join above, A and B are the join
attributes,
• which should be domain-compatible attributes of
R and S, respectively.
We illustrate four of the most common techniques
for performing such a join, using the sample
operations below.
Implementing the JOIN
Operation
Examples
Methods for implementing joins:
J1-Nested-loop join (Nested Block Join):
This is the default (brute force) algorithm, as it
does not require any special access paths
on either file in the join
For each record t in R (outer loop), retrieve
every record s from S (inner loop) and test
whether the two records satisfy the join
condition t[A] = s[B].
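A minimal Python sketch of the nested-loop join follows. Relations are lists of dictionaries, the parameters a and b name the join attributes, and the sample rows are hypothetical. The sketch assumes R and S share no attribute names.

```python
def nested_loop_join(R, S, a, b):
    """For each record t in R, scan all of S and keep pairs with t[a] == s[b]."""
    result = []
    for t in R:                      # outer loop
        for s in S:                  # inner loop
            if t[a] == s[b]:         # join condition t[A] = s[B]
                result.append({**t, **s})
    return result

# Hypothetical sample data.
R = [{"Fname": "John", "Dno": 5}, {"Fname": "Alicia", "Dno": 4}]
S = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 1, "Dname": "Headquarters"}]

joined = nested_loop_join(R, S, "Dno", "Dnumber")
```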
Exercise (Nested loop join)
Assuming worst case memory availability and
the following given statistics for the relations
customer and depositor
• Number of records of customer: 10,000
(ncustomer)
• Number of records of depositor: 5,000
(ndepositor)
• Number of blocks of customer: 400 (bcustomer)
• Number of blocks of depositor: 100 (bdepositor)
Estimate the cost
1. with depositor as outer relation
2. with customer as outer relation
Exercise (Nested loop join)
(Worst case)
1. with depositor as outer relation
No. of block accesses = ndepositor * bcustomer + bdepositor
= 5000 * 400 + 100
= 2,000,100
2. with customer as outer relation
No. of block accesses = ncustomer * bdepositor + bcustomer
= 10000 * 100 + 400
= 1,000,400
Exercise (Nested loop join)
Assuming best case memory availability and the
following given statistics for the relations customer
and depositor
• Number of records of customer: 10,000 (ncustomer)
• Number of records of depositor: 5,000 (ndepositor)
• Number of blocks of customer: 400 (bcustomer)
• Number of blocks of depositor: 100 (bdepositor)
Estimate the cost
1. with customer as outer relation
No. of block accesses = bdepositor + bcustomer
= 100 + 400
= 500
In the best case each relation is read only once, so the
cost is the same whichever relation is the outer one.
Cont’d…
J2-Index-based Single-loop join (Using an access
structure to retrieve the matching records):
If an index (or hash key) exists for one of the two join
attributes- say, attribute B of file S-retrieve each record
t in R (loop over file R), one at a time, and then use the
access structure to retrieve directly all matching
records s from S that satisfy s[B] = t[A]
J3-Sort-merge join:
If the records of R and S are physically sorted (ordered)
by value of the join attributes A and B, respectively, we
can implement the join in the most efficient way
possible.
Both files are scanned in order of the join attributes,
matching the records that have the same values for A
and B.
In this method, the records of each file are scanned
only once each for matching with the other file, unless
both A and B are non-key attributes, in which case
the method needs to be modified slightly.
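The sort-merge join can be sketched in Python as below. The sketch sorts both inputs first (the text assumes they are already physically sorted) and handles the non-key case by pairing every R record with every S record that carries the same join value. The function name and sample rows are hypothetical, and R and S are assumed to share no attribute names.

```python
def sort_merge_join(R, S, a, b):
    """Join R and S on R.a = S.b by scanning both in join-attribute order."""
    R = sorted(R, key=lambda t: t[a])
    S = sorted(S, key=lambda s: s[b])
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            value = R[i][a]
            # Find the end of the group of S records with this join value.
            j_end = j
            while j_end < len(S) and S[j_end][b] == value:
                j_end += 1
            # Pair every R record with this value with the whole S group.
            while i < len(R) and R[i][a] == value:
                for s in S[j:j_end]:
                    result.append({**R[i], **s})
                i += 1
            j = j_end
    return result

# Hypothetical sample data with duplicate join values on both sides.
R = [{"A": 2, "x": "r2"}, {"A": 1, "x": "r1"}, {"A": 2, "x": "r2b"}]
S = [{"B": 2, "y": "s2"}, {"B": 3, "y": "s3"}, {"B": 2, "y": "s2b"}]

pairs = [(row["x"], row["y"]) for row in sort_merge_join(R, S, "A", "B")]
```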
Cont’d…
J4-Hash-join:
The records of files R and S are both hashed to
the same hash file, using the same hashing
function on the join attributes A of R and B of S
as hash keys.
A single pass through the file with fewer records
(say, R) hashes its records to the hash file
buckets.
A single pass through the other file (S) then
hashes each of its records to the appropriate
bucket, where the record is combined with all
matching records from R.
Hash join therefore has two phases: a partitioning phase
that hashes the records of R into buckets, and a probing
phase that matches each record of S against its bucket.
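A minimal Python sketch of the hash join is shown below. A Python dictionary stands in for the hash file, which is an assumption of this illustration; the function name and sample rows are hypothetical, and R and S are assumed to share no attribute names.

```python
from collections import defaultdict

def hash_join(R, S, a, b):
    """Join R and S on R.a = S.b using an in-memory hash table built on R."""
    buckets = defaultdict(list)
    for t in R:                              # single pass over the smaller file
        buckets[t[a]].append(t)
    result = []
    for s in S:                              # single pass over the other file
        for t in buckets.get(s[b], []):      # probe only the matching bucket
            result.append({**t, **s})
    return result

# Hypothetical sample data.
R = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Administration"}]
S = [{"Fname": "John", "Dno": 5}, {"Fname": "Alicia", "Dno": 4}, {"Fname": "James", "Dno": 1}]

joined = hash_join(R, S, "Dnumber", "Dno")
```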
Cost of computing for all joins
R is the outer and S is the inner relation of the
join.
• Number of records of R: NR
• Number of records of S: NS
• Number of blocks of R: BR
• Number of blocks of S: BS
Join                     Worst Case       Best Case
Nested-Loop Join         BR + NR ∗ BS     BR + BS
Block Nested-Loop Join   BR + BR ∗ BS     BR + BS
Join                     Cost
Index Nested-Loop Join   BR + NR ∗ c
Merge Join               BR + BS
Hash-Join                3 ∗ (BR + BS)
• c is the cost of a single selection on S using the join condition.
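The cost formulas can be written as small Python functions and applied to the customer and depositor statistics used earlier. The function and parameter names are hypothetical; the formulas are the ones listed in the table.

```python
def nested_loop_cost(b_outer, n_outer, b_inner, best_case=False):
    """Block accesses for nested-loop join: BR + NR * BS, or BR + BS in the best case."""
    if best_case:
        return b_outer + b_inner
    return b_outer + n_outer * b_inner

def block_nested_loop_cost(b_outer, b_inner, best_case=False):
    """Block accesses for block nested-loop join: BR + BR * BS, or BR + BS in the best case."""
    if best_case:
        return b_outer + b_inner
    return b_outer + b_outer * b_inner

def merge_join_cost(b_r, b_s):
    """Block accesses for merge join: BR + BS."""
    return b_r + b_s

def hash_join_cost(b_r, b_s):
    """Block accesses for hash join: 3 * (BR + BS)."""
    return 3 * (b_r + b_s)

# Statistics from the customer / depositor exercise.
N_CUSTOMER, B_CUSTOMER = 10_000, 400
N_DEPOSITOR, B_DEPOSITOR = 5_000, 100

depositor_outer = nested_loop_cost(B_DEPOSITOR, N_DEPOSITOR, B_CUSTOMER)
customer_outer = nested_loop_cost(B_CUSTOMER, N_CUSTOMER, B_DEPOSITOR)
best = nested_loop_cost(B_CUSTOMER, N_CUSTOMER, B_DEPOSITOR, best_case=True)
```

Comparing the numbers for both choices of outer relation is exactly the kind of comparison a cost-based optimizer makes before picking a plan.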
Algorithms for PROJECT
operation
The algorithm for the PROJECT operation π<attribute list>(R) is
straightforward to implement
If <attribute list> has a key of relation R, extract
all tuples from R with only the values for the
attributes in <attribute list>.
If <attribute list> does NOT include a key of
relation R, duplicated tuples must be removed
from the results.
Methods to remove duplicate tuples
1. Sorting: sort the result of the operation and
then eliminate duplicate tuples, which appear
consecutively after sorting
2. Hashing: each record is hashed to a bucket of
the hash file in memory and checked against the
records already in that bucket; if it is a
duplicate, it is not inserted in the bucket.
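The two duplicate-elimination methods can be sketched in Python as follows. Projected tuples are plain Python tuples, a Python set plays the role of the in-memory hash file, and the function names and sample values are hypothetical.

```python
def distinct_by_sorting(tuples):
    """Sort, then drop each tuple that equals its predecessor."""
    result = []
    for row in sorted(tuples):
        if not result or result[-1] != row:
            result.append(row)
    return result

def distinct_by_hashing(tuples):
    """Insert each tuple into a hash set unless an equal tuple is already there."""
    seen = set()
    result = []
    for row in tuples:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Hypothetical projection of (Dno,) from an employee file, with duplicates.
projected = [(5,), (4,), (5,), (1,), (4,)]
by_sorting = distinct_by_sorting(projected)
by_hashing = distinct_by_hashing(projected)
```

Sorting returns the tuples in sorted order, while hashing keeps the order of first appearance; both contain the same set of tuples.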
Algorithms for SET operations
Set operations:
o UNION, INTERSECTION, SET DIFFERENCE
and CARTESIAN PRODUCT
The CARTESIAN PRODUCT of relations R and S
includes all possible combinations of records from
R and S.
o The attributes of the result include all attributes
of R and S.
Cost analysis of CARTESIAN PRODUCT
o If R has n records and j attributes and S has m
records and k attributes, the result relation will
have n*m records and j+k attributes
o CARTESIAN PRODUCT operation is very
expensive and should be avoided if possible
Algorithms for SET operations
UNION
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
whenever the same tuple exists in both
relations, only one is kept in the merged results.
INTERSECTION
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
keep in the merged results only those tuples
that appear in both relations.
SET DIFFERENCE R - S
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
keeping in the merged results only those tuples
that appear in relation R but not in relation S.
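The sort-and-merge approach for the three set operations can be sketched with one Python function. The function name and the operation strings are hypothetical, and each input relation is assumed to contain no duplicate tuples.

```python
def sorted_set_operation(R, S, operation):
    """Compute 'union', 'intersection' or 'difference' (R - S) by merging sorted inputs."""
    R, S = sorted(R), sorted(S)
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i] < S[j]:                      # tuple only in R
            if operation in ("union", "difference"):
                result.append(R[i])
            i += 1
        elif R[i] > S[j]:                    # tuple only in S
            if operation == "union":
                result.append(S[j])
            j += 1
        else:                                # tuple in both; keep one copy
            if operation in ("union", "intersection"):
                result.append(R[i])
            i += 1
            j += 1
    if operation in ("union", "difference"):
        result.extend(R[i:])                 # leftover tuples of R
    if operation == "union":
        result.extend(S[j:])                 # leftover tuples of S
    return result

# Hypothetical union-compatible relations with one attribute.
R = [(3,), (1,), (2,)]
S = [(2,), (4,)]
```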
Implementing Aggregate
Operations
Aggregate Operators:
o MIN, MAX, SUM, COUNT and AVG
Options to implement aggregate operators:
o Table Scan
o Index
Example:
SELECT MAX (SALARY)
FROM EMPLOYEE;
If an (ascending) index on SALARY exists for the
employee relation, then the optimizer could
decide on traversing the index for the largest
value, which would entail following the right most
pointer in each index node from the root to a leaf.
Implementing Aggregate
Operations (Cont’d.)
SUM, COUNT and AVG
For a dense index (each record has one index entry):
o Apply the associated computation to the values in
the index.
For a non-dense index:
o Actual number of records associated with each index
entry must be used for a correct computation
o This can be done if the number of records associated
with each value in the index is stored in each index
entry.
With GROUP BY: the aggregate operator must be
applied separately to each group of tuples.
o Use sorting or hashing on the group attributes to
partition the file into the appropriate groups;
o Compute the aggregate function for the tuples in
each group.
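Grouping by hashing can be sketched in Python as below. A dictionary keyed on the grouping attribute partitions the tuples, and the aggregates are then computed per group. The function name and the sample EMPLOYEE rows are hypothetical.

```python
from collections import defaultdict

def group_by_aggregate(relation, group_attr, value_attr):
    """Partition rows by group_attr using hashing, then aggregate value_attr per group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row[value_attr])
    return {
        key: {
            "MIN": min(values),
            "MAX": max(values),
            "SUM": sum(values),
            "COUNT": len(values),
            "AVG": sum(values) / len(values),
        }
        for key, values in groups.items()
    }

# Hypothetical sample rows.
EMPLOYEE = [
    {"Dno": 5, "Salary": 30000},
    {"Dno": 5, "Salary": 40000},
    {"Dno": 4, "Salary": 25000},
]

stats = group_by_aggregate(EMPLOYEE, "Dno", "Salary")
```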
Implementing Outer Join
Outer Join Operators:
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
The full outer join produces a result which is
equivalent to the union of the results of the left
and right outer joins.
Example:
SELECT FNAME, DNAME
FROM (EMPLOYEE LEFT
OUTER JOIN DEPARTMENT ON
DNO = DNUMBER);
Note: The result of this query is a table of
employee names and their associated
departments. It is similar to a regular join result,
with the exception that if an employee does not
have an associated department, the employee's
name still appears in the result, with a null
department name.
Implementing Outer Join
(Cont’d.)
Modifying Join Algorithms:
Nested Loop or Sort-Merge joins can be
modified to implement outer join. E.g.,
For left outer join, use the left relation as
outer relation and construct result from every
tuple in the left relation
If there is a match, the concatenated tuple is
saved in the result
However, if an outer tuple does not match,
then the tuple is still included in the result
but is padded with a null value(s)
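The modified nested-loop algorithm for a left outer join can be sketched in Python as follows. None stands in for the null value, and the function name, the right_attributes parameter, and the sample rows are hypothetical.

```python
def left_outer_join(left, right, a, b, right_attributes):
    """Nested-loop left outer join on left.a = right.b, padding unmatched tuples with None."""
    result = []
    for t in left:                                   # left relation is the outer relation
        matched = False
        for s in right:
            if t[a] == s[b]:
                result.append({**t, **s})            # concatenated tuple
                matched = True
        if not matched:
            padding = {attr: None for attr in right_attributes}
            result.append({**t, **padding})          # keep the tuple, padded with nulls
    return result

# Hypothetical sample data: the second employee has no department.
EMPLOYEE = [{"FNAME": "John", "DNO": 5}, {"FNAME": "Ahmad", "DNO": None}]
DEPARTMENT = [{"DNAME": "Research", "DNUMBER": 5}]

rows = left_outer_join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER", ["DNAME", "DNUMBER"])
```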
Implementing Outer Join
(Cont’d.)
Theoretically, outer join can also be computed by
executing a combination of relational Algebra
operators.
Implement the previous left outer join example
1. Compute the (inner) JOIN of the EMPLOYEE and
DEPARTMENT tables
• TEMP1 ← πFNAME,DNAME (EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT)
2. Find the EMPLOYEEs that do not appear in the
(inner) JOIN
• TEMP2 ← πFNAME (EMPLOYEE) - πFNAME (TEMP1)
3. Pad each tuple in TEMP2 with a null DNAME field
• TEMP2 ← TEMP2 × 'null'
4. UNION the temporary tables to produce the LEFT
OUTER JOIN
• RESULT ← TEMP1 ∪ TEMP2
Evaluation of expressions
An expression may contain more than one
operation, which makes it harder to evaluate
than a single operation.
ΠCust_Name ( σBalance<2500 (account) ⋈ (customer) )
To evaluate such an expression we need to
evaluate the operations one by one in an
appropriate order.
Two methods for evaluating an expression
carrying multiple operations are:
Materialization
Pipelining
Cont’d….
ΠCust_Name ( σBalance<2500 (account) ⋈ (customer) )
Query tree (bottom to top execution):
          ΠCust_Name
              |
              ⋈
            /   \
σBalance<2500   customer
      |
   account
Materialization
Materialization evaluates the expression tree of
the relational algebra operation from the bottom
and performs the innermost or leaf-level
operations first.
The intermediate result of each operation is
materialized (stored in a temporary relation) and
becomes the input for subsequent (next) operations.
The cost of materialization is the sum of the
individual operations plus the cost of writing the
intermediate results to disk.
The problem with materialization is that
• it creates lots of temporary relations
• it performs lots of I/O operations
Pipelining
In pipelining, operations form a queue, and results
are passed from one operation to another as they
are calculated.
To reduce number of intermediate temporary
relations, we pass results of one operation to the
next operation in the pipelines.
Combining operations into a pipeline eliminates the
cost of reading and writing temporary relations.
Pipelines can be executed in two ways:
• Demand driven (System makes repeated requests
for tuples from the operation at the top of pipeline)
• Producer driven (Operations do not wait for
request to produce tuples, but generate the tuples
eagerly.)
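A demand-driven pipeline can be sketched with Python generators: each operator asks the operator below it for the next tuple only when its own consumer asks for one, so no temporary relation is written. The operator names and the build_pipeline helper are hypothetical; the ACCOUNT rows reuse the Account table from the earlier equivalence example.

```python
def scan(relation):
    """Leaf operator: yield the stored tuples one at a time."""
    for row in relation:
        yield row

def select(rows, predicate):
    """Pass on only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, attributes):
    """Keep only the listed attributes of each tuple."""
    for row in rows:
        yield {a: row[a] for a in attributes}

ACCOUNT = [
    {"ANO": "A01", "Balance": 3000},
    {"ANO": "A02", "Balance": 1000},
    {"ANO": "A03", "Balance": 2000},
    {"ANO": "A04", "Balance": 4000},
]

def build_pipeline(account):
    """Chain scan, selection and projection without storing intermediate results."""
    return project(select(scan(account), lambda r: r["Balance"] < 2500), ["ANO"])

pipeline = build_pipeline(ACCOUNT)
first = next(pipeline)        # the top operator is asked for one tuple
rest = list(pipeline)         # further requests pull the remaining tuples
```

A materialized evaluation would instead build the full result of the selection as a list before the projection starts.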
Query Optimization
Exhaustive Search Optimization
• Generates all possible query plans and then selects
the best plan.
• It finds the best plan, but enumerating every plan
becomes expensive as the number of relations grows.
Heuristic Based Optimization
• Heuristic based optimization uses rule-based
optimization approaches for query optimization.
• Performs select and project operations before join
operations. This is done by moving the select and
project operations down the query tree. This reduces
the number of tuples available for join.
• Avoids cross-product operations because they result in
very large intermediate tables.
• These algorithms do not necessarily produce the best
query plan.
Using Heuristics in Query
Optimization (1)
Process for heuristics optimization
1.The parser of a high-level query generates an
initial internal representation;
2.Apply heuristics rules to optimize the internal
representation.
3.A query execution plan is generated to
execute groups of operations based on the
access paths available on the files involved in
the query.
The main heuristic is to apply first the
operations that reduce the size of intermediate
results
• E.g., apply SELECT and PROJECT operations
before applying the JOIN or other binary
operations.
• The SELECT and PROJECT operations reduce
the size of a file, so they should be applied
before a join or other binary operation.
Using Heuristics in Query
Optimization (2)
Query tree and query graph can be used as the basis
for the data structures that are used for internal
representation of queries
Query tree:
A tree data structure that corresponds to a relational
algebra expression
It represents the input relations of the query as leaf
nodes of the tree, and represents the relational
algebra operations as internal nodes
An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
The order of execution of operations starts at the leaf
nodes, which represents the input database relations for
the query, and ends at the root node, which represents
the final operation of the query
Query graph:
A graph data structure that corresponds to a
relational calculus expression
Using Heuristics in Query
Optimization (3)
Example:
For every project located in ‘Stafford’, retrieve the
project number, the controlling department number
and the department manager’s last name, address
and birthdate.
• Relational algebra:
πPNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((σPLOCATION='STAFFORD' (PROJECT))
⋈DNUM=DNUMBER (DEPARTMENT))
⋈MGRSSN=SSN (EMPLOYEE))
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME,
E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D,
EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION='STAFFORD';
Query trees for query Q2
Using Heuristics in Query
Optimization (5)
Heuristic Optimization of Query Trees:
The same query could correspond to many
different relational algebra expressions and
hence many different query trees.
The task of heuristic optimization of query trees
is to find a final query tree that is efficient to
execute
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER=PNO AND
ESSN=SSN AND
BDATE > '1957-12-31';
Using Heuristics in Query
Optimization (6)
Steps in converting a query tree during
heuristic optimization:
(a) Initial (canonical) query tree for SQL
query Q.
(b) Moving SELECT operations down the
query tree.
(c) Applying the more restrictive SELECT
operation first.
(d) Replacing CARTESIAN PRODUCT and
SELECT with JOIN operations.
(e) Moving PROJECT operations down the
query tree
The application of these steps to query Q is shown in the
figures that follow.
Using Heuristics in Query
Optimization (7)
[Figure] (a) Initial (canonical) query tree for SQL query Q.
[Figure] (b) Moving SELECT operations down the query tree.
Using Heuristics in Query
Optimization (8)
[Figure] (c) Applying the more restrictive SELECT operation first.
Using Heuristics in Query
Optimization (9)
[Figure] (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
[Figure] (e) Moving PROJECT operations down the query tree
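The benefit of steps (b) and (d) can be illustrated with a small Python sketch that evaluates part of query Q in two ways and records the size of the intermediate result. The PROJECT and WORKS_ON rows, the project names other than AQUARIUS, and the function names are hypothetical sample data for this illustration.

```python
# Hypothetical sample rows for two of the relations in query Q.
PROJECT = [
    {"PNUMBER": 1, "PNAME": "AQUARIUS"},
    {"PNUMBER": 2, "PNAME": "ORION"},
    {"PNUMBER": 3, "PNAME": "VEGA"},
]
WORKS_ON = [
    {"ESSN": 101, "PNO": 1},
    {"ESSN": 102, "PNO": 1},
    {"ESSN": 103, "PNO": 2},
    {"ESSN": 104, "PNO": 3},
]

def canonical_plan(project, works_on):
    """Cartesian product first, then apply the whole selection condition."""
    product = [{**p, **w} for p in project for w in works_on]
    result = [r for r in product
              if r["PNAME"] == "AQUARIUS" and r["PNUMBER"] == r["PNO"]]
    return result, len(product)

def optimized_plan(project, works_on):
    """Selection pushed down, and product plus join condition replaced by a join."""
    selected = [p for p in project if p["PNAME"] == "AQUARIUS"]
    result = [{**p, **w} for p in selected for w in works_on
              if p["PNUMBER"] == w["PNO"]]
    return result, len(selected)

canonical_result, canonical_size = canonical_plan(PROJECT, WORKS_ON)
optimized_result, optimized_size = optimized_plan(PROJECT, WORKS_ON)
```

Both plans return the same tuples, but the canonical plan builds an intermediate result of 12 tuples, while the optimized plan carries only the single selected PROJECT tuple into the join.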
Using Selectivity and Cost
Estimates in Query Optimization
(1)
Cost-based query optimization:
Estimate and compare the costs of executing a
query using different execution strategies and
choose the strategy with the lowest cost
estimate
Issues
Cost function
Number of execution strategies to be
considered
Using Selectivity and Cost
Estimates in Query Optimization
(2)
Cost is generally measured as the total time required to
execute a statement/query.
Cost Components for Query Execution
1. Access cost to secondary storage (Disk access)
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
Note: Different database systems may focus on different cost
components.
• Disk accesses (time to process a data request and retrieve
the required data from the storage device)
• Disk access is the predominant (major) cost, since disk
access is slow as compared to in-memory operation.
• Cost to write a block is greater than cost to read a
block because data is read back after being written to
ensure that the write was successful.
Cont’d…
• Access cost to secondary storage: This is the
cost of transferring (reading and writing) data
blocks between secondary disk storage and main
memory buffers.
• Disk storage cost: This is the cost of storing on
disk any intermediate files that are generated by an
execution strategy for the query.
• Computation cost: This is the cost of performing
in-memory operations on the records within the
data buffers during query execution. Such
operations include searching for and sorting
records, merging records for a join or a sort
operation, and performing computations on field
values. This is also known as CPU (central
processing unit) cost.
Cont’d…
• Memory usage cost: This is the cost
pertaining to the number of main memory
buffers needed during query execution
• Communication cost: This is the cost of
shipping the query and its results from the
database site to the site or terminal where the
query originated.
Semantic Query Optimization
Uses constraints specified on the database schema
in order to modify one query into another query
that is more efficient to execute.
Consider the following SQL query,
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY
Explanation:
Suppose that we had a constraint on the
database schema that stated that no employee
can earn more than his or her direct supervisor.
If the semantic query optimizer checks for the
existence of this constraint, it need not execute
the query at all because it knows that the result
of the query will be empty. Techniques known as
theorem proving can be used for this purpose.