
Chapter Two Query Processing (2)


Uploaded by

shiferachala778
Copyright
© © All Rights Reserved

Chapter Two

Query Processing and


Optimization

Department of Computer Science 1


Outline

What does query processing mean?


Steps in query processing
Translating SQL queries into relational algebra
Basic algorithms for executing query operations
Evaluation of expressions
Using heuristics in query optimization
Using selectivity and cost estimates in query
optimization
What Does Query Processing Mean?

Query processing refers to the range of activities
involved in extracting data from a database.
It covers the entire process: translating the query
into low-level instructions, optimizing the query to
save resources, estimating the cost of evaluating it,
and extracting the data from the database.
The main goal of query processing is to find an
efficient query execution plan for a given SQL
query, one that minimizes the cost, especially
the execution time.
Query Processing
 Query processing is the procedure of converting a query
written in a high-level language (e.g., SQL) into a
correct and efficient execution plan expressed in a
low-level language, which is used for data manipulation.
 A query expressed in a high-level query language such
as SQL must first be
 Scanned
 Parsed and
 Validated
 Scanner identifies the query tokens such as
 SQL keywords
 Attribute names and
 Relation names that appear in the text of the query
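The scanning step can be illustrated with a small sketch. It is only a toy scanner under stated assumptions: the keyword list, the token classes and the function name `scan` are hypothetical choices made for this example, and a real DBMS scanner recognizes far more of SQL.

```python
import re

# Hypothetical, deliberately tiny keyword list (an assumption for this sketch).
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# One alternative per token class: number, word, quoted string, symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|('[^']*')|(<=|>=|<>|[=<>(),;*]))")

def scan(query):
    """Split a query string into (kind, text) tokens."""
    tokens, pos = [], 0
    query = query.rstrip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, string, symbol = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            # Words are either SQL keywords or attribute/relation names.
            kind = "KEYWORD" if word.upper() in KEYWORDS else "NAME"
            tokens.append((kind, word))
        elif string:
            tokens.append(("STRING", string))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens

tokens = scan("SELECT Lname FROM EMPLOYEE WHERE Dno = 5")
```

The scanner only labels tokens; whether `Lname` and `EMPLOYEE` really exist in the schema is decided later, during validation.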
Cont’d…
The parser checks the query syntax to determine whether it is
formulated according to the syntax rules (rules of grammar) of
the query language


Query Processing
 The query is validated by checking that all attribute and
relation names are valid and semantically meaningful
names in the schema of the particular database being
queried
 An internal representation of the query is then
created, usually as a tree data structure called a
query tree
 It is also possible to represent the query using a
graph data structure called a query graph
o Directed acyclic graph (DAG)
 The DBMS must then devise an execution
strategy or query plan for retrieving the
results of the query from the database files.



Query Processing
Query processing involves three basic steps:
Parsing and translation
Optimization
Evaluation
Parsing and translation: The parser checks the syntax and
verifies the user’s privilege to execute the query, as well as the
relations and attributes used in the query.
If the SQL is valid, the translator converts the
query into an equivalent relational algebra expression.
Optimization: The optimizer uses statistics stored in the
data dictionary, such as the size of each table,
the length of its records and the indexes created on it.
Different execution plans for the same query can
have different costs.
It is the responsibility of the query optimizer to generate
the least costly plan and pass it to the evaluation engine.
Evaluation: The evaluation engine takes a query-execution plan,
executes that plan, and returns the answers to the query.
Query Processing
 Query optimizer module has the task of producing
a good execution plan
 Code generator generates the code to execute that
plan
 Runtime database processor has the task of
running (executing) the query code, whether in
compiled or interpreted mode, to produce the query
result
 If a runtime error results, an error message is
generated by the runtime database processor



Query Processing
 A high-level query language such as SQL for
relational DBMSs (RDBMSs) or OQL for object
DBMSs (ODBMSs) is declarative in nature
because
o it specifies what the intended results of the
query are, rather than the details of
how the result should be obtained.
 Query optimization is thus necessary for queries
that are specified in a high-level query language


Relational Algebra Operators
Selection (σ): a unary operator that selects rows from a relation.
Syntax: σ<selection_condition>(Relation)
Example: σage>21(Student)
Example: Write an RA expression to find all instructors working in
the Finance department.
Solution:
Projection (π): a unary operator that removes unwanted columns of the
given relation from the resulting relation.
Syntax: π<attribute_list>(Relation)
Example: Write an RA expression to list the instructor names.
Solution: πname(instructor)
Cross-product (R1 × R2): concatenates every tuple of relation R1 with
every tuple of relation R2.
Cont’d…
Union (R1 ∪ R2): returns every tuple in relation
R1 and every tuple in relation R2. It requires two input
relations which are union compatible.
Set-difference (R1 – R2): returns the tuples that are in relation
R1 but not in relation R2.
Intersection (R1 ∩ R2): returns the tuples that relations
R1 and R2 have in common.


Cont’d…
Join (⋈): a binary operator that allows us to combine two
relations.
Condition join: Syntax: R1 ⋈c R2. Sometimes called a
theta-join.
Equi-join: a special case of condition join where the
condition c contains only equalities.
Syntax: R1 ⋈<equality condition> R2. The result schema is similar to that of the
cross-product, but with only one copy of the fields for which equality is
specified.
Natural join: an equi-join on all common fields.
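To make the operators concrete, here is a minimal sketch that models a relation as a list of dictionaries, one dictionary per tuple. The function names and the `student` and `enrolled` sample data are assumptions made for illustration; they do not come from the slides.

```python
# A relation is modelled as a list of dicts; the sample data is hypothetical.

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only attrs and remove duplicate rows."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def cross(r1, r2):
    """Cross-product: every tuple of r1 paired with every tuple of r2."""
    return [(t1, t2) for t1 in r1 for t2 in r2]

def natural_join(r1, r2):
    """Natural join: equality on all attributes the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

student = [{"sid": 1, "name": "Abel", "age": 23},
           {"sid": 2, "name": "Sara", "age": 20}]
enrolled = [{"sid": 1, "course": "DB"}, {"sid": 2, "course": "OS"}]

adults = select(student, lambda t: t["age"] > 21)
names = project(student, ["name"])
pairs = cross(student, enrolled)
joined = natural_join(student, enrolled)
```

Here `pairs` has 2 × 2 = 4 entries, while `joined` keeps only the two combinations that agree on `sid`.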
Translating SQL Queries into
Relational Algebra
 SQL is the query language that is used in most
commercial RDBMSs
 An SQL query is first translated into an equivalent
extended relational algebra expression, represented
as a query tree data structure, that is
then optimized
 SQL queries are decomposed into query blocks
 A query block is the basic unit that can be translated into the
algebraic operators and optimized.
 A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and
HAVING clauses if these are part of the block.
 Nested queries within a query are identified as
separate query blocks.
 Aggregate operators in SQL (MAX, MIN, SUM, COUNT) must be
included in the extended algebra.
Translating SQL Queries into
Relational Algebra
 Consider the following SQL query
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );
 This query retrieves the names of employees
(from any department in the company) who earn
a salary that is greater than the highest salary in
department 5
 The query includes a nested subquery and hence
would be decomposed into two blocks
Translating SQL Queries into
Relational Algebra
 The inner block is:
( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 );
 This query retrieves the highest salary in
department 5.
 The outer query block is:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c
 where c represents the result returned from the
inner block
Translating SQL Queries into
Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5);

Outer query block:
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
Extended relational algebra expression:
πLNAME, FNAME (σSALARY>C (EMPLOYEE))

Inner query block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
Extended relational algebra expression:
𝒢MAX SALARY (σDNO=5 (EMPLOYEE))
Cont’d…
 The query optimizer would then choose an
execution plan for each query block
NB: In the above example, the inner block needs to
be evaluated only once to produce the
maximum salary of employees in department 5,
which is then used as the constant c by the
outer block
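The decomposition can be sketched in a few lines. The `EMPLOYEE` rows below are hypothetical sample data and the function names are assumptions for this example; the point is only that the inner block is evaluated once and its result is passed to the outer block as the constant c.

```python
# Hypothetical EMPLOYEE rows; only the attributes used by the query are shown.
EMPLOYEE = [
    {"Lname": "Smith",  "Fname": "John",     "Salary": 30000, "Dno": 5},
    {"Lname": "Wong",   "Fname": "Franklin", "Salary": 40000, "Dno": 5},
    {"Lname": "Borg",   "Fname": "James",    "Salary": 55000, "Dno": 1},
    {"Lname": "Zelaya", "Fname": "Alicia",   "Salary": 25000, "Dno": 4},
]

def inner_block(employee):
    """Inner block: MAX(Salary) over the employees with Dno = 5."""
    return max(t["Salary"] for t in employee if t["Dno"] == 5)

def outer_block(employee, c):
    """Outer block: Lname, Fname of the employees with Salary > c."""
    return [(t["Lname"], t["Fname"]) for t in employee if t["Salary"] > c]

c = inner_block(EMPLOYEE)          # evaluated only once
result = outer_block(EMPLOYEE, c)  # c is reused for every outer tuple
```

With this sample data c is 40000, so only the employee earning 55000 is returned.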



Transformation of relational expressions
 Two relational algebra expressions are said to be equivalent
if the two expressions generate the same set of tuples.
 Example:
Customer
CID   ANO   Name
C01   A01   Raj
C02   A02   Meet
C03   A03   Jay
C04   A04   Ram

Account
ANO   Balance
A01   3000
A02   1000
A03   2000
A04   4000

ΠName ( σBalance<2500 (Account) ⋈ (Customer) )
ΠName ( σBalance<2500 (Account ⋈ Customer) )

Both expressions produce the same output:
Name
Meet
Jay
Cont’d…
 A combined selection operation can be
divided into a sequence of individual selections.
This transformation is called a cascade of σ.
 Example:
Customer
CID   ANO   Name   Balance
C01   1     Raj    3000
C02   2     Meet   1000
C03   3     Jay    2000
C04   4     Ram    4000

σANO<3 ∧ Balance<2000 (Customer)
σANO<3 (σBalance<2000 (Customer))

Output (the same for both expressions)
CID   ANO   Name   Balance
C02   2     Meet   1000

σθ1∧θ2 (E) = σθ1 (σθ2 (E))
Cont’d…
 Selection operations are commutative.
 Example:
Customer
CID   ANO   Name   Balance
C01   1     Raj    3000
C02   2     Meet   1000
C03   3     Jay    2000
C04   4     Ram    4000

σANO<3 (σBalance<2000 (Customer))
σBalance<2000 (σANO<3 (Customer))

Output (the same for both expressions)
CID   ANO   Name   Balance
C02   2     Meet   1000

σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
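The cascade and commutativity rules for selection can be checked on the Customer relation from the example. This is a small sketch; the `select` helper and the predicate names are assumptions introduced here.

```python
# Customer relation from the example (ANO stored as an integer).
customer = [
    {"CID": "C01", "ANO": 1, "Name": "Raj",  "Balance": 3000},
    {"CID": "C02", "ANO": 2, "Name": "Meet", "Balance": 1000},
    {"CID": "C03", "ANO": 3, "Name": "Jay",  "Balance": 2000},
    {"CID": "C04", "ANO": 4, "Name": "Ram",  "Balance": 4000},
]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def ano_lt_3(t):
    return t["ANO"] < 3

def balance_lt_2000(t):
    return t["Balance"] < 2000

# One combined (conjunctive) selection
combined = select(customer, lambda t: ano_lt_3(t) and balance_lt_2000(t))
# Cascade of two individual selections
cascade = select(select(customer, balance_lt_2000), ano_lt_3)
# The same two selections applied in the opposite order
swapped = select(select(customer, ano_lt_3), balance_lt_2000)
```

All three forms return the single tuple for C02. An optimizer is therefore free to pick whichever order is cheapest, typically the one that applies the more restrictive condition first.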
Algorithms for External Sorting
 Sorting is one of the primary algorithms used in query
processing
For example, whenever an SQL query specifies an
ORDER BY clause, the query result must be sorted
 External sorting:
Refers to sorting algorithms that are suitable for large
files of records stored on disk that do not fit entirely in
main memory, such as most database files.
 External sorting algorithm uses a sort-merge strategy
 Sort-Merge strategy:
Starts by sorting small subfiles called runs of the main
file and then merges the sorted runs, creating larger
sorted subfiles that are merged in turn.
 Sorting phase:
In the sorting phase, runs (portions or pieces) of the file
that can fit in the available buffer space are read into
main memory, sorted using an internal sorting algorithm,
and written back to disk as temporary sorted subfiles (or
runs)
External Sort-Merge (Example)
• Blocks=3
Initial relation:             24 19 31 | 33 14 16 | 16 21 3 | 2 7 14
Create runs (sorted runs):    19 24 31 | 14 16 33 | 3 16 21 | 2 7 14
Merge pass-1 (runs):          14 16 19 24 31 33 | 2 3 7 14 16 21
Merge pass-2 (sorted output): 2 3 7 14 14 16 16 19 21 24 31 33
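The sort-merge strategy can be sketched as follows, using the twelve values of the example. This is an illustration only: Python lists stand in for the disk files and runs, `run_size` plays the role of the available buffer space, and a two-way merge per pass is assumed. The function names are hypothetical.

```python
import heapq

def create_runs(records, run_size):
    """Sorting phase: sort run_size records at a time (one buffer load)."""
    return [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]

def merge_pass(runs, fan_in=2):
    """One merging pass: merge each group of fan_in runs into a longer run."""
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]

def external_sort(records, run_size, fan_in=2):
    """Return the sorted output and the runs that exist after each pass."""
    runs = create_runs(records, run_size)
    passes = [runs]
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
        passes.append(runs)
    return (runs[0] if runs else []), passes

initial = [24, 19, 31, 33, 14, 16, 16, 21, 3, 2, 7, 14]
output, passes = external_sort(initial, run_size=3)
```

`passes[0]` holds the four initial runs, `passes[1]` the two runs after merge pass 1, and `passes[2]` the single sorted run after merge pass 2.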
Algorithms for SELECT
Operation
 There are many algorithms for executing a
SELECT operation, which is basically a search
operation to locate the records in a disk file that
satisfy a certain condition
Examples:
• (OP1): σSsn='123456789' (EMPLOYEE)
• (OP2): σDNUMBER>5 (DEPARTMENT)
• (OP3): σDno=5 (EMPLOYEE)
• (OP4): σDno=5 AND SALARY>30000 AND SEX=‘F’ (EMPLOYEE)
• (OP5): σESSN=‘123456789’ AND PNO=10 (WORKS_ON)


Implementing the JOIN
Operation
 The JOIN operation is one of the most time-
consuming operations in query processing
 Two-way join: a join on two files
 e.g. R ⋈A=B S
 Multi-way join: a join involving more than two
files
 In the two-way join above, A and B are the join
attributes,
• which should be domain-compatible attributes of
R and S, respectively.
 We illustrate four of the most common techniques
for performing such a join, using the
sample operations below.
Implementing the JOIN
Operation
 Examples

Methods for implementing joins:


J1-Nested-loop join (Nested Block Join):
 This is the default (brute-force) algorithm, as it
does not require any special access paths
on either file in the join
 For each record t in R (outer loop), retrieve
every record s from S (inner loop) and test
whether the two records satisfy the join
condition t[A] = s[B].
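A minimal sketch of the nested-loop join, with in-memory lists standing in for the two files. The sample `EMPLOYEE` and `DEPARTMENT` rows and the function name are assumptions made for illustration.

```python
def nested_loop_join(R, S, A, B):
    """For each t in R (outer loop), scan all of S (inner loop) and
    keep the pairs that satisfy the join condition t[A] = s[B]."""
    result = []
    for t in R:
        for s in S:
            if t[A] == s[B]:
                result.append({**t, **s})
    return result

# Hypothetical sample data
EMPLOYEE = [{"Ssn": "1", "Dno": 5}, {"Ssn": "2", "Dno": 4}, {"Ssn": "3", "Dno": 5}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"},
              {"Dnumber": 4, "Dname": "Administration"}]

joined = nested_loop_join(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber")
```

The inner file is scanned once per outer record, which is why the choice of outer relation matters for the cost.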


Exercise (Nested-loop join)
 Assume worst-case memory availability and
the following statistics for the relations
customer and depositor:
• Number of records of customer: 10,000 (n_customer)
• Number of records of depositor: 5,000 (n_depositor)
• Number of blocks of customer: 400 (b_customer)
• Number of blocks of depositor: 100 (b_depositor)
 Estimate the cost
1. with depositor as outer relation
2. with customer as outer relation
Exercise (Nested-loop join)
(Worst case)
1. with depositor as outer relation
No. of block accesses = n_depositor * b_customer + b_depositor
= 5,000 * 400 + 100
= 2,000,100
2. with customer as outer relation
No. of block accesses = n_customer * b_depositor + b_customer
= 10,000 * 100 + 400
= 1,000,400


Exercise (Nested-loop join)
 Assume best-case memory availability and the
following statistics for the relations customer
and depositor:
• Number of records of customer: 10,000 (n_customer)
• Number of records of depositor: 5,000 (n_depositor)
• Number of blocks of customer: 400 (b_customer)
• Number of blocks of depositor: 100 (b_depositor)
 Estimate the cost
1. with customer as outer relation
No. of block accesses = b_depositor + b_customer
= 100 + 400
= 500
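The three estimates can be reproduced with a short calculation. The function names are assumptions for this sketch; the formulas and statistics are the ones used in the exercise.

```python
def nested_loop_worst(n_outer, b_outer, b_inner):
    """Worst case: the inner relation is re-read for every outer record."""
    return n_outer * b_inner + b_outer

def nested_loop_best(b_outer, b_inner):
    """Best case: enough memory, so every block is read exactly once."""
    return b_outer + b_inner

n_customer, b_customer = 10_000, 400
n_depositor, b_depositor = 5_000, 100

depositor_outer = nested_loop_worst(n_depositor, b_depositor, b_customer)
customer_outer = nested_loop_worst(n_customer, b_customer, b_depositor)
best = nested_loop_best(b_customer, b_depositor)
```

In the worst case the plan with customer as the outer relation needs about half as many block accesses (1,000,400 against 2,000,100), while in the best case the cost is 500 whichever relation is outer.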


Cont’d…
J2-Index-based single-loop join (using an access
structure to retrieve the matching records):
 If an index (or hash key) exists for one of the two join
attributes, say attribute B of file S, retrieve each record
t in R (loop over file R), one at a time, and then use the
access structure to retrieve directly all matching
records s from S that satisfy s[B] = t[A]
J3-Sort-merge join:
 If the records of R and S are physically sorted (ordered)
by value of the join attributes A and B, respectively, we
can implement the join in the most efficient way
possible.
 Both files are scanned in order of the join attributes,
matching the records that have the same values for A
and B.
 In this method, the records of each file are scanned
only once each for matching with the other file, unless
both A and B are non-key attributes, in which case
the method needs to be modified slightly.
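A sketch of the sort-merge join follows. Two points are assumptions of this example rather than part of the description above: the inputs are sorted inside the function (the slide assumes they are already physically sorted), and groups of equal values are handled on both sides, which is one possible form of the slight modification needed when A and B are non-key attributes.

```python
def sort_merge_join(R, S, A, B):
    """Join R and S on R.A = S.B by scanning both in join-attribute order."""
    R = sorted(R, key=lambda t: t[A])   # assumed pre-sorted on disk in practice
    S = sorted(S, key=lambda s: s[B])
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][A] < S[j][B]:
            i += 1
        elif R[i][A] > S[j][B]:
            j += 1
        else:
            value = R[i][A]
            # Find the end of the group of S records with this value.
            j_end = j
            while j_end < len(S) and S[j_end][B] == value:
                j_end += 1
            # Pair that group with every R record carrying the same value.
            while i < len(R) and R[i][A] == value:
                result.extend({**R[i], **s} for s in S[j:j_end])
                i += 1
            j = j_end
    return result

# Hypothetical sample data with duplicate join values on both sides
R = [{"A": 2, "x": "r2"}, {"A": 1, "x": "r1"}, {"A": 2, "x": "r2b"}]
S = [{"B": 2, "y": "s2"}, {"B": 3, "y": "s3"}, {"B": 2, "y": "s2b"}]
joined = sort_merge_join(R, S, "A", "B")
```

Apart from re-reading the small group of equal S values, each input is scanned only once.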
Cont’d…
J4-Hash-join:
 The records of files R and S are both hashed to
the same hash file, using the same hashing
function on the join attributes A of R and B of S
as hash keys.
 A single pass through the file with fewer records
(say, R) hashes its records to the hash file
buckets.
 A single pass through the other file (S) then
hashes each of its records to the appropriate
bucket, where the record is combined with all
matching records from R.
 Building the hash buckets (the first pass) and probing
them (the second pass) are the two phases of a hash join
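The two passes of the hash join can be sketched with a Python dictionary standing in for the hash file. This assumes the smaller file fits in memory; the sample data and the function name are hypothetical.

```python
from collections import defaultdict

def hash_join(R, S, A, B):
    """Join R and S on R.A = S.B; R should be the file with fewer records."""
    # First pass (build): hash every record of R into a bucket on A.
    buckets = defaultdict(list)
    for t in R:
        buckets[t[A]].append(t)
    # Second pass (probe): hash each record of S on B and combine it
    # with the matching records of R found in that bucket.
    result = []
    for s in S:
        for t in buckets.get(s[B], []):
            result.append({**t, **s})
    return result

# Hypothetical sample data
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"},
              {"Dnumber": 4, "Dname": "Administration"}]
EMPLOYEE = [{"Ssn": "1", "Dno": 5}, {"Ssn": "2", "Dno": 4}, {"Ssn": "3", "Dno": 5}]

joined = hash_join(DEPARTMENT, EMPLOYEE, "Dnumber", "Dno")
```

Each file is read once, and a record of S is only compared with the records in its own bucket.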


Cost of computing for all joins
 R is the outer and S is the inner relation of the
join.
• Number of records of R: NR
• Number of records of S: NS
• Number of blocks of R: BR
• Number of blocks of S: BS

Join                      Worst Case       Best Case
Nested-Loop Join          BR + NR ∗ BS     BR + BS
Block Nested-Loop Join    BR + BR ∗ BS     BR + BS
Index Nested-Loop Join    BR + NR ∗ c
Merge Join                BR + BS
Hash Join                 3 ∗ (BR + BS)
• c is the cost of a single selection on S using the join condition.


Algorithms for PROJECT
operation
 The algorithm for the PROJECT operation π<attribute list>(R) is
straightforward to implement
 If <attribute list> includes a key of relation R, extract
all tuples from R with only the values for the
attributes in <attribute list>.
 If <attribute list> does NOT include a key of
relation R, duplicate tuples must be removed
from the results.
 Methods to remove duplicate tuples
1. Sorting: sort the result of the operation and
then eliminate duplicate tuples, which appear
consecutively after sorting
2. Hashing: as each record is hashed and inserted into
a bucket of the hash file in memory, it is checked
against those records already in the bucket; if it is
a duplicate, it is not inserted in the bucket.
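Both duplicate-elimination methods can be sketched briefly. The function names and the `EMPLOYEE` sample rows are assumptions for this example, and a Python set stands in for the in-memory hash file.

```python
def project_sort(relation, attrs):
    """Project on attrs, sort, then drop duplicates (adjacent after sorting)."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result

def project_hash(relation, attrs):
    """Project on attrs, inserting a row only if it is not already stored."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

# Hypothetical sample data: (Dno, Sex) is not a key of EMPLOYEE
EMPLOYEE = [
    {"Ssn": "1", "Dno": 5, "Sex": "M"},
    {"Ssn": "2", "Dno": 5, "Sex": "M"},
    {"Ssn": "3", "Dno": 4, "Sex": "F"},
]
by_sort = project_sort(EMPLOYEE, ["Dno", "Sex"])
by_hash = project_hash(EMPLOYEE, ["Dno", "Sex"])
```

The two methods return the same set of rows; the sorting variant returns them in sorted order, the hashing variant in order of first appearance.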
Algorithms for SET operations
 Set operations:
o UNION, INTERSECTION, SET DIFFERENCE
and CARTESIAN PRODUCT

 The CARTESIAN PRODUCT of relations R and S
includes all possible combinations of records from
R and S.
o The attributes of the result include all attributes
of R and S.
 Cost analysis of CARTESIAN PRODUCT
o If R has n records and j attributes and S has m
records and k attributes, the result relation will
have n*m records and j+k attributes
o The CARTESIAN PRODUCT operation is very
expensive and should be avoided if possible


Algorithms for SET operations
 UNION
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
whenever the same tuple exists in both
relations, only one is kept in the merged results.
 INTERSECTION
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
keep in the merged results only those tuples
that appear in both relations.
 SET DIFFERENCE R - S
o Sort the two relations on the same attributes.
o Scan and merge both sorted files concurrently,
keeping in the merged results only those tuples
that appear in relation R but not in relation S.
o The result of this operation, denoted by R - S, is
a relation that includes all tuples that are in R
but not in S
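The three merge-based set operations share one scan loop, as the following sketch shows. The function name and the sample tuples are assumptions; the inputs are sorted in memory here, whereas a DBMS would sort the files externally.

```python
def merge_set_op(R, S, op):
    """Apply op ('union', 'intersection' or 'difference') to two relations
    given as lists of tuples, by sorting both and merging them in one scan."""
    R, S = sorted(set(R)), sorted(set(S))   # sort; relations hold no duplicates
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i] == S[j]:                    # tuple appears in both relations
            if op in ("union", "intersection"):
                result.append(R[i])         # kept only once
            i += 1
            j += 1
        elif R[i] < S[j]:                   # tuple appears only in R
            if op in ("union", "difference"):
                result.append(R[i])
            i += 1
        else:                               # tuple appears only in S
            if op == "union":
                result.append(S[j])
            j += 1
    if op in ("union", "difference"):
        result.extend(R[i:])                # leftover tuples are only in R
    if op == "union":
        result.extend(S[j:])                # leftover tuples are only in S
    return result

R = [(3,), (1,), (2,)]
S = [(2,), (4,), (3,)]
union = merge_set_op(R, S, "union")
intersection = merge_set_op(R, S, "intersection")
difference = merge_set_op(R, S, "difference")
```

Only the decision about which tuples to keep changes from one operation to the other; the sort and the concurrent scan are identical.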
Implementing Aggregate
Operations
 Aggregate Operators:
o MIN, MAX, SUM, COUNT and AVG
 Options to implement aggregate operators:
o Table Scan
o Index
 Example:
SELECT MAX (SALARY)
FROM EMPLOYEE;
 If an (ascending) index on SALARY exists for the
employee relation, then the optimizer could
decide on traversing the index for the largest
value, which would entail following the right most
pointer in each index node from the root to a leaf.



Implementing Aggregate
Operations (Cont’d.)
 SUM, COUNT and AVG
 For a dense index (each record has one index entry):
o Apply the associated computation to the values in
the index.
 For a non-dense index:
o Actual number of records associated with each index
entry must be used for a correct computation
o This can be done if the number of records associated
with each value in the index is stored in each index
entry.
 With GROUP BY: the aggregate operator must be
applied separately to each group of tuples.
o Use sorting or hashing on the group attributes to
partition the file into the appropriate groups;
o Compute the aggregate function for the tuples in
each group.
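Grouping by hashing can be sketched as follows. The function name and the `EMPLOYEE` sample rows are assumptions for illustration; a dictionary plays the role of the hash partitions.

```python
from collections import defaultdict

def group_aggregate(relation, group_attr, value_attr):
    """Partition the tuples on group_attr by hashing, then compute the
    aggregate functions separately for each group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t[value_attr])
    return {
        key: {"COUNT": len(vals), "SUM": sum(vals),
              "AVG": sum(vals) / len(vals),
              "MIN": min(vals), "MAX": max(vals)}
        for key, vals in groups.items()
    }

# Hypothetical sample data
EMPLOYEE = [{"Dno": 5, "Salary": 30000},
            {"Dno": 5, "Salary": 40000},
            {"Dno": 4, "Salary": 25000}]
stats = group_aggregate(EMPLOYEE, "Dno", "Salary")
```

This corresponds to a query of the form SELECT Dno, COUNT(*), SUM(Salary), ... FROM EMPLOYEE GROUP BY Dno.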
Implementing Outer Join
 Outer Join Operators:
 LEFT OUTER JOIN
 RIGHT OUTER JOIN
 FULL OUTER JOIN
 The full outer join produces a result which is
equivalent to the union of the results of the left
and right outer joins.
 Example:
SELECT FNAME, DNAME
FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT
ON DNO = DNUMBER);
 Note: The result of this query is a table of
employee names and their associated
departments. It is similar to a regular join result,
with the exception that if an employee does not
have an associated department, the employee's
name still appears in the result, with a null
department name.
Implementing Outer Join
(Cont’d.)
 Modifying Join Algorithms:
 Nested-loop or sort-merge joins can be
modified to implement outer join. E.g.,
 For a left outer join, use the left relation as the
outer relation and construct the result from every
tuple in the left relation
 If there is a match, the concatenated tuple is
saved in the result
 However, if an outer tuple does not match,
then the tuple is still included in the result
but is padded with null value(s)
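The modified nested-loop algorithm for a left outer join can be sketched as follows. `None` stands in for SQL NULL, and the function name, the `right_attrs` parameter and the sample rows are assumptions made for this example.

```python
def left_outer_join(left, right, A, B, right_attrs):
    """Nested-loop left outer join on left.A = right.B.
    right_attrs lists the attributes of right, used for null padding."""
    result = []
    for t in left:                      # the left relation is the outer loop
        matched = False
        for s in right:
            # A null join value never matches anything.
            if t[A] is not None and t[A] == s[B]:
                result.append({**t, **s})
                matched = True
        if not matched:
            # Keep the unmatched outer tuple, padded with nulls.
            result.append({**t, **{a: None for a in right_attrs}})
    return result

# Hypothetical sample data: Alicia has no associated department
EMPLOYEE = [{"Fname": "John", "Dno": 5}, {"Fname": "Alicia", "Dno": None}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"}]

rows = left_outer_join(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber",
                       ["Dnumber", "Dname"])
```

Every employee appears in `rows`; the employee without a department gets a null `Dname`.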


Implementing Outer Join
(Cont’d.)
 Theoretically, an outer join can also be computed by
executing a combination of relational algebra
operators.
 Implement the previous left outer join example
1. Compute the (inner) JOIN of the EMPLOYEE and
DEPARTMENT tables
• TEMP1 ← πFNAME,DNAME (EMPLOYEE ⋈DNO=DNUMBER
DEPARTMENT)
2. Find the EMPLOYEEs that do not appear in the
(inner) JOIN
• TEMP2 ← πFNAME (EMPLOYEE) − πFNAME (TEMP1)
3. Pad each tuple in TEMP2 with a null DNAME field
• TEMP2 ← TEMP2 × 'null'
4. UNION the temporary tables to produce the LEFT
OUTER JOIN
• RESULT ← TEMP1 ∪ TEMP2


Evaluation of expressions
An expression may contain more than one
operation, which makes it harder to evaluate than
a single operation.

ΠCust_Name ( σBalance<2500 (account) ⋈ (customer) )

To evaluate such an expression, we need to
evaluate its operations one by one in an
appropriate order.
Two methods for evaluating an expression
carrying multiple operations are:
 Materialization
 Pipelining
Cont’d….

ΠCust_Name ( σBalance<2500 (account) ⋈ (customer) )

Expression tree (executed bottom to top):

          ΠCust_Name
              |
              ⋈
            /   \
σBalance<2500   customer
      |
   account


Materialization

 Materialization evaluates the expression tree of
the relational algebra operation from the bottom
and performs the innermost or leaf-level
operations first.
 The intermediate result of each operation is
materialized (stored in a temporary relation) and
becomes the input for subsequent (next) operations.
 The cost of materialization is the sum of the costs of the
individual operations plus the cost of writing the
intermediate results to disk.
 The problem with materialization is that
• it creates lots of temporary relations
• it performs lots of I/O operations


Pipelining

 In pipelining, operations form a queue, and results
are passed from one operation to another as they
are calculated.
 To reduce the number of intermediate temporary
relations, we pass the results of one operation to the
next operation in the pipeline.
 Combining operations into a pipeline eliminates the
cost of reading and writing temporary relations.
 Pipelines can be executed in two ways:
• Demand driven (the system makes repeated requests
for tuples from the operation at the top of the pipeline)
• Producer driven (operations do not wait for
requests to produce tuples, but generate the tuples
eagerly)
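A demand-driven pipeline maps naturally onto Python generators, as in this sketch. The operator names are assumptions for the example; the `account` data reuses the balances from the earlier Account table. No operator stores its full output, so no temporary relation is created.

```python
def scan(relation):
    """Leaf operator: produce the stored tuples one at a time."""
    for t in relation:
        yield t

def select(tuples, predicate):
    """Pass on only the tuples that satisfy the predicate."""
    for t in tuples:
        if predicate(t):
            yield t

def project(tuples, attrs):
    """Keep only the requested attributes of each tuple."""
    for t in tuples:
        yield {a: t[a] for a in attrs}

account = [{"ANO": "A01", "Balance": 3000}, {"ANO": "A02", "Balance": 1000},
           {"ANO": "A03", "Balance": 2000}, {"ANO": "A04", "Balance": 4000}]

# The operation at the top of the pipeline is project.
pipeline = project(select(scan(account), lambda t: t["Balance"] < 2500), ["ANO"])

first = next(pipeline)   # a request at the top pulls tuples through on demand
rest = list(pipeline)    # further requests drain the remaining results
```

Calling `next` reads only as many account tuples as are needed to produce one result. A materialized plan would instead build the complete selection result before starting the projection.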


Query Optimization

 Exhaustive Search Optimization
• Generates all possible query plans and then selects the
best plan.
• It finds the best plan, but the search is costly when there are many candidate plans.
 Heuristic Based Optimization
• Uses rule-based
optimization approaches for query optimization.
• Performs select and project operations before join
operations. This is done by moving the select and
project operations down the query tree, which reduces
the number of tuples available for join.
• Avoids cross-product operations because they result in
very large intermediate tables.
• These algorithms do not necessarily produce the best
query plan.
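The effect of these heuristics can be seen by counting intermediate tuples for two equivalent plans. This sketch uses the Customer and Account data from the earlier equivalence example; the plan names and the tuple layout are assumptions made here.

```python
customer = [("C01", "A01", "Raj"), ("C02", "A02", "Meet"),
            ("C03", "A03", "Jay"), ("C04", "A04", "Ram")]   # (CID, ANO, Name)
account = [("A01", 3000), ("A02", 1000),
           ("A03", 2000), ("A04", 4000)]                    # (ANO, Balance)

# Plan 1: cross-product first, then select on the join condition and balance.
product = [(c, a) for c in customer for a in account]
plan1 = [c[2] for c, a in product if c[1] == a[0] and a[1] < 2500]

# Plan 2: push the selection on Balance below the join.
low_balance = [a for a in account if a[1] < 2500]
joined = [(c, a) for c in customer for a in low_balance if c[1] == a[0]]
plan2 = [c[2] for c, a in joined]
```

Both plans return the same names, but plan 1 builds 16 intermediate tuples while plan 2 never holds more than 2.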


Using Heuristics in Query
Optimization (1)
 Process for heuristics optimization
1. The parser of a high-level query generates an
initial internal representation.
2. Heuristic rules are applied to optimize the internal
representation.
3. A query execution plan is generated to
execute groups of operations based on the
access paths available on the files involved in
the query.
 The main heuristic is to apply first the
operations that reduce the size of intermediate
results
• E.g., apply SELECT and PROJECT operations
before applying the JOIN or other binary
operations.
• The SELECT and PROJECT operations reduce
the size of the intermediate results.
Using Heuristics in Query
Optimization (2)
 Query tree and query graph can be used as the basis
for the data structures that are used for internal
representation of queries
 Query tree:
 A tree data structure that corresponds to a relational
algebra expression
 It represents the input relations of the query as leaf
nodes of the tree, and represents the relational
algebra operations as internal nodes
 An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
 The order of execution of operations starts at the leaf
nodes, which represents the input database relations for
the query, and ends at the root node, which represents
the final operation of the query
 Query graph:
 A graph data structure that corresponds to a
relational calculus expression
Using Heuristics in Query
Optimization (3)
 Example:
 For every project located in ‘Stafford’, retrieve the
project number, the controlling department number
and the department manager’s last name, address
and birthdate.
• Relational algebra:
πPNUMBER, DNUM, LNAME, ADDRESS, BDATE
(((σPLOCATION=‘STAFFORD’(PROJECT))
⋈DNUM=DNUMBER (DEPARTMENT))
⋈MGRSSN=SSN (EMPLOYEE))
 SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME,
E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D,
EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;
Query trees for query Q2



Using Heuristics in Query
Optimization (5)
 Heuristic Optimization of Query Trees:
 The same query could correspond to many
different relational algebra expressions and
hence many different query trees.
 The task of heuristic optimization of query trees
is to find a final query tree that is efficient to
execute
 Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNUMBER=PNO AND
ESSN=SSN AND
BDATE > ‘1957-12-31’;
Using Heuristics in Query
Optimization (6)
 Steps in converting a query tree during
heuristic optimization:
(a) Initial (canonical) query tree for SQL
query Q.
(b) Moving SELECT operations down the
query tree.
(c) Applying the more restrictive SELECT
operation first.
(d) Replacing CARTESIAN PRODUCT and
SELECT with JOIN operations.
(e) Moving PROJECT operations down the
query tree

 The application of these steps to query Q is shown in the
following figures.
Using Heuristics in Query
Optimization (7)

(a) Initial (canonical) query tree for SQL query Q.

(b) Moving SELECT operations down the query tree.


Using Heuristics in Query
Optimization (8)

(c) Applying the more restrictive SELECT operation first.


Using Heuristics in Query
Optimization (9)
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.

(e) Moving PROJECT operations down the query tree


Using Selectivity and Cost
Estimates in Query Optimization
(1)
Cost-based query optimization:
 Estimate and compare the costs of executing a
query using different execution strategies and
choose the strategy with the lowest cost
estimate

 Issues
 Cost function
 Number of execution strategies to be
considered



Using Selectivity and Cost
Estimates in Query Optimization
(2)
Cost is generally measured as the total time required to
execute a statement/query.
Cost Components for Query Execution
1. Access cost to secondary storage (Disk access)
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
Note: Different database systems may focus on different cost
components.
• Disk accesses (time to process a data request and retrieve
the required data from the storage device)
• Disk access is the predominant (major) cost, since disk
access is slow as compared to in-memory operation.
• Cost to write a block is greater than cost to read a
block because data is read back after being written to
ensure that the write was successful.
Cont’d…
• Access cost to secondary storage: This is the
cost of transferring (reading and writing) data
blocks between secondary disk storage and main
memory buffers.
• Disk storage cost: This is the cost of storing on
disk any intermediate files that are generated by an
execution strategy for the query.
• Computation cost: This is the cost of performing
in-memory operations on the records within the
data buffers during query execution. Such
operations include searching for and sorting
records, merging records for a join or a sort
operation, and performing computations on field
values. This is also known as CPU (central
processing unit) cost.
Cont’d…

• Memory usage cost: This is the cost
pertaining to the number of main memory
buffers needed during query execution.
• Communication cost: This is the cost of
shipping the query and its results from the
database site to the site or terminal where the
query originated.


Semantic Query Optimization
 Uses constraints specified on the database schema
in order to modify one query into another query
that is more efficient to execute.
 Consider the following SQL query,
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY

 Explanation:
 Suppose that we had a constraint on the
database schema that stated that no employee
can earn more than his or her direct supervisor.
If the semantic query optimizer checks for the
existence of this constraint, it need not execute
the query at all because it knows that the result
of the query will be empty. Techniques known as
theorem proving can be used for this purpose.
