Uploaded by

shiferachala778

Chapter Two

Query Processing and

Optimization

Department of Computer Science

Outline

What does query processing mean?

Steps in query processing
Translating SQL queries into relational algebra
Basic algorithms for executing query operations
Evaluation of expressions
Using heuristics in query optimization
Using selectivity and cost estimates in query optimization

What Does Query Processing Mean?

• Query processing refers to the range of activities involved in extracting data from a database.
• It covers the entire process: translating the query into low-level instructions, optimizing the query to save resources, estimating the cost of evaluating the query, and extracting the data from the database.
• The main goal of query processing is to find an efficient execution plan for a given SQL query, one that minimizes cost, especially execution time.

Query Processing
• Query processing is the procedure of converting a query written in a high-level language (e.g., SQL) into a correct and efficient execution plan expressed in a low-level language, which is then used for data manipulation.
• A query expressed in a high-level query language such as SQL must first be:
  o Scanned
  o Parsed
  o Validated
• The scanner identifies the query tokens, such as:
  o SQL keywords
  o Attribute names
  o Relation names that appear in the text of the query

Cont’d…
• The parser checks the query syntax to determine whether the query is formulated according to the syntax rules (rules of grammar) of the query language.

Query Processing
• The query is validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
• An internal representation of the query is then created, usually as a tree data structure called a query tree.
• It is also possible to represent the query using a graph data structure called a query graph.
  o Directed acyclic graph (DAG)
• The DBMS must then devise an execution strategy, or query plan, for retrieving the results of the query from the database files.

Query Processing
• Query processing involves three basic steps:
  o Parsing and translation
  o Optimization
  o Evaluation
• Parsing and translation: The parser checks the syntax and verifies the user's privilege to execute the query, as well as the relations and attributes used in the query. If the SQL is valid, the translator converts the query into an equivalent relational algebra expression.
• Optimization: The optimizer uses statistical data stored in the data dictionary, such as the size of each table, the length of its records, and the indexes created on it. Different execution plans for the same query can have different costs. It is the responsibility of the query optimizer to generate the least costly plan and pass it to the evaluation engine.
• Evaluation: The evaluation engine takes a query-execution plan, executes that plan, and returns the answers to the query.

Query Processing
• The query optimizer module has the task of producing a good execution plan.
• The code generator generates the code to execute that plan.
• The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result.
• If a runtime error occurs, an error message is generated by the runtime database processor.

Query Processing
• A high-level query language such as SQL for relational DBMSs (RDBMSs) or OQL for object DBMSs (ODBMSs) is declarative in nature because:
  o It specifies what the intended results of the query are, rather than the details of how the result should be obtained.
• Query optimization is thus necessary for queries that are specified in a high-level query language.

Relational Algebra Operators
• Selection (σ): a unary operator that selects the rows of a relation that satisfy a condition.
  o Syntax: σ_{<selection_condition>}(Relation)
  o Example: σ_{age>21}(Student)
  o Exercise: Write an RA expression to find all instructors working in the Finance department.
    Solution:
• Projection (π): keeps only the listed columns of the given relation and drops the unwanted columns from the resulting relation.
  o Exercise: Write an RA expression to list instructor names.
    Solution: π_{name}(instructor)
• Cross-product (R1 × R2): concatenates every tuple of relation R1 with every tuple of relation R2.

Cont’d…
• Set-difference (R1 − R2): returns the tuples that are in relation R1 but not in relation R2. It requires two input relations that are union compatible.
• Union (R1 ∪ R2): returns every tuple in relation R1 and every tuple in relation R2.
• Intersection (R1 ∩ R2): returns the tuples that relations R1 and R2 have in common.

Cont’d…
• Join (⋈): a binary operator that allows us to combine two relations.
• Condition join: sometimes called a theta-join.
  o Syntax: R1 ⋈_{c} R2
• Equi-join: a special case of condition join where the condition c contains only equalities.
  o Syntax: R1 ⋈_{<equality condition>} R2
  o The result schema is similar to that of the cross-product, but it contains only one copy of the fields for which equality is specified.
• Natural join: an equi-join on all common fields.

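The operators above can be sketched in a few lines of Python. This is only an illustration: the list-of-dicts model of a relation, the function names, and the sample rows are all assumptions that do not come from the slides.

```python
# Relations are modelled as lists of dicts (an assumption made for illustration).
def select(relation, predicate):
    """sigma: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """pi: keep only the listed columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(r1, r2):
    """Join on all attributes the two relations have in common."""
    result = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result

# Hypothetical sample data, not taken from the slides.
student = [
    {"sid": 1, "name": "Abel", "age": 23},
    {"sid": 2, "name": "Sara", "age": 20},
]
enrolled = [{"sid": 1, "course": "DB"}, {"sid": 2, "course": "OS"}]

adults = select(student, lambda row: row["age"] > 21)   # sigma_{age>21}(Student)
names = project(adults, ["name"])                        # pi_{name}(...)
joined = natural_join(adults, enrolled)                  # joins on the shared "sid"
```

When the two inputs share no attribute, `natural_join` degenerates to the cross-product, which matches the usual definition.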
Translating SQL Queries into Relational Algebra
• SQL is the query language used in most commercial RDBMSs.
• An SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, which is then optimized.
• SQL queries are decomposed into query blocks.
  o A query block is the basic unit that can be translated into the algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query blocks.
• Aggregate operators in SQL (MAX, MIN, SUM, COUNT) must be included in the extended algebra.

Translating SQL Queries into Relational Algebra
• Consider the following SQL query:
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE Salary > ( SELECT MAX(Salary)
                     FROM EMPLOYEE
                     WHERE Dno = 5 );
• This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5.
• The query includes a nested subquery and hence would be decomposed into two blocks.

Translating SQL Queries into Relational Algebra
• The inner block is:
    ( SELECT MAX(Salary)
      FROM EMPLOYEE
      WHERE Dno = 5 )
  o This block retrieves the highest salary in department 5.
• The outer query block is:
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE Salary > c
  o Here c represents the result returned from the inner block.

Translating SQL Queries into Relational Algebra
• Each query block is translated into an extended relational algebra expression (G denotes the aggregate function operator):
  o Inner block: G_{MAX SALARY}(σ_{DNO=5}(EMPLOYEE))
  o Outer block: π_{LNAME, FNAME}(σ_{SALARY>C}(EMPLOYEE))

Cont’d…
• The query optimizer would then choose an execution plan for each query block.
• NB: In the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used as the constant c by the outer block.
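A minimal Python sketch of this two-block evaluation follows. The EMPLOYEE rows are hypothetical sample data chosen for illustration; only the attribute names follow the slides.

```python
# Hypothetical EMPLOYEE rows; only the attribute names follow the slides.
EMPLOYEE = [
    {"Lname": "Smith", "Fname": "John", "Salary": 30000, "Dno": 5},
    {"Lname": "Wong", "Fname": "Franklin", "Salary": 40000, "Dno": 5},
    {"Lname": "Borg", "Fname": "James", "Salary": 55000, "Dno": 1},
    {"Lname": "Zelaya", "Fname": "Alicia", "Salary": 25000, "Dno": 4},
]

# Inner block: evaluated once, yielding the constant c.
c = max(e["Salary"] for e in EMPLOYEE if e["Dno"] == 5)

# Outer block: a selection on Salary > c followed by a projection.
result = [(e["Lname"], e["Fname"]) for e in EMPLOYEE if e["Salary"] > c]
```

Because the inner block does not refer to any attribute of the outer block, `c` can be computed before the outer scan starts instead of once per outer row.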

Transformation of Relational Expressions
• Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples.
• Example:

  Customer                 Account
  CID   ANO   Name         ANO   Balance
  C01   A01   Raj          A01   3000
  C02   A02   Meet         A02   1000
  C03   A03   Jay          A03   2000
  C04   A04   Ram          A04   4000

  Π_{Name}(σ_{Balance<2500}(Account) ⋈ Customer)
  Π_{Name}(σ_{Balance<2500}(Account ⋈ Customer))

  Both expressions produce the same output:

  Name
  Meet
  Jay

Cont’d…
• A combined selection operation can be divided into a sequence of individual selections. This transformation is called cascade of σ.
• Example:

  Customer
  CID   ANO   Name   Balance
  C01   1     Raj    3000
  C02   2     Meet   1000
  C03   3     Jay    2000
  C04   4     Ram    4000

  σ_{ANO<3 ∧ Balance<2000}(Customer) and σ_{ANO<3}(σ_{Balance<2000}(Customer)) both produce:

  Output
  CID   ANO   Name   Balance
  C02   2     Meet   1000

• Rule: σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))

Cont’d…
• Selection operations are commutative.
• Example:

  Customer
  CID   ANO   Name   Balance
  C01   1     Raj    3000
  C02   2     Meet   1000
  C03   3     Jay    2000
  C04   4     Ram    4000

  σ_{ANO<3}(σ_{Balance<2000}(Customer)) and σ_{Balance<2000}(σ_{ANO<3}(Customer)) both produce:

  Output
  CID   ANO   Name   Balance
  C02   2     Meet   1000

• Rule: σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

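The cascade and commutativity rules can be checked on the Customer table from the example. The sketch below is an illustration only: representing rows as tuples and the helper name `select` are assumptions.

```python
# Customer relation from the example, as (CID, ANO, Name, Balance) tuples.
customer = [
    ("C01", 1, "Raj", 3000),
    ("C02", 2, "Meet", 1000),
    ("C03", 3, "Jay", 2000),
    ("C04", 4, "Ram", 4000),
]

def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def ano_lt_3(t):
    return t[1] < 3

def balance_lt_2000(t):
    return t[3] < 2000

# One combined selection, its cascade, and the cascade in swapped order.
combined = select(customer, lambda t: ano_lt_3(t) and balance_lt_2000(t))
cascade = select(select(customer, balance_lt_2000), ano_lt_3)
swapped = select(select(customer, ano_lt_3), balance_lt_2000)
```

All three variables hold the same single tuple for C02, which is what the two rules predict.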
Algorithms for External Sorting
• Sorting is one of the primary algorithms used in query processing.
  o For example, whenever an SQL query specifies an ORDER BY clause, the query result must be sorted.
• External sorting: refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files.
• The external sorting algorithm uses a sort-merge strategy.
• Sort-merge strategy: starts by sorting small subfiles, called runs, of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.
• Sorting phase: runs (portions or pieces) of the file that can fit in the available buffer space are read into main memory, sorted using an internal sorting algorithm, and written back to disk as temporary sorted subfiles (or runs).

External Sort-Merge (Example)
• Blocks = 3
[Figure: the initial relation is split into sorted runs (create runs); the runs are merged in merge pass 1 and merge pass 2 to produce the sorted output.]

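The sort-merge strategy can be sketched as follows. This is a simplified illustration under stated assumptions: Python lists stand in for disk files, `run_size` stands in for the available buffer space, runs are merged two at a time, and the sample keys are illustrative.

```python
import heapq

def external_sort(records, run_size):
    """Sort-merge sketch: run_size stands in for the available buffer space."""
    # Sorting phase: cut the input into runs that fit in "memory" and sort each.
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    # Merging phase: merge pairs of sorted runs until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

# Illustrative keys; with run_size=3 this creates four initial runs.
keys = [24, 19, 31, 33, 14, 16, 16, 21, 3, 2, 7, 14]
sorted_keys = external_sort(keys, run_size=3)
```

With four initial runs and a two-way merge, the loop performs two merge passes, mirroring the two passes named in the figure.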
Algorithms for SELECT Operation
• There are many algorithms for executing a SELECT operation, which is basically a search operation to locate the records in a disk file that satisfy a certain condition.
• Examples:
  o (OP1): σ_{Ssn='123456789'}(EMPLOYEE)
  o (OP2): σ_{DNUMBER>5}(DEPARTMENT)
  o (OP3): σ_{Dno=5}(EMPLOYEE)
  o (OP4): σ_{Dno=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
  o (OP5): σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)

Implementing the JOIN Operation
• The JOIN operation is one of the most time-consuming operations in query processing.
• Two-way join: a join on two files.
  o e.g., R ⋈_{A=B} S
• Multi-way join: a join involving more than two files.
• In the two-way join above, A and B are the join attributes, which should be domain-compatible attributes of R and S, respectively.
• We illustrate four of the most common techniques for performing such a join.

Implementing the JOIN Operation
Methods for implementing joins:

J1: Nested-loop join (nested-block join)
• This is the default (brute-force) algorithm, as it does not require any special access paths on either file in the join.
• For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B].
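A minimal in-memory sketch of the nested-loop join described above. The function name, the dict representation of records, and the sample rows are assumptions made for illustration; a real DBMS reads records block by block from disk.

```python
def nested_loop_join(R, S, a, b):
    """Join R and S on R.a = S.b by comparing every pair of records."""
    result = []
    for t in R:            # outer loop over R
        for s in S:        # inner loop over S
            if t[a] == s[b]:
                result.append({**t, **s})
    return result

# Hypothetical records, loosely modelled on EMPLOYEE and DEPARTMENT.
employee = [{"Fname": "John", "Dno": 5}, {"Fname": "Alicia", "Dno": 4}]
department = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 1, "Dname": "HQ"}]
joined = nested_loop_join(employee, department, "Dno", "Dnumber")
```

Every pair of records is compared, so the work grows with the product of the two input sizes, which is why the choice of outer relation matters in the cost estimates.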

Worked Example (Nested-loop join)
• Assume worst-case memory availability and the following statistics for the relations customer and depositor:
  o Number of records of customer: n_customer = 10,000
  o Number of records of depositor: n_depositor = 5,000
  o Number of blocks of customer: b_customer = 400
  o Number of blocks of depositor: b_depositor = 100
• Estimate the cost:
  1. with depositor as the outer relation
  2. with customer as the outer relation

Worked Example (Nested-loop join)

(Worst case)
1. With depositor as the outer relation:
   No. of block accesses = n_depositor * b_customer + b_depositor
                         = 5,000 * 400 + 100
                         = 2,000,100
2. With customer as the outer relation:
   No. of block accesses = n_customer * b_depositor + b_customer
                         = 10,000 * 100 + 400
                         = 1,000,400

Worked Example (Nested-loop join)
• Assume best-case memory availability and the following statistics for the relations customer and depositor:
  o Number of records of customer: n_customer = 10,000
  o Number of records of depositor: n_depositor = 5,000
  o Number of blocks of customer: b_customer = 400
  o Number of blocks of depositor: b_depositor = 100
• Estimate the cost:
  1. With customer as the outer relation:
     No. of block accesses = b_depositor + b_customer
                           = 100 + 400
                           = 500
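The arithmetic of the worked example can be reproduced with a small helper. The function name and parameter names are assumptions; the formulas and the statistics are the ones given on the slides.

```python
def nested_loop_cost(n_outer, b_outer, b_inner, best_case=False):
    """Estimated block accesses for a nested-loop join."""
    if best_case:
        # Best case as on the slides: each relation is read exactly once.
        return b_outer + b_inner
    # Worst case: the inner relation is scanned once per outer record.
    return n_outer * b_inner + b_outer

n_customer, b_customer = 10_000, 400
n_depositor, b_depositor = 5_000, 100

worst_depositor_outer = nested_loop_cost(n_depositor, b_depositor, b_customer)
worst_customer_outer = nested_loop_cost(n_customer, b_customer, b_depositor)
best = nested_loop_cost(n_customer, b_customer, b_depositor, best_case=True)
```

The three values are 2,000,100, 1,000,400 and 500 block accesses, matching the worked example.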

Cont’d…
J2: Index-based single-loop join (using an access structure to retrieve the matching records)
• If an index (or hash key) exists for one of the two join attributes, say attribute B of file S, retrieve each record t in R (loop over file R), one at a time, and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A].
J3: Sort-merge join
• If the records of R and S are physically sorted (ordered) by the values of the join attributes A and B, respectively, we can implement the join in the most efficient way possible.
• Both files are scanned in order of the join attributes, matching the records that have the same values for A and B.
• In this method, the records of each file are scanned only once each for matching with the other file, unless both A and B are non-key attributes, in which case the method needs to be modified slightly.

Cont’d…
J4: Hash join
• The records of files R and S are both hashed to the same hash file, using the same hashing function on the join attributes A of R and B of S as hash keys.
• A single pass through the file with fewer records (say, R) hashes its records to the hash file buckets.
• A single pass through the other file (S) then hashes each of its records to the appropriate bucket, where the record is combined with all matching records from R.
• Hash join thus has two phases: the partitioning phase, which builds the hash buckets, and the probing phase.
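The two phases of the hash join can be sketched in memory as below. This is an illustration only: a Python dictionary stands in for the hash file, and the function name and sample records are assumptions.

```python
from collections import defaultdict

def hash_join(R, S, a, b):
    """Join R and S on R.a = S.b; R is assumed to be the smaller file."""
    # Partitioning phase: one pass over R fills the hash buckets.
    buckets = defaultdict(list)
    for t in R:
        buckets[t[a]].append(t)
    # Probing phase: one pass over S, looking only at the matching bucket.
    result = []
    for s in S:
        for t in buckets.get(s[b], []):
            result.append({**t, **s})
    return result

# Hypothetical records for illustration.
department = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Admin"}]
employee = [
    {"Fname": "John", "Dno": 5},
    {"Fname": "Alicia", "Dno": 4},
    {"Fname": "James", "Dno": 1},
]
joined = hash_join(department, employee, "Dnumber", "Dno")
```

Each file is read once, and a record of S is compared only with the R records in its own bucket rather than with every record of R.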

Cost of Computing for All Joins
• R is the outer and S is the inner relation of the join.
  o Number of records of R: N_R
  o Number of records of S: N_S
  o Number of blocks of R: B_R
  o Number of blocks of S: B_S

  Join                      Worst case           Best case
  Nested-loop join          B_R + N_R * B_S      B_R + B_S
  Block nested-loop join    B_R + B_R * B_S      B_R + B_S
  Index nested-loop join    B_R + N_R * c
  Merge join                B_R + B_S
  Hash join                 3 * (B_R + B_S)

• c is the cost of a single selection on S using the join condition.

Algorithms for PROJECT Operation
• The algorithm for the PROJECT operation π_{<attribute list>}(R) is straightforward to implement.
• If <attribute list> includes a key of relation R, extract all tuples from R with only the values for the attributes in <attribute list>.
• If <attribute list> does NOT include a key of relation R, duplicate tuples must be removed from the result.
• Methods to remove duplicate tuples:
  1. Sorting: sort the result of the operation and then eliminate duplicate tuples, which appear consecutively after sorting.
  2. Hashing: as each record is hashed and inserted into a bucket of the hash file in memory, it is checked against the records already in the bucket; if it is a duplicate, it is not inserted in the bucket.

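Both duplicate-elimination methods can be sketched as follows. The function names and the sample rows are hypothetical, and a Python set stands in for the in-memory hash file.

```python
def project_sort(relation, attributes):
    """Project, then remove duplicates by sorting: equal tuples become adjacent."""
    projected = sorted(tuple(row[a] for a in attributes) for row in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(relation, attributes):
    """Project, then remove duplicates by hashing each tuple into a set."""
    seen, result = set(), []
    for row in relation:
        t = tuple(row[a] for a in attributes)
        if t not in seen:       # a duplicate is simply not inserted
            seen.add(t)
            result.append(t)
    return result

# Hypothetical rows; Dno is not a key, so duplicates appear after projection.
employee = [
    {"Ssn": "1", "Dno": 5},
    {"Ssn": "2", "Dno": 4},
    {"Ssn": "3", "Dno": 5},
]
by_sort = project_sort(employee, ["Dno"])
by_hash = project_hash(employee, ["Dno"])
```

The sort-based version returns its result in sorted order, while the hash-based version keeps the order of first appearance; both contain the same set of tuples.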
Algorithms for SET Operations
• Set operations:
  o UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT
• The CARTESIAN PRODUCT of relations R and S includes all possible combinations of records from R and S.
  o The attributes of the result include all attributes of R and S.
• Cost analysis of CARTESIAN PRODUCT:
  o If R has n records and j attributes and S has m records and k attributes, the result relation will have n*m records and j+k attributes.
  o The CARTESIAN PRODUCT operation is very expensive and should be avoided if possible.

Algorithms for SET Operations
• UNION
  o Sort the two relations on the same attributes.
  o Scan and merge both sorted files concurrently; whenever the same tuple exists in both relations, only one copy is kept in the merged result.
• INTERSECTION
  o Sort the two relations on the same attributes.
  o Scan and merge both sorted files concurrently; keep in the merged result only those tuples that appear in both relations.
• SET DIFFERENCE (R − S)
  o Sort the two relations on the same attributes.
  o Scan and merge both sorted files concurrently; keep in the merged result only those tuples that appear in relation R but not in relation S.

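The sort-then-merge approach shared by the three operations can be sketched with one function. This is a minimal illustration: the function name, the `op` parameter, and the sample relations are assumptions, and both inputs are assumed to be duplicate-free.

```python
def merge_set_op(R, S, op):
    """Sort-merge UNION / INTERSECTION / DIFFERENCE on lists of tuples.
    Inputs are assumed to be duplicate-free, as relations are sets."""
    R, S = sorted(R), sorted(S)
    i = j = 0
    result = []
    while i < len(R) and j < len(S):
        if R[i] == S[j]:               # tuple appears in both relations
            if op in ("union", "intersection"):
                result.append(R[i])
            i += 1
            j += 1
        elif R[i] < S[j]:              # tuple appears only in R
            if op in ("union", "difference"):
                result.append(R[i])
            i += 1
        else:                          # tuple appears only in S
            if op == "union":
                result.append(S[j])
            j += 1
    if op in ("union", "difference"):
        result.extend(R[i:])           # leftovers that are only in R
    if op == "union":
        result.extend(S[j:])           # leftovers that are only in S
    return result

R = [(3,), (1,), (2,)]
S = [(2,), (4,), (3,)]
union = merge_set_op(R, S, "union")
intersection = merge_set_op(R, S, "intersection")
difference = merge_set_op(R, S, "difference")
```

Only the decision of which tuples to keep differs between the three operations; the sort and the concurrent scan are identical.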
Implementing Aggregate Operations
• Aggregate operators:
  o MIN, MAX, SUM, COUNT, and AVG
• Options to implement aggregate operators:
  o Table scan
  o Index
• Example:
    SELECT MAX(SALARY)
    FROM EMPLOYEE;
• If an (ascending) index on SALARY exists for the EMPLOYEE relation, the optimizer could decide to traverse the index for the largest value, which would entail following the rightmost pointer in each index node from the root to a leaf.

Implementing Aggregate Operations (Cont’d.)
• SUM, COUNT, and AVG
  o For a dense index (each record has one index entry): apply the associated computation to the values in the index.
  o For a non-dense index: the actual number of records associated with each index entry must be used for a correct computation. This can be done if the number of records associated with each value in the index is stored in each index entry.
• With GROUP BY: the aggregate operator must be applied separately to each group of tuples.
  o Use sorting or hashing on the grouping attributes to partition the file into the appropriate groups.
  o Compute the aggregate function for the tuples in each group.

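A hash-based GROUP BY can be sketched as below. The function name and the sample EMPLOYEE rows are hypothetical, and a dictionary stands in for the hash partitioning of the file.

```python
from collections import defaultdict

def group_by_aggregate(relation, group_attr, value_attr):
    """Hash rows into groups, then compute COUNT, SUM and AVG per group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row[value_attr])
    return {
        key: {"count": len(vals), "sum": sum(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

# Hypothetical EMPLOYEE rows.
employee = [
    {"Dno": 5, "Salary": 30000},
    {"Dno": 5, "Salary": 40000},
    {"Dno": 4, "Salary": 25000},
]
stats = group_by_aggregate(employee, "Dno", "Salary")
```

This roughly corresponds to `SELECT Dno, COUNT(*), SUM(Salary), AVG(Salary) FROM EMPLOYEE GROUP BY Dno`, with the grouping done by hashing rather than sorting.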
Implementing Outer Join
• Outer join operators:
  o LEFT OUTER JOIN
  o RIGHT OUTER JOIN
  o FULL OUTER JOIN
• The full outer join produces a result that is equivalent to the union of the results of the left and right outer joins.
• Example:
    SELECT FNAME, DNAME
    FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT
          ON DNO = DNUMBER);
• Note: The result of this query is a table of employee names and their associated departments. It is similar to a regular join result, with the exception that if an employee does not have an associated department, the employee's name is still included in the result, padded with a null department name.

Implementing Outer Join (Cont’d.)
• Modifying join algorithms:
  o Nested-loop or sort-merge joins can be modified to implement outer join. For example:
  o For a left outer join, use the left relation as the outer relation and construct the result from every tuple in the left relation.
  o If there is a match, the concatenated tuple is saved in the result.
  o However, if an outer tuple does not match, the tuple is still included in the result but is padded with null value(s).
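The modification to the nested-loop join can be sketched as follows. The function signature, the `s_attrs` parameter used for padding, and the sample rows are assumptions made for illustration; Python's `None` stands in for SQL NULL.

```python
def left_outer_join(R, S, a, b, s_attrs):
    """Nested-loop LEFT OUTER JOIN of R and S on R.a = S.b.
    s_attrs lists the attributes of S, used to pad unmatched R tuples."""
    result = []
    for t in R:                              # left relation is the outer loop
        matched = False
        for s in S:
            if t[a] == s[b]:
                result.append({**t, **s})
                matched = True
        if not matched:                      # keep t, padded with nulls
            result.append({**t, **{attr: None for attr in s_attrs}})
    return result

# Hypothetical rows: Alicia has no associated department.
employee = [{"FNAME": "John", "DNO": 5}, {"FNAME": "Alicia", "DNO": None}]
department = [{"DNUMBER": 5, "DNAME": "Research"}]
rows = left_outer_join(employee, department, "DNO", "DNUMBER", ["DNUMBER", "DNAME"])
```

The only change from the plain nested-loop join is the `matched` flag, which lets an unmatched outer tuple survive with null padding.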

Implementing Outer Join (Cont’d.)
• Theoretically, outer join can also be computed by executing a combination of relational algebra operators.
• To implement the previous left outer join example:
  1. Compute the (inner) JOIN of the EMPLOYEE and DEPARTMENT tables.
     o TEMP1 ← π_{FNAME, DNAME}(EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT)
  2. Find the EMPLOYEEs that do not appear in the (inner) JOIN result.
     o TEMP2 ← π_{FNAME}(EMPLOYEE) − π_{FNAME}(TEMP1)
  3. Pad each tuple in TEMP2 with a null DNAME field.
     o TEMP2 ← TEMP2 × 'null'
  4. UNION the temporary tables to produce the LEFT OUTER JOIN result.
     o RESULT ← TEMP1 ∪ TEMP2

Evaluation of Expressions
• An expression may contain more than one operation, which makes it harder to evaluate. For example:

  Π_{Cust_Name}(σ_{Balance<2500}(account) ⋈ customer)

• To evaluate such an expression, we need to evaluate each operation one by one in an appropriate order.
• Two methods for evaluating an expression carrying multiple operations are:
  o Materialization
  o Pipelining

Cont’d…

Π_{Cust_Name}(σ_{Balance<2500}(account) ⋈ customer)

Expression tree (bottom-to-top execution):

            Π_{Cust_Name}
                  |
                  ⋈
                /   \
  σ_{Balance<2500}   customer
         |
      account

Materialization

• Materialization evaluates the expression tree of the relational algebra operation from the bottom and performs the innermost, or leaf-level, operations first.
• The intermediate result of each operation is materialized (stored in a temporary relation) and becomes the input for subsequent operations.
• The cost of materialization is the sum of the costs of the individual operations plus the cost of writing the intermediate results to disk.
• The problem with materialization is that:
  o it creates many temporary relations
  o it performs many I/O operations

Pipelining

• In pipelining, operations form a queue, and results are passed from one operation to another as they are calculated.
• To reduce the number of intermediate temporary relations, we pass the results of one operation to the next operation in the pipeline.
• Combining operations into a pipeline eliminates the cost of reading and writing temporary relations.
• Pipelines can be executed in two ways:
  o Demand driven: the system makes repeated requests for tuples from the operation at the top of the pipeline.
  o Producer driven: operations do not wait for requests to produce tuples, but generate the tuples eagerly.
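A demand-driven pipeline maps naturally onto Python generators, which produce a tuple only when the consumer asks for one. The sketch below is an illustration under stated assumptions: the operator functions and the sample rows are hypothetical, and the join from the slide's expression is left out to keep the example short.

```python
def scan(relation):
    """Leaf operator: yields one tuple at a time."""
    for row in relation:
        yield row

def select(rows, predicate):
    """Passes on each qualifying tuple as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, attributes):
    """Keeps only the listed attributes of each incoming tuple."""
    for row in rows:
        yield {a: row[a] for a in attributes}

# Hypothetical account rows with the customer name already attached.
account = [
    {"Cust_Name": "Raj", "Balance": 3000},
    {"Cust_Name": "Meet", "Balance": 1000},
    {"Cust_Name": "Jay", "Balance": 2000},
]

# Pipelined plan: no intermediate relation is stored between operators.
pipeline = project(select(scan(account), lambda r: r["Balance"] < 2500), ["Cust_Name"])
names = [row["Cust_Name"] for row in pipeline]   # the consumer pulls tuples on demand

# Materialized plan for contrast: each step builds a full temporary list first.
temp1 = [r for r in account if r["Balance"] < 2500]
temp2 = [{"Cust_Name": r["Cust_Name"]} for r in temp1]
```

Both plans return the same names; the difference is that `temp1` exists as a complete intermediate result, whereas the pipelined plan holds only one tuple in flight at a time.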

Query Optimization

• Exhaustive search optimization
  o Generates all possible query plans and then selects the best plan.
  o It provides the best solution.
• Heuristic-based optimization
  o Uses rule-based optimization approaches for query optimization.
  o Performs select and project operations before join operations. This is done by moving the select and project operations down the query tree, which reduces the number of tuples available for the join.
  o Avoids cross-product operations because they result in very large intermediate tables.
  o These algorithms do not necessarily produce the best query plan.

Using Heuristics in Query Optimization (1)
• Process for heuristic optimization:
  1. The parser of a high-level query generates an initial internal representation.
  2. Heuristic rules are applied to optimize the internal representation.
  3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
• The main heuristic is to apply first the operations that reduce the size of intermediate results.
  o E.g., apply SELECT and PROJECT operations before applying the JOIN or other binary operations.
  o The SELECT and PROJECT operations reduce the size of the intermediate results.

Using Heuristics in Query Optimization (2)
• A query tree and a query graph can be used as the basis for the data structures that are used for the internal representation of queries.
• Query tree:
  o A tree data structure that corresponds to a relational algebra expression.
  o It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
• An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
• The order of execution of operations starts at the leaf nodes, which represent the input database relations for the query, and ends at the root node, which represents the final operation of the query.
• Query graph:
  o A graph data structure that corresponds to a

Using Heuristics in Query Optimization (3)
• Example:
  o For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.
• Relational algebra:
    π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='STAFFORD'}(PROJECT)) ⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))
• SQL query:
    Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
        FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
        WHERE P.DNUM = D.DNUMBER AND

Query trees for query Q2
[Figure: query trees for query Q2]

Using Heuristics in Query Optimization (5)
• Heuristic optimization of query trees:
  o The same query could correspond to many different relational algebra expressions, and hence many different query trees.
  o The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
• Example:
    Q: SELECT LNAME
       FROM EMPLOYEE, WORKS_ON, PROJECT
       WHERE PNAME = 'AQUARIUS' AND
             PNUMBER = PNO AND
             ESSN = SSN AND
             BDATE > '1957-12-31';

Using Heuristics in Query Optimization (6)
• Steps in converting a query tree during heuristic optimization:
  (a) Initial (canonical) query tree for SQL query Q.
  (b) Moving SELECT operations down the query tree.
  (c) Applying the more restrictive SELECT operation first.
  (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
  (e) Moving PROJECT operations down the query tree.
• The application of these steps to query Q is shown in the following slides.

Using Heuristics in Query Optimization (7)
[Figure: (a) Initial (canonical) query tree for SQL query Q. (b) Moving SELECT operations down the query tree.]

Using Heuristics in Query Optimization (8)
[Figure: (c) Applying the more restrictive SELECT operation first.]

Using Heuristics in Query Optimization (9)
[Figure: (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations. (e) Moving PROJECT operations down the query tree.]

Using Selectivity and Cost Estimates in Query Optimization (1)
• Cost-based query optimization:
  o Estimate and compare the costs of executing a query using different execution strategies, and choose the strategy with the lowest cost estimate.
• Issues:
  o Cost function
  o Number of execution strategies to be considered

Using Selectivity and Cost Estimates in Query Optimization (2)
• Cost is generally measured as the total time required to execute a statement or query.
• Cost components for query execution:
  1. Access cost to secondary storage (disk access)
  2. Storage cost
  3. Computation cost
  4. Memory usage cost
  5. Communication cost
• Note: Different database systems may focus on different cost components.
• Disk access cost is the time to process a data request and retrieve the required data from the storage device.
  o Disk access is the predominant (major) cost, since disk access is slow compared to in-memory operations.
  o The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful.

Cont’d…
• Access cost to secondary storage: the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers.
• Disk storage cost: the cost of storing on disk any intermediate files that are generated by an execution strategy for the query.
• Computation cost: the cost of performing in-memory operations on the records within the data buffers during query execution. Such operations include searching for and sorting records, merging records for a join or a sort operation, and performing computations on field values. This is also known as CPU (central processing unit) cost.

Cont’d…

• Memory usage cost: the cost pertaining to the number of main memory buffers needed during query execution.
• Communication cost: the cost of shipping the query and its results from the database site to the site or terminal where the query originated.

Semantic Query Optimization
• Uses constraints specified on the database schema to modify one query into another query that is more efficient to execute.
• Consider the following SQL query:
    SELECT E.LNAME, M.LNAME
    FROM EMPLOYEE AS E, EMPLOYEE AS M
    WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
• Explanation:
  o Suppose that we had a constraint on the database schema stating that no employee can earn more than his or her direct supervisor. If the semantic query optimizer checks for the existence of this constraint, it need not execute the query at all, because it knows that the result of the query will be empty. Techniques known as theorem proving can be used for this purpose.

Advanced SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Chapter - 3 - CG
No ratings yet
Chapter - 3 - CG
41 pages
3 & 4 Feasibilty Analysis and Business Plan - Edited N
No ratings yet
3 & 4 Feasibilty Analysis and Business Plan - Edited N
34 pages
Exitexam Demo With Answer
No ratings yet
Exitexam Demo With Answer
19 pages
Exit Exam From Ministry of Education
No ratings yet
Exit Exam From Ministry of Education
90 pages
2 Tir 2017 CS Exit Exam Question
100% (3)
2 Tir 2017 CS Exit Exam Question
13 pages
Computer Programming
No ratings yet
Computer Programming
73 pages
Design and Analysis of Algorithm
No ratings yet
Design and Analysis of Algorithm
132 pages
Computer Exit Exam Quesitions With Anaswer
80% (5)
Computer Exit Exam Quesitions With Anaswer
14 pages
Chapter 3
No ratings yet
Chapter 3
38 pages
Basic Oop Concepts
No ratings yet
Basic Oop Concepts
38 pages
Chapter 4-Concrruncy Controling Techniques
No ratings yet
Chapter 4-Concrruncy Controling Techniques
39 pages
Introduction To Software Engineering
No ratings yet
Introduction To Software Engineering
57 pages
C++ Chapter 4
No ratings yet
C++ Chapter 4
11 pages
Sqlquery
No ratings yet
Sqlquery
5 pages
CS522T4C-DBMS-MODULE 2 - ER Diagram
No ratings yet
CS522T4C-DBMS-MODULE 2 - ER Diagram
77 pages
Kinds of Databases
No ratings yet
Kinds of Databases
12 pages
Laboratory5-ITT557-2020878252-SITI FARHANA
No ratings yet
Laboratory5-ITT557-2020878252-SITI FARHANA
8 pages
Dataguard Switchover Steps
No ratings yet
Dataguard Switchover Steps
5 pages
Dbms Unit-2 Notes Mca I
No ratings yet
Dbms Unit-2 Notes Mca I
58 pages
Unit 1
No ratings yet
Unit 1
43 pages
1Z0 071 Demo
No ratings yet
1Z0 071 Demo
14 pages
SQL
No ratings yet
SQL
3 pages
DBMS Unit-4
No ratings yet
DBMS Unit-4
57 pages
Data Warehouse
100% (1)
Data Warehouse
12 pages
Notes-Raising Queries-Ch-4
No ratings yet
Notes-Raising Queries-Ch-4
2 pages
Microsoft Access (MS Access) Solved MCQs (Set-22)
No ratings yet
Microsoft Access (MS Access) Solved MCQs (Set-22)
4 pages
Dbms Lab PDF
No ratings yet
Dbms Lab PDF
85 pages
Project On Query in Ms Access
No ratings yet
Project On Query in Ms Access
36 pages
Hierarchical Model The Hierarchical Data Model Organizes Data in A Tree Structure
No ratings yet
Hierarchical Model The Hierarchical Data Model Organizes Data in A Tree Structure
6 pages
Domain Constraints Referential Integrity Assertions Triggers Functional Dependencies
No ratings yet
Domain Constraints Referential Integrity Assertions Triggers Functional Dependencies
31 pages
SQL
100% (1)
SQL
6 pages
CS101 Assignment # 2
No ratings yet
CS101 Assignment # 2
3 pages
Bug 10114837 Rman Deletes The Unapplied Archived Redo Logs
No ratings yet
Bug 10114837 Rman Deletes The Unapplied Archived Redo Logs
4 pages
Explain Plan
No ratings yet
Explain Plan
4 pages
Results of MCA II Semester (R20) RegularSupplementary Examinations July 2024
No ratings yet
Results of MCA II Semester (R20) RegularSupplementary Examinations July 2024
2 pages
MID 2DBMS CS SET 1 - April 2024
No ratings yet
MID 2DBMS CS SET 1 - April 2024
2 pages
Database Management System Class 10 IT 402
100% (1)
Database Management System Class 10 IT 402
17 pages
Student MGT System (Cs Class 12)
No ratings yet
Student MGT System (Cs Class 12)
38 pages
CB3401 DBMSS
No ratings yet
CB3401 DBMSS
25 pages
ICT Worksheet For Grade 11 @ambo Ifa Boru Special Boarding School in 2025-1
No ratings yet
ICT Worksheet For Grade 11 @ambo Ifa Boru Special Boarding School in 2025-1
10 pages
Dbms Lab Report 1
67% (3)
Dbms Lab Report 1
4 pages
TM07 Using Basic Structured Query Language
No ratings yet
TM07 Using Basic Structured Query Language
119 pages
Creatiq-Online Art System: A DBMS Project Report On
No ratings yet
Creatiq-Online Art System: A DBMS Project Report On
26 pages

