DBMS Unit - 7
Prof. S.W.Thakare
Assistant Professor,
Computer science & Engineering
Topics
• Query processing
• Steps in query processing
• Measures of query cost
• Selection operation
• Evaluation of expressions
• Query optimization
• Transformation of relational expressions
• Cost-based optimization approach
Query Processing
• Query processing is the process of converting a high-level query into low-level operations that the machine can understand and execute to perform the action requested by the user.
• It is used to extract data from the database. Fetching the data takes three steps:
[Figure: query processing overview, showing the database catalog, the data, and statistics about the data]
Steps in Query Processing
1. Parsing and Translation:
• Parsing (Parser):
  • Check syntax
  • Check schema elements
• Translation (Translator):
  • Parse tree → relational algebra
2. Optimization (Optimizer):
Measures of Query Cost
• Disk access:
  • Dominates the total time to execute a query
• CPU cycles:
  • Difficult to calculate
  • CPU speed improves at a much faster rate than disk speed
• Communication cost:
  • Applicable to distributed/parallel systems
Selection operation
• Two basic algorithms for the selection operation:
1. Linear Search (A1)
2. Binary Search (A2)
1. Linear Search (A1):
• This algorithm scans every block of the file and tests every record to determine whether it satisfies the selection condition.
• Cost(A1) = BR (worst case)
where BR denotes the number of blocks in the file
• If the condition is on a key (primary) attribute, the system can stop searching as soon as the desired record is found.
• Cost(A1) = BR/2 (average case)
• If the condition is on a non-key attribute, multiple blocks may contain matching records, so the scan cannot stop early and the cost of scanning those blocks must be included in the estimate.
• Linear search is slower than binary search, but it can be applied to any file, regardless of how the file is ordered.
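The block-by-block scan described above can be sketched as follows. This is a minimal illustration, not a real storage engine: the file is modelled as a list of blocks, each block as a list of records, and the function name `linear_search`, the `key_attribute` flag, and the sample data are all assumptions made for the example. The cost is counted as the number of blocks read.

```python
def linear_search(blocks, predicate, key_attribute=False):
    """Scan blocks in order and return (matching_records, blocks_read).

    key_attribute=True models a condition on a key attribute: at most one
    record can match, so the scan stops at the first hit.
    """
    matches = []
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one block access per block scanned
        for record in block:
            if predicate(record):
                matches.append(record)
                if key_attribute:
                    return matches, blocks_read
    return matches, blocks_read


# Hypothetical file: 4 blocks (BR = 4), 2 records per block.
blocks = [
    [{"ano": 1, "balance": 30000}, {"ano": 2, "balance": 10000}],
    [{"ano": 3, "balance": 20000}, {"ano": 4, "balance": 40000}],
    [{"ano": 5, "balance": 15000}, {"ano": 6, "balance": 50000}],
    [{"ano": 7, "balance": 5000}, {"ano": 8, "balance": 60000}],
]

# Condition on the key attribute: the scan stops in the second block.
key_hit, key_cost = linear_search(blocks, lambda r: r["ano"] == 3, key_attribute=True)

# Condition on a non-key attribute: every block has to be read.
nonkey_hits, nonkey_cost = linear_search(blocks, lambda r: r["balance"] < 25000)
```

In this sketch the key lookup reads 2 of the 4 blocks, while the non-key condition reads all BR = 4 blocks.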
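2. Binary Search (A2):
• Applicable when the file (relation) is ordered on attribute A (primary index) and the selection condition is an equality comparison on A.
• Cost(A2) = ⌈log2(BR)⌉ block accesses to locate the matching record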
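A binary search over the blocks of an ordered file might look like the sketch below. The function name `binary_search_blocks` and the sample file are hypothetical, and the sketch assumes every block is non-empty and sorted on the search attribute. Note that log2(BR) is an estimate: this particular loop reads at most floor(log2(BR)) + 1 blocks.

```python
import math


def binary_search_blocks(blocks, attribute, value):
    """Binary search over blocks of a file sorted on `attribute`.

    Returns (record_or_None, blocks_read).
    """
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        blocks_read += 1  # one block access per probe
        if value < block[0][attribute]:
            hi = mid - 1  # value can only be in an earlier block
        elif value > block[-1][attribute]:
            lo = mid + 1  # value can only be in a later block
        else:
            # value falls inside this block's range: search within the block
            for record in block:
                if record[attribute] == value:
                    return record, blocks_read
            return None, blocks_read
    return None, blocks_read


# Hypothetical ordered file: 8 blocks (BR = 8), keys 1..16, 2 records per block.
sorted_blocks = [[{"ano": 2 * i + 1}, {"ano": 2 * i + 2}] for i in range(8)]

record, cost = binary_search_blocks(sorted_blocks, "ano", 11)

# Largest number of blocks read over all keys, and the bound for this loop.
worst_case = max(binary_search_blocks(sorted_blocks, "ano", k)[1] for k in range(1, 17))
bound = math.floor(math.log2(len(sorted_blocks))) + 1
```

Compared with a linear scan of all 8 blocks, the lookup for key 11 touches only 2 blocks here.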
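Evaluation of Expressions
• An expression containing multiple operations has to be solved one operation at a time, in the proper order: execution proceeds from bottom to top of the expression tree.
• There are two methods to evaluate an expression with multiple operations:
1. Materialization
2. Pipelining
[Figure: expression tree for ΠName ( σBalance<25000 (Account ⋈ Customer) ), executed bottom to top]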
Materialization
• Materialization starts at the bottom of the expression and performs a single operation at a time.
• Each intermediate result is materialized (stored in a temporary relation) and then used as input to evaluate the next-level operations.
• The cost of materialization can be quite high, as the overall cost is computed as:
Overall Cost = Sum of costs of individual operations + Cost of writing intermediate results to the disk
• Disadvantage of materialization:
• Because every intermediate result is stored, it creates many temporary relations.
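Materialized evaluation of ΠName ( σBalance<25000 (Account ⋈ Customer) ) can be sketched as below. The Account balances follow the slide's example; the Customer rows, the attribute names, and the `temp_relations` dictionary (standing in for temporary relations on disk) are assumptions made for illustration.

```python
account = [
    {"ano": "A1", "balance": 30000},
    {"ano": "A2", "balance": 10000},
    {"ano": "A3", "balance": 20000},
    {"ano": "A4", "balance": 40000},
]
# Hypothetical Customer rows: one customer per account.
customer = [
    {"name": "Raj", "ano": "A1"},
    {"name": "Meet", "ano": "A2"},
    {"name": "Jay", "ano": "A3"},
    {"name": "Het", "ano": "A4"},
]

temp_relations = {}  # stands in for temporary relations written to disk


def materialize(name, rows):
    """Store an intermediate result as a full temporary relation."""
    temp_relations[name] = list(rows)
    return temp_relations[name]


# Bottom of the expression first: the join, fully computed and stored.
t1 = materialize(
    "t1", [{**a, **c} for a in account for c in customer if a["ano"] == c["ano"]]
)
# Next level: the selection reads t1 and stores its own result.
t2 = materialize("t2", [row for row in t1 if row["balance"] < 25000])
# Top level: the projection reads t2 and produces the final answer.
result = [{"name": row["name"]} for row in t2]

# Rows written to temporary relations: the extra cost term in the formula above.
intermediate_rows = sum(len(rows) for rows in temp_relations.values())
```

Here two temporary relations are created and 6 intermediate rows are written before the final two names are produced.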
Pipelining
• In pipelining, the output of one operation is passed directly as input to the next operation, i.e., the operations form a queue.
• Because each output is passed straight to the next operation in the pipeline, the number of intermediate temporary relations is reduced.
• Performing operations in a pipeline eliminates the cost of writing and reading temporary relations.
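Python generators give a compact way to sketch pipelining: each operator pulls rows from the operator below it and yields results one at a time, so no intermediate relation is stored. The operator names and the Customer rows are assumptions for illustration; in this sketch the join still buffers its inner input in memory.

```python
account = [
    {"ano": "A1", "balance": 30000},
    {"ano": "A2", "balance": 10000},
    {"ano": "A3", "balance": 20000},
    {"ano": "A4", "balance": 40000},
]
# Hypothetical Customer rows: one customer per account.
customer = [
    {"name": "Raj", "ano": "A1"},
    {"name": "Meet", "ano": "A2"},
    {"name": "Jay", "ano": "A3"},
    {"name": "Het", "ano": "A4"},
]


def scan(relation):
    for row in relation:
        yield row


def join(left, right, attribute):
    right_rows = list(right)  # inner input is buffered in this sketch
    for l_row in left:
        for r_row in right_rows:
            if l_row[attribute] == r_row[attribute]:
                yield {**l_row, **r_row}


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attributes):
    for row in rows:
        yield {a: row[a] for a in attributes}


# ΠName ( σBalance<25000 (Account ⋈ Customer) ) as a chain of generators.
pipeline = project(
    select(join(scan(account), scan(customer), "ano"), lambda r: r["balance"] < 25000),
    ["name"],
)

first = next(pipeline)  # the first result is available before the join finishes
rest = list(pipeline)
```

The first name is produced after only two account rows have been joined, which is the behaviour the bullets above describe.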
[Figure: evaluation of ΠName ( σBalance<25000 (Account ⋈ Customer) ); both inputs are labelled "4 records"]
Account
A1 30000
A2 10000
A3 20000
A4 40000
Query Optimization Approaches
• Cost-Based Optimization (Exhaustive Search Optimization):
• It first generates all possible plans and then selects the best plan (the one with the lowest estimated cost) from them.
• It provides the best solution.
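The idea of generating candidate plans and picking the cheapest can be sketched with a toy cost model. Everything here is an assumption for illustration: the two candidate plans (selection after the join versus selection before the join), the cost unit (rows compared or scanned with a nested-loop join), and the row counts. Real optimizers use catalog statistics and far richer cost formulas.

```python
ACCOUNT_ROWS = 4
CUSTOMER_ROWS = 4
MATCHING_ACCOUNTS = 2  # assumed number of accounts with Balance < 25000


def cost_join_then_select():
    """Plan 1: join all accounts with customers, then apply the selection."""
    join_comparisons = ACCOUNT_ROWS * CUSTOMER_ROWS  # nested-loop join
    joined_rows = ACCOUNT_ROWS  # assumes one customer per account
    return join_comparisons + joined_rows  # selection scans the join result


def cost_select_then_join():
    """Plan 2: apply the selection to Account first, then join."""
    selection_scan = ACCOUNT_ROWS
    join_comparisons = MATCHING_ACCOUNTS * CUSTOMER_ROWS
    return selection_scan + join_comparisons


# "Generate all possible plans": here just the two candidates above.
candidate_plans = {
    "join_then_select": cost_join_then_select(),
    "select_then_join": cost_select_then_join(),
}

# "Select the best plan": the one with the lowest estimated cost.
best_plan = min(candidate_plans, key=candidate_plans.get)
```

Under these assumed numbers the plan that applies the selection first is cheaper (12 versus 20 units), so it is the one chosen.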
[Figure: two equivalent expressions that produce the same result]
ΠName ( (σBalance<25000 (Account)) ⋈ Customer )
ΠName ( σBalance<25000 (Account ⋈ Customer) )
Result
Name
Meet
Jay
Equivalence Rules
1. Conjunctive (combined) selection operations can be deconstructed into a sequence of individual selections. This is known as the cascade of σ.
Customer
CID  ANO  Name    Balance
CS1  1    Jay     30000
CS2  2    Abhi    10000
CS3  3    Parth   20000
CS4  4    Pratik  40000
σANO<3 ∧ Balance<20000 (Customer) = σANO<3 (σBalance<20000 (Customer))
Output (the same for both expressions)
CID  ANO  Name  Balance
CS2  2    Abhi  10000
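The cascade of σ can be checked on the Customer table from this slide. The helper name `select` is an assumption for the sketch; it models σ as a simple filter over a list of rows.

```python
customer = [
    {"CID": "CS1", "ANO": 1, "Name": "Jay", "Balance": 30000},
    {"CID": "CS2", "ANO": 2, "Name": "Abhi", "Balance": 10000},
    {"CID": "CS3", "ANO": 3, "Name": "Parth", "Balance": 20000},
    {"CID": "CS4", "ANO": 4, "Name": "Pratik", "Balance": 40000},
]


def select(rows, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# One selection with a conjunctive condition.
combined = select(customer, lambda r: r["ANO"] < 3 and r["Balance"] < 20000)

# The same condition split into a sequence of two selections.
cascaded = select(
    select(customer, lambda r: r["Balance"] < 20000),
    lambda r: r["ANO"] < 3,
)
```

Both forms return only the CS2 row, which matches the output shown on the slide.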
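• Theta join operations are commutative:
E1 ⋈θ E2 = E2 ⋈θ E1
• Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)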
Equivalence Rules
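7. The selection operation distributes over the theta join operation under the following two conditions:
(a) When all the attributes in selection condition θ0 involve only the attributes of one of the expressions (E1) being joined:
σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2
(b) When selection condition θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2:
σθ1 ∧ θ2 (E1 ⋈θ E2) = (σθ1 (E1)) ⋈θ (σθ2 (E2))
8. The projection operation distributes over the theta join operation as follows:
(a) Let L1 and L2 be attributes of E1 and E2, respectively. If θ involves only attributes in L1 ∪ L2:
∏L1 ∪ L2 (E1 ⋈θ E2) = (∏L1 (E1)) ⋈θ (∏L2 (E2))
(b) Otherwise:
• Let L3 be attributes of E1 that are involved in join condition θ, but are not in L1 ∪ L2.
• Let L4 be attributes of E2 that are involved in join condition θ, but are not in L1 ∪ L2.
∏L1 ∪ L2 (E1 ⋈θ E2) = ∏L1 ∪ L2 ((∏L1 ∪ L3 (E1)) ⋈θ (∏L2 ∪ L4 (E2)))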
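Rule 7 can be checked on a small example for the case where the selection condition uses only attributes of E1. The relations, attribute names, and the helper names `theta_join` and `select` below are assumptions for illustration; Account plays the role of E1 and Customer the role of E2.

```python
account = [
    {"ano": "A1", "balance": 30000},
    {"ano": "A2", "balance": 10000},
    {"ano": "A3", "balance": 20000},
    {"ano": "A4", "balance": 40000},
]
# Hypothetical Customer rows; cust_ano refers to an account number.
customer = [
    {"name": "Raj", "cust_ano": "A1"},
    {"name": "Meet", "cust_ano": "A2"},
    {"name": "Jay", "cust_ano": "A3"},
    {"name": "Het", "cust_ano": "A4"},
]


def theta_join(left, right, theta):
    """E1 ⋈θ E2: pair up rows that satisfy the join condition θ."""
    return [{**l, **r} for l in left for r in right if theta(l, r)]


def select(rows, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def theta(a, c):
    return a["ano"] == c["cust_ano"]


def low_balance(row):
    return row["balance"] < 25000  # uses only attributes of Account (E1)


# σθ0 (E1 ⋈θ E2): select after the join.
select_after_join = select(theta_join(account, customer, theta), low_balance)

# (σθ0 (E1)) ⋈θ E2: select before the join.
select_before_join = theta_join(select(account, low_balance), customer, theta)
```

Both orders give the same two rows, but the second joins only 2 account rows instead of 4, which is why optimizers tend to push selections below joins.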
Equivalence Rules
9. The set operations union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative.
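Rule 9 can be tried out with Python's built-in sets, treating each relation as a set of tuples. The two small relations below are hypothetical.

```python
# Two hypothetical relations with the same schema (CID, Name).
e1 = {("CS1", "Jay"), ("CS2", "Abhi")}
e2 = {("CS2", "Abhi"), ("CS3", "Parth")}

union_commutes = (e1 | e2) == (e2 | e1)
intersection_commutes = (e1 & e2) == (e2 & e1)

# Set difference depends on the order of its operands.
difference_commutes = (e1 - e2) == (e2 - e1)
```

For these relations E1 − E2 contains only the CS1 tuple while E2 − E1 contains only the CS3 tuple, so the last comparison is false.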
5. Communication cost:
This is the cost that is associated with sending or communicating the query and
its results from one place to another. It also includes the cost of transferring the
table and results to the various sites during the process of query evaluation.