07 QueryOptimisation-no Blanks

The document discusses query optimization techniques used in database management systems. It covers topics like equivalence rules, join ordering, cost-based query optimization and optimizing nested and dynamic queries. The goal of query optimization is to find an efficient execution plan by generating equivalent queries and choosing the lowest cost plan based on statistics and estimated costs.


DATA3404 Data Science Platforms

Week 7: Query Optimisation

Presented by
A/Prof Uwe Roehm
School of Computer Science

DATA3404 "Data Science Platforms" - 2020 (Roehm) 1

Learning Objectives
– Query Optimisation
– Equivalence Rules
– Join Ordering
– Cost-Based Query Optimisation
– Optimising nested and dynamic queries


Cost-based Query Optimisation


Motivation for Query Optimisation

A query is (by definition) declarative, so it does not specify the execution plan.
If a query runs slowly, what can you do?
– Sometimes you can write a better query
  – E.g. by avoiding cross-joins or by using more selective filter conditions
– Add indexes
– Some systems allow hints (this assumes you know better than the query planner)
  – Oracle: optimizer hints such as /*+ ALL_ROWS */
  – PostgreSQL: no hints (deliberately)
  – SQL Server: OPTION clause
  – MySQL: USE/IGNORE/FORCE INDEX
– Configuration tweaks and hardware upgrades
To do most of the above, you need to understand what is going on inside the system and how the
query optimizer works.

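Reminder: Query Processing
[Figure: query processing pipeline. A query is passed to the parser and translator, which produce a relational algebra expression. The optimizer (the focus this week) uses statistics about the data to turn that expression into an execution plan. The evaluation engine runs the plan against the data and returns the query output.]

Example query:
SELECT name
FROM Student
NATURAL JOIN Enrolled
WHERE uosCode='DATA3404'

Corresponding relational algebra expression:
π_name( σ_{uosCode='DATA3404'}(Enrolled) ⋈ Student )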

Query Optimization
Central Problems:
– A query is (by definition) declarative,
i.e. it does not specify the execution order.
But we need an executable plan.

– The goal of query optimization is to find a suitable execution plan.

– Ideally: we want to find the best plan.
– Practically: avoid the worst plans!
The time spent on query optimization adds to the total query execution time.

Cost-based Query Optimization
– Generating query-evaluation plans for an expression involves several steps:
1. Generate logically equivalent expressions
• Use equivalence rules to transform an expression into an equivalent one.
2. Annotate the resulting expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost

– The overall process is called cost-based optimization.

– Two main issues:
• For a given query, which plans are considered?
– Algorithm to search the plan space for the cheapest (estimated) plan.
• How is the cost of a plan estimated?
– Alternative: rule-based optimization

The Importance of I/O Cost

› Amongst all equivalent evaluation plans,
choose the one with the lowest estimated cost
› Cost is usually measured as the time needed to answer the query
› Typically disk access is the predominant cost,
and it is also relatively easy to estimate
- Based on statistics about the size of relations,
the spread of values in a column, etc.
› For simplicity, we just use the number of page
transfers from disk (I/Os) as the cost
measure

[Figure: query time broken down into CPU processing, disk I/O and network comms]

Cost Estimation
– For each plan considered, we must estimate its cost:
– Must estimate the cost of each operation in the plan tree.
• Depends on input cardinalities.
• We’ve already discussed how to estimate the cost of operations
(sequential scan, index scan, joins, etc.)
– Must also estimate the size of the result for each operation in the tree!
• Use information about the input relations.
• For selections and joins, assume independence of predicates
– Database statistics play a crucial role here
– Our assumption: data values are uniformly distributed
(Note: this is typically not the case for many data sets)
– How to determine a data distribution over large data sets?
– How to keep those statistics up-to-date?
⇒ Periodic sampling
– How to handle data inter-dependencies? Hard!
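
The two assumptions named on this slide (uniformly distributed values, independent predicates) lead to very simple result-size formulas. The following is a minimal sketch of those textbook estimates, not the estimator of any particular DBMS; the function names and the example numbers are illustrative assumptions.

```python
def selection_cardinality(n_rows: int, n_distinct: int) -> float:
    """Estimated rows matching `A = constant`, assuming uniform values."""
    return n_rows / n_distinct


def conjunction_selectivity(*selectivities: float) -> float:
    """Selectivity of c1 AND c2 AND ..., assuming independent predicates."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def equijoin_cardinality(n_r: int, n_s: int, distinct_r: int, distinct_s: int) -> float:
    """Textbook estimate for R JOIN S ON R.a = S.a:
    |R| * |S| / max(V(R, a), V(S, a))."""
    return n_r * n_s / max(distinct_r, distinct_s)


# Hypothetical statistics: 10,000 rows with 2,500 distinct values in the
# filtered column, joined to a 50,000-row table on a 10,000-value key.
matching_rows = selection_cardinality(10_000, 2_500)
join_rows = equijoin_cardinality(10_000, 50_000, 10_000, 10_000)
both_filters = conjunction_selectivity(1 / 2_500, 1 / 10)
```

Real optimizers typically refine these estimates with histograms or samples, precisely because the uniformity assumption often fails.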

Exercise 1: Query Costs

SELECT *
FROM Student JOIN Enrolled USING(sid)
WHERE grade='HD'

Input relations:

Student
sid | name  | address        | dob
123 | Alice | 1 Acacia Drive | 18-AUG-1989
124 | Bob   | 7 Belmont Av   | 07-JUN-1993
…

Enrolled
sid | uosCode  | grade
123 | COMP9120 | HD
124 | COMP5338 | P
…

Step 1: ⋈ (BLOCK NESTED LOOPS) of Student and Enrolled

sid | name  | address        | dob         | sid | uosCode  | grade
123 | Alice | 1 Acacia Drive | 18-AUG-1989 | 123 | COMP9120 | HD
124 | Bob   | 7 Belmont Av   | 07-JUN-1993 | 124 | COMP5338 | P
…

Step 2: σ (FILTER) on grade='HD'

sid | name  | address        | dob         | sid | uosCode  | grade
123 | Alice | 1 Acacia Drive | 18-AUG-1989 | 123 | COMP9120 | HD
…

Equivalence Rules


Equivalent Algebra Expressions

– Two relational algebra expressions are said to be equivalent if on every
legal database instance the two expressions generate the same set of tuples
– Note: order of tuples is irrelevant
– In SQL, inputs and outputs are multisets of tuples, hence the definition above
applies to multisets of tuples

– An equivalence rule says that expressions of two forms are equivalent

– Can replace expression of first form by second, or vice versa


Equivalent Algebra Expressions
Equivalence rules guarantee that two expressions return the same result on every database instance:

Expression 1: π_name( σ_{uosCode='DATA3404'}(Enrolled) ⋈ Student )
Expression 2: π_name( σ_{uosCode='DATA3404'}(Student ⋈ Enrolled) )

Instance 1

Enrolled
sid | uosCode
101 | DATA3404
102 | COMP5338

Student
sid | Name
101 | Alice
102 | Bob

Result (Name): Alice

Instance 2

Enrolled
sid | uosCode
101 | DATA3404
102 | COMP5338
103 | COMP5338
103 | DATA3404

Student
sid | Name
103 | Clare
102 | Bob
101 | Alice

Result (Name): Alice, Clare (the order of tuples is irrelevant)

Selection Equivalence
SELECT *
FROM Enrolled
WHERE sid=102
AND uos='DATA3404'

Enrolled
sid | uos      | grade
101 | DATA3404 | D
101 | COMP5338 | CR
102 | DATA3404 | HD
102 | COMP5338 | P

Three equivalent ways to evaluate the query:
– σ_{sid=102 ∧ uos='DATA3404'}(Enrolled) returns (102, DATA3404, HD) directly
– σ_{uos='DATA3404'}(σ_{sid=102}(Enrolled)): the inner selection returns (102, DATA3404, HD) and (102, COMP5338, P); the outer selection keeps (102, DATA3404, HD)
– σ_{sid=102}(σ_{uos='DATA3404'}(Enrolled)): the inner selection returns (101, DATA3404, D) and (102, DATA3404, HD); the outer selection keeps (102, DATA3404, HD)

Selections with conditions joined with the ∧ (AND) operator cascade:
σ_{c1 ∧ … ∧ cn}(R) ≡ σ_{c1}(…(σ_{cn}(R))…)

Nested selection operations commute:
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))

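
The cascade and commutativity rules for selections can be checked on the small Enrolled relation from this slide. The sketch below is only an illustration: representing rows as dictionaries and the helper name `select` are assumptions made for the example.

```python
# Enrolled relation from the slide, one dict per tuple.
enrolled = [
    {"sid": 101, "uos": "DATA3404", "grade": "D"},
    {"sid": 101, "uos": "COMP5338", "grade": "CR"},
    {"sid": 102, "uos": "DATA3404", "grade": "HD"},
    {"sid": 102, "uos": "COMP5338", "grade": "P"},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def c1(row):
    return row["sid"] == 102


def c2(row):
    return row["uos"] == "DATA3404"


# One selection with a conjunctive condition ...
combined = select(enrolled, lambda row: c1(row) and c2(row))
# ... equals a cascade of single-condition selections, in either order.
cascade_c1_first = select(select(enrolled, c1), c2)
cascade_c2_first = select(select(enrolled, c2), c1)
```

All three variables hold the single tuple (102, DATA3404, HD); what differs is the size of the intermediate result, which is what the optimizer cares about.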
Projection Equivalence
SELECT name, home
FROM Student

Student
sid | name  | age | home
101 | Alice | 21  | Austria
102 | Bob   | 32  | Brazil
103 | Clare | 23  | Chile

– π_{name,home}(Student) returns (Alice, Austria), (Bob, Brazil), (Clare, Chile)
– π_{name,home}(π_{sid,name,home}(Student)) returns the same result, because the inner projection keeps every attribute the outer one needs
– π_{name,home}(π_{sid,name}(Student)) is not valid: the home attribute is not available after the inner projection

Projection operations cascade:
π_{a1}(R) ≡ π_{a1}(…(π_{a1,…,an}(R))…)

Projections and Selections

SELECT name, home
FROM Student
WHERE sid=102

Student
sid | name  | age | home
101 | Alice | 21  | Austria
102 | Bob   | 32  | Brazil
103 | Clare | 23  | Chile

– π_{name,home}(σ_{sid=102}(Student)): the selection returns (102, Bob, 32, Brazil); the projection returns (Bob, Brazil)
– σ_{sid=102}(π_{name,home}(Student)) is not valid: the sid attribute is not available after the projection
– π_{name,home}(σ_{sid=102}(π_{sid,name,home}(Student))): the inner projection retains sid, so the selection returns (102, Bob, Brazil) and the outer projection returns (Bob, Brazil)

A projection commutes with a selection that only uses attributes retained by the projection.

Joins and Cross Products
SELECT * FROM Student S
JOIN Enrolled E ON (E.sid=S.sid)

E (Enrolled)
sid | uosCode
101 | COMP9120
102 | COMP9120
102 | COMP5338
104 | COMP9120

S (Student)
sid | Name
101 | Alice
102 | Bob
103 | Clare

E × S (cross product, 4 × 3 = 12 rows)
E.sid | uosCode  | S.sid | Name
101   | COMP9120 | 101   | Alice
101   | COMP9120 | 102   | Bob
101   | COMP9120 | 103   | Clare
102   | COMP9120 | 101   | Alice
102   | COMP9120 | 102   | Bob
102   | COMP9120 | 103   | Clare
102   | COMP5338 | 101   | Alice
102   | COMP5338 | 102   | Bob
102   | COMP5338 | 103   | Clare
104   | COMP9120 | 101   | Alice
104   | COMP9120 | 102   | Bob
104   | COMP9120 | 103   | Clare

σ_{E.sid=S.sid}(E × S), which equals E ⋈_{E.sid=S.sid} S
E.sid | uosCode  | S.sid | Name
101   | COMP9120 | 101   | Alice
102   | COMP9120 | 102   | Bob
102   | COMP5338 | 102   | Bob

A theta join is equivalent to a selection over the cross product:
(R ⋈_θ S) ≡ σ_θ(R × S)


Join Equivalences
SELECT * FROM Student S JOIN Enrolled E ON (E.sid=S.sid)

Joins commute:
(R ⋈_θ S) ≡ (S ⋈_θ R)
Joins are associative (when θ only involves attributes of R and S, and η only attributes of S and T):
R ⋈_θ (S ⋈_η T) ≡ (R ⋈_θ S) ⋈_η T

E (Enrolled)
sid | uosCode
101 | COMP9120
102 | COMP9120
102 | COMP5338
104 | COMP9120

S (Student)
sid | Name
101 | Alice
102 | Bob
103 | Clare

E ⋈_{E.sid=S.sid} S
E.sid | uosCode  | S.sid | Name
101   | COMP9120 | 101   | Alice
102   | COMP9120 | 102   | Bob
102   | COMP5338 | 102   | Bob

S ⋈_{E.sid=S.sid} E
S.sid | Name  | E.sid | uosCode
101   | Alice | 101   | COMP9120
102   | Bob   | 102   | COMP5338
102   | Bob   | 102   | COMP9120

Both results contain the same tuples; only the order of columns and rows differs.
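
Two of the join rules, (R ⋈_θ S) ≡ σ_θ(R × S) and join commutativity, can be replayed on the E and S example relations. This is a small sketch under the assumption that tuples are plain Python tuples with sid in position 0; `nested_loop_join` is a hypothetical helper, not a DBMS operator.

```python
from itertools import product

# Example relations from the slides: E(sid, uosCode) and S(sid, Name).
enrolled = [(101, "COMP9120"), (102, "COMP9120"), (102, "COMP5338"), (104, "COMP9120")]
student = [(101, "Alice"), (102, "Bob"), (103, "Clare")]

# Cross product E x S: every pairing of an E tuple with an S tuple.
cross = [e + s for e, s in product(enrolled, student)]

# Selection over the cross product with condition E.sid = S.sid.
via_selection = [row for row in cross if row[0] == row[2]]


def nested_loop_join(left, right):
    """Equi-join on the first attribute (sid) of both inputs."""
    return [l + r for l in left for r in right if l[0] == r[0]]


direct = nested_loop_join(enrolled, student)    # E join S
commuted = nested_loop_join(student, enrolled)  # S join E

# S join E has the same tuples with the column groups swapped.
commuted_reordered = {row[2:] + row[:2] for row in commuted}
```

The equivalence holds for the result, not for the work done: the cross product materialises 12 rows here, while the direct join only ever emits 3.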

More Equivalences
Pushing down selections:
σ_{uosCode='DATA3404'}(Enrolled ⋈ Student) ≡ σ_{uosCode='DATA3404'}(Enrolled) ⋈ Student

Pushing down projections:
π_name(Enrolled ⋈ Student) ≡ π_name(π_{sid,name}(Student) ⋈ Enrolled)

Optimization Heuristics
– Working through all possible join orders can be a big job as number of relations
in query gets large
– Can use dynamic programming to store intermediate results
– Cost-based optimization is expensive, even with dynamic programming.
– Systems may use heuristics to reduce the number of choices that must be made in a
cost-based fashion.
– Heuristic optimization transforms the query-tree by using a set of rules that
typically (but not in all cases) improve execution performance:
– Perform selection early (reduces the number of tuples)
– Perform projection early (reduces the number of attributes)
– Perform most restrictive selection and join operations before other similar
operations.
– Some systems use only heuristics, others combine heuristics with partial cost-
based optimization.


Join Ordering


Join Optimization
– Fundamental problem for query optimization: Join Order
– In principle, naïve join optimization could enumerate all possible execution
plans, i.e., all possible 2-way join combinations for each query block.

[Source: DB2 lecture, Uni Tuebingen]

How Many Such Combinations Are There?
– A join over n+1 relations R_1, …, R_{n+1} requires n binary joins.
– Its root-level operator joins sub-plans of k and n − k − 1 join operators
(0 ≤ k ≤ n − 1):

[Figure: root join ⋈ whose left sub-plan has k joins over R_1, …, R_{k+1} and whose right sub-plan has n − k − 1 joins over R_{k+2}, …, R_{n+1}]

– Let C_i be the number of possibilities to construct a binary tree of i inner
nodes (join operators):

C_n = \sum_{k=0}^{n-1} C_k \cdot C_{n-k-1}

Search Space
– The resulting search space is enormous:
Possible bushy join trees joining n relations (join trees = n! · C_{n−1})

Number of relations n | C_{n−1} | Join trees
2                     | 1       | 2
3                     | 2       | 12
4                     | 5       | 120
5                     | 14      | 1,680
6                     | 42      | 30,240
7                     | 132     | 665,280
8                     | 429     | 17,297,280
10                    | 4,862   | 17,643,225,600

– And we haven’t yet even considered the use of m different join algorithms
(yielding another factor of m^(n−1))!

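
The numbers in the search-space table can be reproduced with a few lines of code. The sketch below assumes the base case C_0 = 1 and uses the standard closed form of the Catalan numbers alongside the recurrence; the function names are chosen for this example only.

```python
from math import comb, factorial


def catalan(i: int) -> int:
    """Number of binary tree shapes with i inner nodes (closed form)."""
    return comb(2 * i, i) // (i + 1)


def catalan_by_recurrence(n: int) -> int:
    """Same number via C_n = sum_{k=0}^{n-1} C_k * C_{n-k-1}, with C_0 = 1."""
    c = [1] + [0] * n
    for m in range(1, n + 1):
        c[m] = sum(c[k] * c[m - k - 1] for k in range(m))
    return c[n]


def bushy_join_trees(n_relations: int) -> int:
    """Tree shapes with n-1 joins times the n! ways to place the relations."""
    return factorial(n_relations) * catalan(n_relations - 1)


# Rebuild the rows of the table: (n, C_{n-1}, number of join trees).
table = [(n, catalan(n - 1), bushy_join_trees(n)) for n in (2, 3, 4, 5, 6, 7, 8, 10)]
```

Multiplying each entry by m^(n−1) then accounts for a choice among m join algorithms at each of the n − 1 joins.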
Left-Deep Join Plans
– To master this search space, System R (the father of all query optimizers) made a
fundamental decision: only left-deep join trees are considered.
– In left-deep join trees, the right-hand-side input for each join is a relation, not
the result of an intermediate join.
– Left-deep trees allow us to generate all fully pipelined plans.
• Intermediate results are not written to temporary files.
• Not all left-deep trees are fully pipelined (e.g., Sort-Merge join).

[Figure: three join trees over relations A, B, C, D. Bushy join plan: (A ⋈ B) ⋈ (C ⋈ D). Left-deep join plan: ((A ⋈ B) ⋈ C) ⋈ D. The third tree is a non-left-deep plan. In the bushy and non-left-deep plans, join results must be cached for use in the next join.]

Reminder: Passing Records between Operations

› Materialization (set-at-a-time):
1. Evaluate the whole operation
2. Store (materialize) the results in a temporary relation
3. The next operation reads in the temporary relation.
• Always applicable, but expensive I/O

› Pipelining (tuple-at-a-time):
1. Evaluate one row of output of an operation
2. Pass (pipeline) each row to the next operation
• Much cheaper (all in memory)
• Some operations are not compatible with pipelining
(e.g., inner table of join, sorts, hash joins, aggregations)

[Figure: example plan π_{b,d}( (σ_{S.e>=100 ∧ S.e<=119}(S) ⋈ T) ⋈ R )]
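
The difference between the two strategies can be illustrated with Python generators, which hand over one row at a time just like a pipelined operator. This is only a sketch: the relation contents, the attribute name "a" and the operator functions are assumptions made for the example (the range predicate on e mirrors the one in the figure).

```python
def scan(rows):
    """Leaf operator: produce the stored rows one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Selection: pass on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection: keep only the named columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}


s = [{"a": 1, "e": 99}, {"a": 2, "e": 100}, {"a": 3, "e": 119}, {"a": 4, "e": 120}]


def in_range(row):
    return 100 <= row["e"] <= 119


# Pipelining: operators are chained; no intermediate result is stored.
pipelined = project(select(scan(s), in_range), ["a"])
first_row = next(pipelined)   # available after reading only two input rows
remaining_rows = list(pipelined)

# Materialization: the whole selection result is stored before projecting.
temporary_relation = list(select(scan(s), in_range))
materialized = list(project(temporary_relation, ["a"]))
```

In a DBMS the temporary relation would be written to disk when it does not fit in memory, which is where the extra I/O cost of materialization comes from.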

Dynamic Programming:
Bottom-Up Enumeration of Left-Deep Plans
– Left-deep plans differ only in the order of relations, the access method for each relation,
and the join method for each join.
– Enumerated using N passes (if N relations are joined), following a dynamic programming approach:
– Pass 1: Best 1-relation plans
Find the best access path for each relation individually.
– Pass 2: Best 2-relation plans
Find the best way to join each pair of tables Ri and Rj using the previous best access paths:
optPlan( {Ri, Rj} ) = best of Ri ⋈ Rj and Rj ⋈ Ri (12 plans to consider)
– Pass N: Best N-relation plans
Find the best way to join the result of an (N-1)-relation plan (as outer) to the N’th relation, based on
the best (N-1)-relation plans.
– For each subset of relations, retain only:
– Cheapest plan overall, plus
– Cheapest plan for each interesting order of the tuples.

Dynamic Programming in Bottom-Up Query Optimization

for i := 1 to n do
    optPlan({Ri}) := best_access_plans(Ri)

for i := 2 to n do
{
    for all S ⊆ {R1, …, Rn} s.t. |S| = i do
    {
        bestPlan := a dummy plan w/ infinite cost
        for all Rj, Sj s.t. S = {Rj} ∪ Sj and Rj ∉ Sj do
        {
            p := joinPlan(optPlan(Sj), Rj);
            if cost(p) ≤ cost(bestPlan) then
                bestPlan := p
        }
        optPlan(S) := bestPlan
    }
}
return (optPlan({R1, …, Rn}))

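
The pseudocode can be turned into a small runnable sketch. The version below is a simplification under clearly stated assumptions: the cost of a plan is the sum of its intermediate result sizes (a toy model, not page I/O), access paths and interesting orders are ignored, and the function name, the input dictionaries and the example statistics are all hypothetical.

```python
from itertools import combinations


def best_left_deep_plan(cardinality, selectivity):
    """Return (cost, output_rows, join_order) of the cheapest left-deep plan.

    cardinality: dict mapping relation name -> number of rows
    selectivity: dict mapping frozenset({a, b}) -> join selectivity;
                 missing pairs default to 1.0 (a cross product)
    Toy cost model (an assumption): sum of all intermediate result sizes.
    """
    names = sorted(cardinality)
    # Pass 1: a single relation costs nothing in this toy model.
    opt_plan = {frozenset([r]): (0.0, float(cardinality[r]), (r,)) for r in names}

    for size in range(2, len(names) + 1):          # passes 2..n
        for subset in combinations(names, size):   # all S with |S| = size
            s = frozenset(subset)
            best = (float("inf"), 0.0, ())         # dummy plan, infinite cost
            for rj in subset:                      # S = {Rj} united with Sj
                sub_cost, sub_rows, sub_order = opt_plan[s - {rj}]
                out_rows = sub_rows * cardinality[rj]
                for other in sub_order:            # apply every join predicate
                    out_rows *= selectivity.get(frozenset([other, rj]), 1.0)
                plan = (sub_cost + out_rows, out_rows, sub_order + (rj,))
                if plan[0] <= best[0]:
                    best = plan
            opt_plan[s] = best
    return opt_plan[frozenset(names)]


# Hypothetical statistics for three relations A, B and C.
cards = {"A": 1000, "B": 100, "C": 10}
sels = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1}
example_cost, example_rows, example_order = best_left_deep_plan(cards, sels)
```

With these numbers the cheapest plan joins the two small relations B and C first and adds A last, which matches the heuristic of performing the most restrictive joins early.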
Examples of
Cost-based Query Optimisation


Exercise 2: Execution Trees

– Scenario:
Car(cid, pod)
Trip(tid, cid)

SELECT C.cid, T.tid
FROM Car C, Trip T
WHERE T.cid = C.cid
AND C.pod = 5;

– Physical operations available:
– TABLE ACCESS
– INDEX SCAN
– (simple) NESTED LOOPS
– BLOCK NESTED LOOPS
– INDEX NESTED
– MERGE JOIN

– Give execution plans:
– No indexes
– Clustered primary index on cid
– Proposed extra index

Example Solutions

Basic plan using no indexes:
PROJECTION (tid,cid)
  FILTER (pod=5)
    BLOCK NESTED JOIN
      TABLE SCAN Car
      TABLE SCAN Trip

Pushed-down selection:
PROJECTION (tid,cid)
  BLOCK NESTED JOIN
    TABLE SCAN Car (pod=5)
    TABLE SCAN Trip

Using clustered primary index:
PROJECTION (tid,cid)
  FILTER (pod=5)
    INDEX NESTED JOIN
      TABLE SCAN Trip
      INDEX SCAN Car.cid

Using unclustered secondary index on Trip(cid):
PROJECTION (tid,cid)
  INDEX NESTED JOIN
    INDEX SCAN ON Car.pod (pod=5)
    INDEX SCAN ON Trip.cid

Exercise 3: Choosing an Optimal Plan

Query
SELECT C.cid, T.tid
FROM Car C, Trip T
WHERE T.cid = C.cid
AND C.pod = 5;

– Identify the best plan by costing out
several logically equivalent plans.

Statistics
– Car(cid, pod):
– 10,000 rows, each 50 bytes
– 10,000 values for cid
– 2,500 values for pod
– Trip(tid, cid):
– Foreign key (cid) references Car
– 50,000 rows, each 40 bytes
– 50,000 values for tid
– 10,000 values for cid
– Assume a page is 4096 bytes, of which
4000 are useful for data records
(the rest is header)

Basic Plan using No Indexes

Plan (selection and projection in memory, pipelined):
PROJECTION (tid,cid)
  FILTER (pod=5)
    BLOCK NESTED JOIN
      TABLE SCAN Car
      TABLE SCAN Trip

Cost estimate
– Cost to scan Car
– Car has 4000/50=80 data records per page,
so is stored on 10000/80=125 pages
– Cost to scan Trip
– Trip has 4000/40=100 data records per
page, so is stored on 50000/100=500 pages
– So we read 125 pages of Car, and we scan Trip
125 times (doing 500 page reads each time)
– Total disk I/O is 125+125*500 = 62,625 pages

Plan using Pushed-down Selection

Plan (selection and projection in memory, pipelined):
PROJECTION (tid,cid)
  BLOCK NESTED JOIN
    TABLE SCAN Car (pod=5)
    TABLE SCAN Trip

Cost estimate
– Cost to scan Car
– Car is stored on 125 pages
– Selectivity on Car
– Filter condition has selectivity of 1/2500
– about 4 cars will get through filter (10,000 cars * selectivity)
– So we read 125 pages of Car, and we scan Trip 4
times (doing 500 page reads each time)
– Total disk I/O is 125+4*500 = 2,125 pages

Plan using Clustered Primary Index

A plan suitable if Car has a clustered primary index on Car.cid
(selection and projection in memory, pipelined):
PROJECTION (tid,cid)
  FILTER (pod=5)
    INDEX NESTED JOIN
      TABLE SCAN Trip
      INDEX SCAN Car.cid

Cost estimate
– Assume index on Car.cid has 2 levels
(index height 1)
– Cost to scan Trip
– Read 500 pages
– Cost to look up row of Car with given cid
– Read one page per level, then read the one
data record that is pointed to (recall Car.cid is
primary key of Car)
– Cost of lookup is 3 pages
– So we read 500 pages of Trip, and we do an index
lookup on Car once for each record in Trip (i.e., we do
50,000 index lookups)
– Total disk I/O is: 500 + 50000*3 = 150,500 pages

Plan using Unclustered Secondary Index

A plan suitable if we add an unclustered secondary index on Car.pod, and an
unclustered secondary index on Trip.cid
(selection and projection in memory, pipelined):
PROJECTION (tid,cid)
  INDEX NESTED JOIN
    INDEX SCAN ON Car.pod (pod=5)
    INDEX SCAN ON Trip.cid

Cost estimate
– Assume index on Car.pod has 2 levels, and index
on Trip.cid has 3 levels
– Cost to use index to find rows with Car.pod = 5
– Read one page per level, then fetch the data
records pointed to
– There are 2 index levels (incl root), and there will
be 4 records matching pod with search value 5
– Cost for the lookup is 2+4 = 6 pages read
– Cost to use index to find rows of Trip with given cid
– Read one page per level (3 levels, incl root), then fetch the
data records pointed to
– There will be 50000/10000 = 5 records with a
given value of Trip.cid
– Cost of a lookup is 3 + 5 = 8 pages
– Total disk I/O is: 6 + 4*8 = 38 pages
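
The arithmetic for the four plans of Exercise 3 can be collected in one short script, which makes it easy to vary the statistics and see how the ranking changes. This is a sketch that simply replays the slide calculations; the variable and function names are assumptions, and the index depths are the ones assumed on the slides.

```python
import math

USABLE_PAGE_BYTES = 4000  # of the 4096-byte page, the rest is header


def pages(n_rows: int, row_bytes: int) -> int:
    """Pages needed to store a relation, with whole records per page."""
    records_per_page = USABLE_PAGE_BYTES // row_bytes
    return math.ceil(n_rows / records_per_page)


car_rows, trip_rows = 10_000, 50_000
car_pages = pages(car_rows, 50)        # 80 records per page
trip_pages = pages(trip_rows, 40)      # 100 records per page

matching_cars = car_rows // 2_500      # uniform pod values: rows with pod = 5
trips_per_car = trip_rows // 10_000    # uniform cid values in Trip

# Plan 1: block nested loops, no indexes, filter after the join.
basic = car_pages + car_pages * trip_pages

# Plan 2: selection pushed below the join.
pushed_down = car_pages + matching_cars * trip_pages

# Plan 3: scan Trip, look up each Car via the clustered index
# (2 index levels plus 1 data page per lookup).
clustered = trip_pages + trip_rows * (2 + 1)

# Plan 4: unclustered indexes on Car.pod (2 levels) and Trip.cid (3 levels).
car_lookup = 2 + matching_cars
trip_lookup = 3 + trips_per_car
unclustered = car_lookup + matching_cars * trip_lookup

costs = {"basic": basic, "pushed_down": pushed_down,
         "clustered": clustered, "unclustered": unclustered}
cheapest = min(costs, key=costs.get)
```

Note that the plan using the existing clustered index is the most expensive of the four here, because it performs one index lookup per Trip record.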

FYI: Nested Queries

SELECT S.name
FROM Student S
WHERE EXISTS
(SELECT *
FROM Enrolled E
WHERE E.uos='DATA3404' AND
E.sid=S.sid)

– Nested block is optimized independently, with the outer tuple
considered as providing a selection condition.

Nested block to optimize:
SELECT *
FROM Enrolled E
WHERE E.uos='DATA3404' AND
E.sid=outer_value

– Outer block is optimized with the cost of 'calling' the nested block
computation taken into account.
– Implicit ordering of these blocks means that some good strategies
are not considered.
The non-nested version of the query is typically optimized better.

Equivalent non-nested query:
SELECT S.name
FROM Student S, Enrolled E
WHERE S.sid=E.sid
AND E.uos='DATA3404'
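
That the nested and the non-nested formulation return the same rows can be checked on a toy database. The sketch below uses Python's built-in sqlite3 module with made-up sample data; it assumes (sid, uos) is a key of Enrolled, since otherwise the join version could return duplicate names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Enrolled (sid INTEGER, uos TEXT, PRIMARY KEY (sid, uos));
    INSERT INTO Student VALUES (101, 'Alice'), (102, 'Bob'), (103, 'Clare');
    INSERT INTO Enrolled VALUES (101, 'DATA3404'), (102, 'COMP5338'),
                                (103, 'COMP5338'), (103, 'DATA3404');
""")

# Nested formulation with a correlated EXISTS sub-query.
nested = conn.execute("""
    SELECT S.name
    FROM Student S
    WHERE EXISTS (SELECT *
                  FROM Enrolled E
                  WHERE E.uos = 'DATA3404' AND E.sid = S.sid)
    ORDER BY S.name
""").fetchall()

# Equivalent non-nested formulation as a join.
flat = conn.execute("""
    SELECT S.name
    FROM Student S, Enrolled E
    WHERE S.sid = E.sid AND E.uos = 'DATA3404'
    ORDER BY S.name
""").fetchall()
```

Prefixing either statement with SQLite's EXPLAIN QUERY PLAN shows the strategy that particular engine picks; other systems may plan the two forms quite differently.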

FYI: Optimising Dynamic Queries

– Common: Dynamic queries with bind parameters:
PreparedStatement stmt = connection.prepareStatement("SELECT * FROM R WHERE R.A=?");
stmt.setInt(1, 4711);
ResultSet rs = stmt.executeQuery();

– More secure than building the SQL text by string concatenation (bind values cannot change the query structure)

– Commonly advised because it can be faster (the statement is parsed only once)

– Questions / Caveats:
– How to optimise parameterized queries?
• Just take a ‘typical’ value for placeholders? Which value is ‘typical’?
• E.g. Oracle: Optimizer peeks into the actual bind values, then optimises.
Re-uses this plan even if cursor uses the query with different bind values
– How to cache/re-use these queries?
• If re-issued a query with a different bind value, shall we still re-use plans?
• E.g. Oracle: Does not share plans with bind values!

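
The same bind-parameter pattern exists in most database APIs. As a minimal sketch, the example below uses Python's built-in sqlite3 module, where `?` is the placeholder; the table R and its contents are made up for illustration, and how (or whether) a plan is cached and re-used depends on the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?)",
                 [(4711, "x"), (4712, "y"), (4711, "z")])

# One SQL text, executed with different bind values.
query = "SELECT B FROM R WHERE A = ? ORDER BY B"
rows_4711 = conn.execute(query, (4711,)).fetchall()
rows_4712 = conn.execute(query, (4712,)).fetchall()

# A hostile value is treated as data, never as SQL text, so it cannot
# change the structure of the query.
hostile = "4711 OR 1=1"
rows_hostile = conn.execute(query, (hostile,)).fetchall()
```

Because the SQL text stays identical across executions, a system can recognise the statement and skip re-parsing, which is the basis for the plan re-use questions raised above.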
Key Concepts
– Build on topics from past two weeks:
– Index choice
– Expression trees
– Physical operations
– Access Paths
– Estimating I/O cost for all the above

– Execution Plans
– Should be able to annotate an expression
tree with appropriate physical operations
– Should be able to identify plans that
involve indexes, and propose suitable
indexes for these plans
– Should be able to compare plans based
upon I/O cost

– RA Expression Equivalence
– Should be able to translate between
expression trees using RA equivalence rules

Next Lecture (after the Easter break)

Next Week: Easter break – no lectures
Then:
– Distributed Data Management
– Data Partitioning & Data Sharing
– Data Replication
– Distributed Query/Join Processing

– Textbooks
– Ramakrishnan/Gehrke: Chapter 22
– Kifer/Bernstein/Lewis: Chapter 24

