
Query tuning

Viet-Trung Tran
SoICT

9/27/21 Database Tuning


What is query tuning

• Rewrite a query so that it runs faster
• The first thing to try when a query is slow
• Other tuning approaches related to queries
  • Adding indexes
  • Changing the schema (3NF, 4NF, etc.)
  • Modifying transaction length

1. Overview

• What is query processing
• Phases of query processing
  • Parser
  • Optimizer
1.1. What is query processing

• The entire process, or set of activities, involved in retrieving data from the database
  • Translation of the SQL query into low-level instructions (usually relational algebra)
  • Query optimization to save resources, based on cost estimation or evaluation of the query
  • Query execution to extract the data from the database
1.2. Phases of query processing

[Figure: SQL → Parser → query plan → Optimizer → optimized execution plan → Code generator → code for executing]


1.3. Parser

• Scans the query, splits it into individual tokens, and checks the query for correctness
  • Does it contain the right keywords?
  • Does it conform to the syntax?
  • Does it reference valid tables and attributes?
• Output: query plan
• E.g.
  • Input: SELECT balance FROM account WHERE balance < 2500
  • Output: relational algebra expression
  • But the expression is not unique
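To illustrate why the parser output is not unique, here are two relational algebra expressions for the example query. This is a sketch using the usual notation (σ for selection, Π for projection); both produce the same result, they only apply the operators in a different order.

```latex
\sigma_{balance < 2500}\bigl(\Pi_{balance}(account)\bigr)
\qquad\text{and}\qquad
\Pi_{balance}\bigl(\sigma_{balance < 2500}(account)\bigr)
```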
1.4. Optimizer

• Input: RA expression
• Output: query execution plan
• Query execution plan = query plan + the algorithms used to execute the RA operations
• Aims to choose the cheapest execution plan among the possible ones
  • Step 1: Equivalence transformation
  • Step 2: Annotation of the RA expression with execution algorithms
  • Step 3: Cost estimation for different query execution plans
2. Understanding optimizer

• Choose the cheapest execution plan among the possible ones
  • Step 1: Equivalence transformation
  • Step 2: Annotation of the RA expression with execution algorithms
  • Step 3: Cost estimation for different query execution plans
2.1. Step 1: Equivalence transformation

• RA expressions are equivalent if they generate the same set of tuples on every database instance
• Equivalence rules:
  • Transform one relational algebra expression into an equivalent one
  • Similar to numeric algebra: a + b = b + a, a(b + c) = ab + ac, etc.
• Why produce equivalent expressions?
  • Equivalent algebraic expressions give the same result
  • but their execution times usually vary significantly
2.1. Step 1: Equivalence transformation

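• Equivalence transformation rules
  • (1) Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of σ)
      $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
  • (2) Selection operations are commutative
      $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
  • (3) Only the final operation in a sequence of projection operations is needed (cascade of Π)
      $\Pi_{L_1}(\Pi_{L_2}(\dots(\Pi_{L_n}(E))\dots)) = \Pi_{L_1}(E)$
  • (4) Selections can be combined with Cartesian products and theta joins
      $\sigma_{\theta_1}(E_1 \times E_2) = E_1 \bowtie_{\theta_1} E_2$
      $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$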
2.1. Step 1: Equivalence transformation

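• Equivalence transformation rules
  • (5) Theta join operations are commutative
      $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
  • (6) Natural join operations are associative
      $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
  • Theta joins are associative in the following manner, where $\theta_2$ involves attributes from $E_2$ and $E_3$ only
      $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$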
2.1. Step 1: Equivalence transformation

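• Equivalence transformation rules
  • (7) Selection distributes over joins in the following ways
    • If predicate $\theta_1$ involves attributes of $E_1$ only
        $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = \sigma_{\theta_1}(E_1) \bowtie_{\theta_2} E_2$
    • If predicate $\theta_1$ involves only attributes of $E_1$ and $\theta_2$ involves only attributes of $E_2$ (a consequence of rules 7 and 1)
        $\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta_3} E_2) = \sigma_{\theta_1}(E_1) \bowtie_{\theta_3} \sigma_{\theta_2}(E_2)$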
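Rule (7) can be checked on a toy instance. The sketch below is only an illustration: the relations, attribute names, and values are made up, and relations are modelled as Python lists of dicts rather than anything a real optimizer would use. It evaluates the selection after the join and before the join and compares the results.

```python
from itertools import product

def select(pred, rel):
    """sigma: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    """Theta join: every pair (tr, ts) that satisfies theta, merged into one tuple."""
    return [{**tr, **ts} for tr, ts in product(r, s) if theta(tr, ts)]

# Hypothetical toy relations E1 and E2.
employee = [
    {"ssnum": 1, "dept": "IT", "salary": 50000},
    {"ssnum": 2, "dept": "HR", "salary": 30000},
    {"ssnum": 3, "dept": "IT", "salary": 35000},
]
techdept = [
    {"tdept": "IT", "location": "HQ"},
    {"tdept": "R&D", "location": "Lab"},
]

theta1 = lambda t: t["salary"] > 40000             # uses attributes of E1 only
theta2 = lambda tr, ts: tr["dept"] == ts["tdept"]  # join condition

# Left-hand side of rule (7): select after the join.
late = select(theta1, theta_join(employee, techdept, theta2))
# Right-hand side: push the selection below the join.
early = theta_join(select(theta1, employee), techdept, theta2)
```

Both variants return the same single tuple, but the second one joins one employee tuple instead of three, which is why optimizers prefer to push selections down.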
2.1. Step 1: Equivalence transformation

• Equivalence transformation rules
  • (8) Projection distributes over join as follows
      $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1}(E_1) \bowtie_{\theta} \Pi_{L_2}(E_2)$
    • if $\theta$ involves attributes in $L_1 \cup L_2$ only and $L_i$ contains attributes of $E_i$
  • (9) The set operations union and intersection are commutative
      $E_1 \cup E_2 = E_2 \cup E_1$
      $E_1 \cap E_2 = E_2 \cap E_1$
  • (10) Union and intersection are associative
      $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
2.1. Step 1: Equivalence transformation

• Equivalence transformation rules
  • (11) The selection operation distributes over union, intersection, and set difference
      $\sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2)$
      $\sigma_{\theta}(E_1 \cap E_2) = \sigma_{\theta}(E_1) \cap \sigma_{\theta}(E_2)$
      $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
  • (12) The projection operation distributes over union
      $\Pi_{L}(E_1 \cup E_2) = \Pi_{L}(E_1) \cup \Pi_{L}(E_2)$
2.2. Step 2: Execution algorithms of RA operations

• An algebra expression is not a query execution plan.
• Additional decisions are required:
  • Which indexes to use, for example, for joins and selects?
  • Which algorithms to use, for example, sort-merge vs. hash join?
  • Materialize intermediate results or pipeline them?
2.2. Step 2: Execution algorithms of RA operations

• Basic operators
  • One-pass operators:
    • Scan
    • Select
    • Project
  • Multi-pass operators:
    • Join
      • Various implementations
      • Handling of larger-than-memory sources
    • Aggregation, union, etc.
2.2. Step 2: Execution algorithms of RA operations

• One-pass operators: scanning a table
  • Sequential scan: read through the blocks of the table
  • Index scan: retrieve tuples in index order
2.2. Step 2: Execution algorithms of RA operations

• Nested-loop JOIN

    for each tuple tr in r {
        for each tuple ts in s {
            if (tr and ts satisfy the join condition) {
                add tuple tr x ts to the result set
            }
        }
    }

• No index needed
• Works with any type of join condition
• Expensive: O(n · m) comparisons for inputs of n and m tuples, i.e. O(n²) for equal-sized inputs
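The nested-loop pseudocode above can be sketched in Python as follows. This is a minimal in-memory illustration: relations are lists of tuples, and the function name and sample data are assumptions made for the example.

```python
def nested_loop_join(r, s, condition):
    """Compare every tuple of r with every tuple of s (O(n * m) comparisons)."""
    result = []
    for tr in r:
        for ts in s:
            if condition(tr, ts):
                result.append(tr + ts)  # concatenate the two tuples
    return result

# Hypothetical sample relations; the first field is the join attribute.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]

joined = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
```

Because the condition is an arbitrary function, the same code also handles non-equality joins such as `tr[0] < ts[0]`.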
2.2. Step 2: Execution algorithms of RA operations

• Single-loop JOIN (index-based)

    for each tuple tr in r {
        search for matching tuples ts in s through the index
        for each match ts {
            add tr x ts to the result set
        }
    }

• Index needed on the join attribute of s
• Cheaper: O(n log m) for n tuples in r and a tree index over the m tuples of s
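A sketch of the index-based join, assuming the index is approximated by a sorted list searched with binary search (a stand-in for a B-tree; real systems use on-disk index structures). All names and data are illustrative.

```python
from bisect import bisect_left

def index_join(r, s_sorted, s_keys):
    """For each tuple of r, look up matches in s via binary search (O(n log m))."""
    result = []
    for tr in r:
        i = bisect_left(s_keys, tr[0])          # first position with key >= tr[0]
        while i < len(s_keys) and s_keys[i] == tr[0]:
            result.append(tr + s_sorted[i])     # emit every matching tuple of s
            i += 1
    return result

# Hypothetical sample relations; the first field is the join attribute.
r = [(3, "c"), (1, "a"), (2, "b")]
s = [(3, "z"), (2, "x"), (3, "y")]

# "Build the index": sort s once and keep the keys in a parallel list.
s_sorted = sorted(s)
s_keys = [ts[0] for ts in s_sorted]

joined = index_join(r, s_sorted, s_keys)
```

Only r is scanned in full; s is touched only where the index says a match exists.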
2.2. Step 2: Execution algorithms of RA operations

• Sort-merge JOIN
  • Requires data physically sorted by the join attributes: merge and join the sorted files, reading sequentially one block at a time
    • Maintain two file pointers
    • While the tuple at R < the tuple at S, advance R (and vice versa)
    • While tuples match, output all possible pairings
  • Very efficient for presorted data. Otherwise, it may require a sort (adds cost and delay)
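The merge phase described above can be sketched like this. It is an in-memory illustration only: lists stand in for sorted files, list indexes stand in for file pointers, and the names and data are assumptions.

```python
def sort_merge_join(r, s):
    """Equi-join on the first field; both inputs must already be sorted on it."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1                      # advance R
        elif r[i][0] > s[j][0]:
            j += 1                      # advance S
        else:
            key = r[i][0]
            # Find the run of tuples with this key on both sides.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Output all possible pairings of the two runs.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    result.append(tr + ts)
            i, j = i_end, j_end
    return result

# Hypothetical inputs; sorting here plays the role of the extra sort step.
r = sorted([(3, "c"), (1, "a"), (3, "d")])
s = sorted([(3, "y"), (2, "x"), (3, "z")])

joined = sort_merge_join(r, s)
```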
2.2. Step 2: Execution algorithms of RA operations

• Partition-hash JOIN
  • Hash both relations on the join attributes
  • Join the matching buckets
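A minimal in-memory sketch of the partition-hash join, assuming an equi-join on the first field. The bucket count and all names are illustrative; a real system would write the partitions to disk and size them to fit in memory.

```python
from collections import defaultdict

def partition(rel, n_buckets):
    """Split a relation into buckets by hashing the join attribute."""
    buckets = defaultdict(list)
    for t in rel:
        buckets[hash(t[0]) % n_buckets].append(t)
    return buckets

def partition_hash_join(r, s, n_buckets=4):
    """Tuples can only match inside the same bucket, so join bucket by bucket."""
    r_parts = partition(r, n_buckets)
    s_parts = partition(s, n_buckets)
    result = []
    for b, r_bucket in r_parts.items():
        # Build a hash table on the s bucket, then probe it with the r bucket.
        table = defaultdict(list)
        for ts in s_parts.get(b, []):
            table[ts[0]].append(ts)
        for tr in r_bucket:
            for ts in table.get(tr[0], []):
                result.append(tr + ts)
    return result

# Hypothetical sample relations.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]

joined = sorted(partition_hash_join(r, s))
```

The output order depends on the bucket order, which is why the example sorts the result before using it. Unlike the nested-loop join, this approach only works for equality conditions.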
2.2. Step 2: Execution algorithms of RA operations

• Execution strategy: materialization vs. pipelining
  • The execution strategy defines how to walk the query execution plan
    • Materialization
    • Pipelining

[Figure: example query execution plan]
Join (PressRel.Symbol = EastCoast.CoSymbol)
  Join (PressRel.Symbol = Clients.Symbol)
    Scan PressRel
    Select (Client = "Atkins")
      Scan Clients
  Project (CoSymbol)
    Scan EastCoast
2.2. Step 2: Execution algorithms of RA operations

• Materialization
  • Performs the innermost (leaf-level) operations of the query execution plan first
  • The intermediate result of each operation is materialized into a temporary relation and becomes the input for subsequent operations.
  • The cost of materialization is the sum of the costs of the individual operations plus the cost of writing the intermediate results to disk
    • Lots of temporary files, lots of I/O.
2.2. Step 2: Execution algorithms of RA operations

• Pipelining
  • Operations form a queue, and results are passed from one operation to the next as they are calculated
  • Pipelining restructures the individual operation algorithms so that they take streams of tuples as both input and output.
  • Limitation
    • Algorithms that require sorting can only use pipelining if the input is already sorted,
    • since sorting by nature cannot be performed until all tuples to be sorted are known.
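The difference between the two strategies can be sketched with Python lists (materialization) and generators (pipelining). This is only an analogy under simplifying assumptions: everything is in memory, and the operator names and sample data are made up.

```python
def scan(rows):
    """Leaf operator: emit tuples one at a time."""
    for row in rows:
        yield row

def select(pred, rows):
    """Streaming selection: consumes and produces a stream of tuples."""
    for row in rows:
        if pred(row):
            yield row

def project(cols, rows):
    """Streaming projection onto the given column positions."""
    for row in rows:
        yield tuple(row[c] for c in cols)

# Hypothetical Employee tuples: (ssnum, dept, salary).
employee = [(1, "IT", 50000), (2, "HR", 30000), (3, "IT", 35000)]

# Materialization: each step stores its full result before the next one starts.
temp1 = list(scan(employee))
temp2 = [row for row in temp1 if row[1] == "IT"]
materialized = [(row[0],) for row in temp2]

# Pipelining: operators are chained; nothing runs until a tuple is requested.
pipeline = project([0], select(lambda row: row[1] == "IT", scan(employee)))
first = next(pipeline)              # the first result is available immediately
pipelined = [first] + list(pipeline)
```

Both strategies return the same tuples. The pipelined version never builds `temp1` or `temp2`, which corresponds to avoiding temporary files and their I/O.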
2.3. Step 3: Cost estimation

• Each relational algebra expression can result in many query execution plans
• Some query execution plans may be better than others
• Goal: find the fastest one
  • The cost is only an estimate under certain assumptions
  • A huge number of query plans may exist
2.3. Step 3: Cost estimation

• Cost estimation factors
  • Catalog information: the database maintains statistics about relations
  • Examples:
    • number of tuples per relation
    • number of blocks on disk per relation
    • number of distinct values per attribute
    • histogram of values per attribute
• Problems
  • cost can only be estimated
  • updating statistics is expensive, thus they are often out of date
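As a rough sketch of how such statistics feed an estimate, the snippet below uses the common textbook assumption that attribute values are uniformly distributed, so an equality selection on attribute A returns about (number of tuples) / (number of distinct values of A) tuples. All figures are hypothetical, and real optimizers use more refined models (for example histograms).

```python
import math

def estimated_matches(n_tuples, n_distinct):
    """Expected result size of 'A = v' under the uniform-distribution assumption."""
    return n_tuples / n_distinct

def blocks_needed(n_tuples, tuples_per_block):
    """Number of disk blocks needed to hold n_tuples."""
    return math.ceil(n_tuples / tuples_per_block)

# Hypothetical catalog statistics for Employee.
n_employee = 10_000        # number of tuples
distinct_depts = 50        # number of distinct values of dept
tuples_per_block = 20

matches = estimated_matches(n_employee, distinct_depts)
full_scan_blocks = blocks_needed(n_employee, tuples_per_block)
clustered_blocks = blocks_needed(matches, tuples_per_block)
```

Under these assumptions a selection on dept is estimated to return 200 tuples. Reading them costs about 10 blocks if the matching tuples are stored contiguously, compared with 500 blocks for a full scan. If the statistics are out of date, both numbers can be far off.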
2.3. Step 3: Cost estimation

• Choosing the cheapest query plan
  • Problem:
    • Estimating the cost of all possible plans is too expensive.
  • Solutions:
    • pruning: stop evaluating a plan early
    • heuristics: do not evaluate all plans
  • Real databases use a combination of both
    • Apply heuristics to choose promising query plans.
    • Choose the cheapest plan among the promising plans using pruning.
  • Examples of heuristics:
    • perform selections as early as possible
    • perform projections early
    • avoid Cartesian products
2.3. Step 3: Cost estimation

• Heuristic rules
  • Break apart conjunctive selections into a sequence of simple selections
  • Move σ as far down the query tree as possible
  • Replace σ-× pairs (a selection on top of a Cartesian product) with ⋈
  • Break apart Π and move it as far down the tree as possible
  • Perform the joins with the smallest expected result first
Remark

• Query processing is the entire process, or set of activities, involved in retrieving data from the database
  • Parser
  • Optimizer
  • Code generator
• Query optimizer
  • Step 1: Equivalence transformation
  • Step 2: Annotation of the RA expression with execution algorithms
  • Step 3: Cost estimation for different query execution plans
Why query tuning? Why is the query optimizer not enough?

• Optimizers are not perfect:
  • transformations produce only a subset of all possible query plans
  • only a subset of the possible annotations might be considered
  • the cost of query plans can only be estimated
• Query tuning: make life easier for your query optimizer!

Figure out problematic queries

• Which queries should be rewritten?
  • Rewrite queries that run "too slow"
• How to find these queries?
  • The query issues far too many disk accesses, for example, a point query scans an entire table
  • You look at the query plan and see that relevant indexes are not used
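One way to look at a query plan is sketched below. The slides do not name a database system; SQLite (through Python's standard `sqlite3` module) and its `EXPLAIN QUERY PLAN` statement are used here purely as an assumption for illustration, and the index name is made up. Other systems expose plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT name FROM Employee WHERE dept = 'IT'"

# Without an index on dept, the plan reports a scan of the whole table.
plan_without_index = plan(query)

# After creating a (hypothetically named) index, the plan reports an index search.
conn.execute("CREATE INDEX idx_employee_dept ON Employee(dept)")
plan_with_index = plan(query)
```

The exact wording of the plan text differs between SQLite versions, so it is safer to look for the keywords (a scan versus a search using the index) than for an exact string.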

Overview of query tuning

• avoid DISTINCTs
• subqueries often inefficient
• temporary tables might help
• use clustering indexes for joins
• HAVING vs. WHERE
• use views with care
• system peculiarities: OR and order in FROM clause

Testbed scenario

• Employee(ssnum, name, manager, dept, salary, numfriends)
  • clustering index on ssnum
  • non-clustering index on name
  • non-clustering index on dept
  • keys: ssnum, name
• Student(ssnum, name, course, grade)
  • clustering index on ssnum
  • non-clustering index on name
  • keys: ssnum, name
• Techdept(dept, manager, location)
  • clustering index on dept
  • key: dept
  • a manager may manage many departments
  • a location may contain many departments

DISTINCT

• How can DISTINCT hurt?
  • DISTINCT forces a sort or other overhead.
  • If it is not necessary, it should be avoided.
• Query: Find employees who work in the information systems department.
    SELECT DISTINCT ssnum
    FROM Employee
    WHERE dept = 'information systems'
• DISTINCT is not necessary:
  • ssnum is a key of Employee, so it is also a key of any subset of Employee.
  • Note: Since an index is defined on ssnum, there is likely to be no overhead in this particular example.
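The claim that DISTINCT is redundant here can be checked on a small instance. The sketch assumes SQLite via Python's `sqlite3` module and made-up rows; it compares the two query variants rather than measuring their cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT UNIQUE, dept TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Employee VALUES
        (1, 'An',   'information systems'),
        (2, 'Binh', 'acquisitions'),
        (3, 'Chi',  'information systems');
""")

tail = "FROM Employee WHERE dept = 'information systems' ORDER BY ssnum"

with_distinct = conn.execute("SELECT DISTINCT ssnum " + tail).fetchall()
without_distinct = conn.execute("SELECT ssnum " + tail).fetchall()
```

Because ssnum is a key, no two result rows can be equal, so both queries return the same rows and the DISTINCT only adds work.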

Non-Correlated Subqueries
• Many systems handle subqueries inefficiently.
• Non-correlated: attributes of the outer query are not used in the inner query.
• Query:
    SELECT ssnum
    FROM Employee
    WHERE dept IN (SELECT dept FROM Techdept)
• May lead to inefficient evaluation:
  • check for each employee whether they are in Techdept
  • index on Employee.dept not used!
• Equivalent query:
    SELECT ssnum
    FROM Employee, Techdept
    WHERE Employee.dept = Techdept.dept
• Efficient evaluation:
  • look up employees for each dept in Techdept
  • use index on Employee.dept
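The equivalence of the subquery and the join can be checked on a toy instance. The sketch assumes SQLite via `sqlite3` and invented rows. The rewrite is only safe because dept is a key of Techdept; if Techdept could contain the same dept twice, the join would return duplicate ssnum values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE Techdept (dept TEXT PRIMARY KEY, manager TEXT, location TEXT);
    CREATE INDEX idx_employee_dept ON Employee(dept);
    -- Hypothetical sample rows.
    INSERT INTO Employee VALUES (1, 'An', 'IT'), (2, 'Binh', 'HR'), (3, 'Chi', 'R&D');
    INSERT INTO Techdept VALUES ('IT', 'Dung', 'HQ'), ('R&D', 'Dung', 'Lab');
""")

with_subquery = conn.execute("""
    SELECT ssnum FROM Employee
    WHERE dept IN (SELECT dept FROM Techdept)
    ORDER BY ssnum
""").fetchall()

with_join = conn.execute("""
    SELECT ssnum FROM Employee, Techdept
    WHERE Employee.dept = Techdept.dept
    ORDER BY ssnum
""").fetchall()
```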

Temporary tables

• Temporary tables can hurt in the following ways:
  • they force operations to be performed in a suboptimal order (the optimizer often does a very good job!)
  • creating temporary tables causes a catalogue update in some systems, which is a possible concurrency control bottleneck
  • the system may miss an opportunity to use an index
• Temporary tables are good:
  • to rewrite complicated correlated subqueries
  • to avoid ORDER BYs and scans in specific cases (see example)

Ex. Unnecessary temp table

• Query: Find all IT department employees who earn more than 40000.
    SELECT * INTO Temp
    FROM Employee
    WHERE salary > 40000

    SELECT ssnum
    FROM Temp
    WHERE Temp.dept = 'IT'
• Inefficient SQL:
  • the index on dept cannot be used
  • overhead to create the Temp table (materialization vs. pipelining)
• Efficient SQL:
    SELECT ssnum
    FROM Employee
    WHERE Employee.dept = 'IT'
    AND salary > 40000
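A small check that the temporary table adds nothing to the result. The sketch assumes SQLite via `sqlite3`; since SQLite has no `SELECT ... INTO`, it uses `CREATE TEMP TABLE ... AS SELECT` as a stand-in, with the hypothetical table name `HighPaid` and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, dept TEXT, salary REAL);
    CREATE INDEX idx_employee_dept ON Employee(dept);
    -- Hypothetical sample rows.
    INSERT INTO Employee VALUES (1, 'IT', 50000), (2, 'IT', 30000), (3, 'HR', 60000);

    -- Stand-in for SELECT * INTO Temp: the intermediate result is materialized
    -- and carries no index on dept.
    CREATE TEMP TABLE HighPaid AS
        SELECT * FROM Employee WHERE salary > 40000;
""")

via_temp_table = conn.execute(
    "SELECT ssnum FROM HighPaid WHERE dept = 'IT' ORDER BY ssnum"
).fetchall()

direct = conn.execute(
    "SELECT ssnum FROM Employee WHERE dept = 'IT' AND salary > 40000 ORDER BY ssnum"
).fetchall()
```

Both variants return the same rows, but only the direct query leaves the optimizer free to use the index on Employee.dept.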

Joins: Use clustering indexes and numeric values

• Query: Find all students who are also employees.
• Inefficient SQL:
    SELECT Employee.ssnum
    FROM Employee, Student
    WHERE Employee.name = Student.name
• Efficient SQL:
    SELECT Employee.ssnum
    FROM Employee, Student
    WHERE Employee.ssnum = Student.ssnum
• Benefits:
  • A join on two clustering indexes allows a merge join (fast!).
  • Numeric equality is evaluated faster than string equality.

Don’t use HAVING where WHERE is enough

• Query: Find the average salary of the IT department.
• Inefficient SQL:
    SELECT AVG(salary) AS avgsalary, dept
    FROM Employee
    GROUP BY dept
    HAVING dept = 'IT'
• Problem: May first compute the average for the employees of all departments.
• Efficient SQL: Compute the average only for the relevant employees.
    SELECT AVG(salary) AS avgsalary, dept
    FROM Employee
    WHERE dept = 'IT'
    GROUP BY dept
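Both formulations return the same answer, which can be confirmed on a toy table. The sketch assumes SQLite via `sqlite3` and invented rows; it demonstrates equivalence of the results, not the difference in cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, dept TEXT, salary REAL);
    -- Hypothetical sample rows.
    INSERT INTO Employee VALUES (1, 'IT', 50000), (2, 'IT', 30000), (3, 'HR', 60000);
""")

# Filter after grouping: every department may be aggregated first.
with_having = conn.execute("""
    SELECT AVG(salary) AS avgsalary, dept
    FROM Employee
    GROUP BY dept
    HAVING dept = 'IT'
""").fetchall()

# Filter before grouping: only IT rows reach the aggregation.
with_where = conn.execute("""
    SELECT AVG(salary) AS avgsalary, dept
    FROM Employee
    WHERE dept = 'IT'
    GROUP BY dept
""").fetchall()
```

HAVING is still needed when the condition refers to an aggregate, for example `HAVING AVG(salary) > 40000`; the advice only applies to conditions on plain columns.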

Use views with care

• Views: macros for queries
  • queries look simpler
  • but are never faster and sometimes slower
• Creating a view:
    CREATE VIEW Techlocation
    AS SELECT ssnum, Techdept.dept, location
    FROM Employee, Techdept
    WHERE Employee.dept = Techdept.dept
• Using the view:
    SELECT location
    FROM Techlocation
    WHERE ssnum = 452354786
• The system expands the view and executes:
    SELECT location
    FROM Employee, Techdept
    WHERE Employee.dept = Techdept.dept
    AND ssnum = 452354786

• Query: Get the department name for the employee with social security number 452354786 (who works in a technical department).
• Example of inefficient SQL:
    SELECT dept
    FROM Techlocation
    WHERE ssnum = 452354786
• This SQL expands to:
    SELECT dept
    FROM Employee, Techdept
    WHERE Employee.dept = Techdept.dept
    AND ssnum = 452354786
• But there is a more efficient SQL query (no join!) that does the same thing:
    SELECT dept
    FROM Employee
    WHERE ssnum = 452354786
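The sketch below compares the view-based query with the direct one on a toy instance, assuming SQLite via `sqlite3` and invented rows (the view column is given an explicit `AS dept` alias to make its name unambiguous). It also shows why the remark "who works in a technical department" matters: for any other employee the two queries differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE Techdept (dept TEXT PRIMARY KEY, manager TEXT, location TEXT);
    -- Hypothetical sample rows: one technical employee, one non-technical.
    INSERT INTO Employee VALUES (452354786, 'An', 'IT'), (2, 'Binh', 'HR');
    INSERT INTO Techdept VALUES ('IT', 'Dung', 'HQ');

    CREATE VIEW Techlocation AS
        SELECT ssnum, Techdept.dept AS dept, location
        FROM Employee, Techdept
        WHERE Employee.dept = Techdept.dept;
""")

def dept_via_view(ssnum):
    """Goes through the view, so the join with Techdept is executed."""
    return conn.execute(
        "SELECT dept FROM Techlocation WHERE ssnum = ?", (ssnum,)
    ).fetchall()

def dept_direct(ssnum):
    """Reads Employee only; no join."""
    return conn.execute(
        "SELECT dept FROM Employee WHERE ssnum = ?", (ssnum,)
    ).fetchall()

tech_via_view = dept_via_view(452354786)
tech_direct = dept_direct(452354786)
```

For the technical employee both functions return the same department. For employee 2 (in HR) the view returns no row while the direct query returns 'HR', so the join-free rewrite is only valid when the employee is known to be in a technical department.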

System peculiarity: Indexes and OR

• Some systems never use indexes when conditions are OR-connected.
• Query: Find employees with the name Smith or who are in the acquisitions department.
    SELECT Employee.ssnum
    FROM Employee
    WHERE Employee.name = 'Smith'
    OR Employee.dept = 'acquisitions'
• Fix: use UNION instead of OR
    SELECT Employee.ssnum
    FROM Employee
    WHERE Employee.name = 'Smith'
    UNION
    SELECT Employee.ssnum
    FROM Employee
    WHERE Employee.dept = 'acquisitions'
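The OR and UNION formulations can be compared on a toy instance. The sketch assumes SQLite via `sqlite3` and invented rows, including an employee who satisfies both conditions. Whether the rewrite actually helps depends on the system; many current optimizers can already use indexes for OR conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT UNIQUE, dept TEXT);
    CREATE INDEX idx_employee_dept ON Employee(dept);
    -- Hypothetical sample rows; Smith matches both conditions.
    INSERT INTO Employee VALUES
        (1, 'Smith', 'acquisitions'),
        (2, 'Lan',   'acquisitions'),
        (3, 'Minh',  'HR');
""")

with_or = conn.execute("""
    SELECT Employee.ssnum
    FROM Employee
    WHERE Employee.name = 'Smith'
       OR Employee.dept = 'acquisitions'
    ORDER BY ssnum
""").fetchall()

with_union = conn.execute("""
    SELECT Employee.ssnum FROM Employee WHERE Employee.name = 'Smith'
    UNION
    SELECT Employee.ssnum FROM Employee WHERE Employee.dept = 'acquisitions'
    ORDER BY ssnum
""").fetchall()
```

UNION removes duplicates, so Smith appears once even though both branches select that row. Because ssnum is a key, the OR query has no duplicates either, and the two results agree.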
