Chapter15 1

The document discusses query processing and execution in a database system. It describes the major steps of query compilation including parsing, rewrite, and physical plan generation. It also covers query execution, explaining concepts like scanning, sorting, indexing, and cost-based optimization. The goal of the query processor is to efficiently execute SQL queries by converting them to a sequence of low-level operations.

Uploaded by

niloy2105044
Chapter 15 (TCDS)

Query Execution
Sukarna Barua
Associate Professor, CSE, BUET
03/20/2024
The Query Processor

 Functions of query processor:

 Converts high level SQL queries into a sequence of database operations and
executes those operations.
 Converts high level query to a detailed description.

 Uses query-processing algorithms to execute a query efficiently.

The Query Processor

 Major approaches in query processing:

 Scanning, hashing, sorting, and indexing

 Algorithms have significantly different costs and structures:

 Some algorithms assume that enough main memory is available to hold at least one of the relations involved
in an operation.
 Others assume that the arguments are too big to fit in the main memory.

 Query processing has two parts:


 Query compilation

 Query execution.
Query Compilation

 Three major steps in query compilation:


(a) Parsing:
 A parse tree is constructed from the SQL query.
 Also known as an expression tree [when the parse tree is represented using relational algebra
operators].
 This representation is more succinct.

 Example: Parse tree shown (right) for the given SQL (left).

Query Compilation

 Three major steps in query compilation :


(b) Query rewrite:
 Parse tree is converted to an initial query plan.
 Usually an algebraic representation of the query
 Initial plan is transformed to a logical query plan.
 The logical plan is expected to require less time to execute than the
initial plan.

Query Compilation

 Three major steps in query compilation:


(c) Physical plan generation
 Converts logical query plan to a physical query plan.
 Selects algorithms to implement each of the operations
in the logical plan.
 Physical plan includes details such as
- How query relations are accessed.
- When and if a relation is to be sorted.

Query Compilation

 Three major steps in query compilation:


(c) Physical plan generation
 Selects the best physical plan with lowest cost.

Query Compilation

 Query rewrite + physical plan generation


= Query optimizer

Issues to Consider

 Issue 1: Which of the algebraically equivalent forms of a query leads to the
most efficient algorithm for answering the query?
 Issue 2: For each operation, what algorithms should be used to implement that
operation?
 Issue 3: How should the operations pass data from one to the other, e.g., in a
pipelined fashion, in memory buffers, or via the disk?

Best Query Plan

 Metadata to consider for best query plan generation:


 The size of each relation
 Statistics such as the approximate number and frequency of different values
for an attribute
 Existence of certain indexes
 Layout of data on disk

Issues to Consider

 Image: https://www.cs.emory.edu/~cheung/Courses/554/Syllabus/4-query-exec/phys-ops.html

Physical Query Plan Operators

 Physical query plans are built from operators


 Physical operators are implementations of:
 Relational algebra operations:
 Each such operator implements one of the operations of relational algebra.
 Example: σ, π, ⋈, etc.

 Non-relational algebra operation:


 Scanning, Sorting while scanning, etc.
 Example: table-scan, index-scan, etc.

Scanning operation

 Scanning
 Read the entire contents of a relation R.
 Read only those tuples of R that satisfy a given predicate.

Scanning Operation

 Scanning approaches
 Table-scan: R is stored in secondary memory, and its tuples are arranged in blocks.
- The blocks containing R are already known to the DBMS.
- Read the blocks one by one.
- This is called a table-scan.
 Usage:
- When all blocks of R must be read.

Scanning Operation

 Scanning approaches
 Index-scan: There is an index on some attribute of R.
- Read the index.
- Use the index to locate all the blocks of R.
- Read the blocks one by one in the order given by the index.
- Tuples are retrieved in sorted order of the index attribute.
 Usage:
- When the block locations of R are not known to the DBMS.
- When only the tuples satisfying a condition on an attribute a are to be retrieved
[the index must be on attribute a].
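
The two scanning approaches can be contrasted with a small sketch. It is only an illustration under stated assumptions: a relation is modeled as a Python list of blocks, each block a list of tuples, and the "index" is a plain dictionary rather than a real B-tree. The names `table_scan`, `build_index`, and `index_scan` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]      # assumed schema: (key, payload)
Block = List[Row]          # one disk block holds a few tuples


def table_scan(blocks: List[Block]) -> Iterator[Row]:
    """Read every block of R in storage order."""
    for block in blocks:
        for row in block:
            yield row


def build_index(blocks: List[Block]) -> Dict[int, List[Tuple[int, int]]]:
    """Toy index on the key: key value -> list of (block number, slot)."""
    index: Dict[int, List[Tuple[int, int]]] = {}
    for block_no, block in enumerate(blocks):
        for slot, row in enumerate(block):
            index.setdefault(row[0], []).append((block_no, slot))
    return index


def index_scan(blocks: List[Block], index) -> Iterator[Row]:
    """Follow index entries in key order, so tuples come out sorted by key."""
    for key in sorted(index):          # a B-tree would already keep keys sorted
        for block_no, slot in index[key]:
            yield blocks[block_no][slot]


R = [[(30, "c"), (10, "a")], [(20, "b"), (40, "d")]]
storage_order = list(table_scan(R))
key_order = list(index_scan(R, build_index(R)))
```

The table-scan returns tuples in storage order, while the index-scan returns them ordered by the indexed attribute.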

Sorting While Scanning

 Why sorting while scanning?


 The query has an ORDER BY clause, so the final output must be sorted.
 Some algorithms for relational algebra operations require one or both arguments to be
sorted relations.

Sort-Scan Operation

 Sort-scan operation:
 Sorts the relation while scanning.
 If R is to be sorted by attribute a and there is a B-tree index on a:
- An index-scan produces R in sorted order.
 If R fits in main memory:
- Retrieve the tuples of R using a table-scan or an index-scan.
- Use a main-memory sorting algorithm.
 If R is too large to fit in main memory:
- Use multi-way merge-sort [ discussed later ].

Query Execution Cost

 A query consists of several relational algebra operations.


 A physical query plan consists of several physical operators.
 Each operator implements an operation (relational/non-relational).
 Assumptions:
 The arguments of an operation are on disk.
 The final result is left in main memory or pipelined [so it does not matter for the cost
calculation; why?]
 Measure of cost:
 Number of disk I/Os. [primary cost]
 Why? It takes longer to get data from disk than main memory.

Parameters for Measuring Cost
 Main memory cost metric:
 Main memory is divided into buffers.
 M: number of main-memory buffers available to an operator.
 M can be:
- The entire main memory, or
- A portion of main memory [typically when several operations share main
memory]

Parameters for Measuring Cost
 Secondary memory (disk) cost metric:
 Data is accessed one block at a time from disk.
 Three parameters: B, T, and V.
 B(R): number of blocks needed to hold R on disk.
- Can be written as B, if R is implied.
 T(R): number of tuples in R.
- Can be written as T, if R is implied.
- T(R)/B(R) is the number of tuples in a single block.
 V(R, a): number of distinct values of an attribute a in R.
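
As a small worked example of these parameters, the sketch below computes the number of blocks, the number of tuples, and the number of distinct values of one attribute for a toy relation. The helper `relation_stats`, the sample rows, and the assumption that every block holds a fixed number of tuples are illustrative only.

```python
import math


def relation_stats(rows, tuples_per_block, attr):
    """Return (B, T, V) for a relation given as a list of dict rows.

    Assumes the relation is clustered, so its tuples are packed
    tuples_per_block to a block.
    """
    T = len(rows)                                # number of tuples
    B = math.ceil(T / tuples_per_block)          # number of blocks
    V = len({row[attr] for row in rows})         # distinct values of attr
    return B, T, V


depts = ["cs", "ee", "cs", "me", "ee", "cs", "cs"]
rows = [{"id": i, "dept": d} for i, d in enumerate(depts)]
B, T, V = relation_stats(rows, tuples_per_block=3, attr="dept")
```

With 7 tuples packed 3 per block, the relation occupies 3 blocks, and the `dept` attribute has 3 distinct values.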

I/O Cost for Scan Operation
 Cost of scan
 Number of disk I/Os is approximately:
 B(R) [If R is clustered]
 T(R) [If R is not clustered, and its tuples are stored along with tuples of other relations in disk
blocks]
 Cost = B(R) or T(R), depending on whether R is clustered.

 We assume all relations are clustered.

I/O Cost for Scan Operation
 Cost of sort-scan
 If R fits in main memory:
- Read R into memory.
- Perform an in-memory sort on R.
- Cost = B(R)

I/O Cost for Index-scan Operation

 Cost of index-scan
 Read the index first: B(I) blocks read.
 Read the blocks of R: B(R) blocks read.
 Total = B(I) + B(R) ≈ B(R) [since B(I) << B(R)]
= [If
 Not useful when the full relation R is required.
 Useful when only a part of R is required.
 Only the relevant blocks of R are retrieved.
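
The scan costs can be compared with a short calculation. This is a sketch under assumptions: `B_R` and `T_R` stand for the number of blocks and tuples of R, `B_I` for the number of index blocks, and the function names and sample numbers are made up for illustration.

```python
def table_scan_cost(B_R: int, T_R: int, clustered: bool = True) -> int:
    """Disk I/Os for a table-scan: B(R) if clustered, otherwise about T(R)."""
    return B_R if clustered else T_R


def index_scan_cost(B_I: int, blocks_of_R_read: int) -> int:
    """Disk I/Os for an index-scan: index blocks plus the data blocks read."""
    return B_I + blocks_of_R_read


# Assumed sizes: R has 1000 blocks and 20000 tuples, the index has 10 blocks.
full_table_scan = table_scan_cost(1000, 20000)          # 1000
unclustered_scan = table_scan_cost(1000, 20000, False)  # 20000
full_index_scan = index_scan_cost(10, 1000)             # 1010, no gain
selective_index_scan = index_scan_cost(10, 50)          # 60, large gain
```

Reading the whole relation through the index costs slightly more than a table-scan, while a selective lookup that touches only 50 blocks costs far less.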

Iterators for Physical Plan Operators
 Iterators are implemented for physical operators:
 An iterator returns the result of an operator one tuple at a time.
 Iterators have the following three methods:
 Open: initializes the data structures needed for getting blocks and tuples.
 GetNext: returns the next tuple in the result.
 Close: releases the data structures.
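
A minimal sketch of the iterator interface for a table-scan is given below. It assumes a relation stored as a list of blocks (lists of tuples). The class name `TableScanIterator`, the helper `drain`, and the use of `None` to signal the end of the result are illustrative choices, not a prescribed API.

```python
class TableScanIterator:
    """Hypothetical Open/GetNext/Close iterator over a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.block_no = None
        self.slot = None

    def open(self):
        # Initialize the cursor: first block, first tuple.
        self.block_no = 0
        self.slot = 0

    def get_next(self):
        # Return the next tuple, or None when the relation is exhausted.
        while self.block_no < len(self.blocks):
            block = self.blocks[self.block_no]
            if self.slot < len(block):
                row = block[self.slot]
                self.slot += 1
                return row
            self.block_no += 1      # current block is used up, move on
            self.slot = 0
        return None

    def close(self):
        # Release the cursor state.
        self.block_no = None
        self.slot = None


def drain(iterator):
    """Consume an iterator the way a parent operator would."""
    iterator.open()
    rows = []
    row = iterator.get_next()
    while row is not None:
        rows.append(row)
        row = iterator.get_next()
    iterator.close()
    return rows


rows = drain(TableScanIterator([[("a", 1), ("b", 2)], [], [("c", 3)]]))
```

Because the consumer pulls one tuple at a time, the result never has to be materialized in full, which is what makes pipelining between operators possible.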

Types of Algorithms for Physical Plan
Operators
 One-pass algorithms
 Involve reading the data only once from disk.
 Require at least one argument to fit in main memory.
 Two-pass algorithms
 Used when relations are too large to fit in main memory.
 Involve reading the data from disk twice.
 Data is read from disk a first time, processed in some way, written back to disk, and then read a
second time from disk.

Types of Algorithms For Physical Plan
Operators
 Many-pass algorithms
 Place no limit on the size of the data.
 Involve three or more passes.

Types of Physical Plan Operations
 Tuple-at-a-time, Unary Operations
 Example operators:
 Selection: σ
 Projection: π
 Do not require entire relation in memory at once.
 Read one block at a time in a main-memory buffer and produce the
output.

Types of Physical Plan Operations
 Full-relation, Unary Operations
 Example operators:
 Gamma: γ (grouping operator)
 Delta: δ (duplicate-elimination operator)
 Require all or most of the tuples in memory at once. [ Why? ]
 One-pass algorithms can be used only if R fits in the M available buffers.

Types of Physical Plan Operations
 Full-relation, Binary Operation
 Example operators:
 Union: ∪
 Intersection: ∩
 Natural join: ⋈
 Product: ×
 One pass algorithm may be used if at least one argument fits in main-memory.

One pass algorithm for tuple-at-a-time operation
 Relational algebra operations: σ(R) and π(R)
 Approach:
 Read the blocks of R one at a time into an input buffer.
 Perform the operation on each tuple and move each selected tuple to the output
buffer.
 Requirement: M ≥ 1, regardless of B(R).
 I/O Cost:
 B(R) if R is clustered [table-scan].
 T(R) if R is not clustered.
 Exception: For a selection with a condition on an attribute for which an index is
available, use the index to retrieve only a subset of R.
 Cost of index-scan: < B(R)
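
The tuple-at-a-time approach can be sketched as follows. The sketch assumes a clustered relation modeled as a list of blocks and counts one disk I/O per block read. The function `select_project` and the sample data are hypothetical.

```python
def select_project(blocks, predicate, columns):
    """One-pass selection followed by projection.

    Only the current block is held in memory, so a single input buffer
    is enough no matter how many blocks the relation has.
    """
    output, disk_ios = [], 0
    for block in blocks:
        disk_ios += 1                          # one I/O per block of R
        for row in block:
            if predicate(row):                 # selection
                output.append(tuple(row[c] for c in columns))  # projection
    return output, disk_ios


# Assumed schema: (id, name, score)
R = [[(1, "a", 10), (2, "b", 25)], [(3, "c", 40)], [(4, "d", 5)]]
result, ios = select_project(R, lambda r: r[2] >= 10, columns=(0, 1))
```

Here the cost equals the number of blocks of the relation (3), independent of how many tuples pass the condition.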

One-pass algorithm for Unary, Full-Relation Operation
 Relational algebra operation: δ(R) [Duplicate elimination]
 Use one memory buffer to hold one block of R.
 Use the remaining M-1 buffers to hold output tuples [a single copy of each tuple of R].
 Algorithm:
 For each tuple in retrieved block:
 If it is already in output tuples, then discard (don’t copy to output buffer).
 Otherwise copy to output block.

One-pass algorithm for Unary, Full-Relation Operation
 Cost of checking whether a tuple already exists in the output list:
 If each check takes time proportional to n [the size of the output], the total time is O(n^2) for duplicate
checking.
 Use a hash table with a large number of buckets for the output list, so that each check takes close to constant time.

 Requirement: B(δ(R)) ≤ M-1. [size of R with unique tuples should fit in M-1 buffers ]

 What if B(δ(R)) > M-1?
 The output doesn't fit in main memory.
 Output blocks must be moved back and forth between disk and main memory, resulting in thrashing.
 This increases the cost of duplicate checking.
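
A minimal sketch of one-pass duplicate elimination with a hash-based structure follows. It assumes a relation modeled as a list of blocks and uses a Python `set` as the hash table. The function name, the `tuples_per_block` parameter, and raising `MemoryError` when the distinct tuples exceed M-1 buffers are illustrative assumptions.

```python
def one_pass_distinct(blocks, M, tuples_per_block):
    """Yield each distinct tuple once, keeping seen tuples in M-1 buffers."""
    seen = set()                               # hash table of output tuples
    capacity = (M - 1) * tuples_per_block      # one buffer is kept for input
    for block in blocks:                       # one disk I/O per block
        for row in block:
            if row in seen:                    # expected constant-time check
                continue
            if len(seen) >= capacity:
                raise MemoryError("distinct tuples do not fit in M-1 buffers")
            seen.add(row)
            yield row


distinct = list(one_pass_distinct([[1, 2], [2, 3], [1, 4]], M=3, tuples_per_block=2))
```

When the distinct tuples do not fit, a one-pass algorithm is no longer appropriate, which the sketch signals with an exception instead of spilling to disk.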

One-pass algorithm for Unary, Full-Relation Operation

 Grouping operation: γ(R)
 Involves zero or more grouping attributes.
 Involves one or more aggregated attributes.
 Algorithm:
 Create one entry for each group in main memory:
- Scan the blocks of R one at a time.
- For each tuple, find the entry of the tuple's group and update the aggregated
result of the group.
- For a MIN(a) or MAX(a) aggregate, record the minimum or maximum value of a seen so far.
- For a COUNT aggregation, add one to the accumulated value.
- For a SUM(a) aggregation, add the value of a to the accumulated sum.
- For AVG(a), use two accumulations, COUNT and SUM(a).

One-pass algorithm for Unary, Full-Relation Operation

 Result generation:
 When all tuples of R have been read.
 The output contains one tuple for each group entry in main memory.
 Requirement:
 An efficient data structure for finding the group entry of a tuple.
 Hash tables or balanced trees can be used.
 Number of disk I/Os: B(R)
 Memory buffers requirement: B(γ(R)) ≤ M-1.
[The entries for all groups should fit in M-1 buffers]
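
The one-pass grouping algorithm can be sketched with a dictionary acting as the hash table of group entries. The sketch assumes a relation modeled as a list of blocks, a single grouping attribute, and a single aggregated attribute. The function `one_pass_group` and the sample data are hypothetical.

```python
def one_pass_group(blocks, key, attr):
    """Compute MIN, MAX, COUNT, SUM and AVG of attr for each group of key."""
    groups = {}                                # group value -> accumulators
    for block in blocks:                       # one disk I/O per block
        for row in block:
            value = row[attr]
            acc = groups.setdefault(
                row[key], {"min": value, "max": value, "count": 0, "sum": 0}
            )
            acc["min"] = min(acc["min"], value)
            acc["max"] = max(acc["max"], value)
            acc["count"] += 1
            acc["sum"] += value
    # Results are produced only after all tuples have been read;
    # AVG is derived from the two accumulations SUM and COUNT.
    return {g: dict(a, avg=a["sum"] / a["count"]) for g, a in groups.items()}


# Assumed schema: (dept, salary)
R = [[("cs", 10), ("ee", 30)], [("cs", 20), ("ee", 50)], [("cs", 60)]]
result = one_pass_group(R, key=0, attr=1)
```

No output can be emitted before the last block is read, because any later tuple may still change a group's aggregates.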

Binary operations: Bag Union
 Bag union: R ∪ S
 R and S are bags. Union keeps all tuples.
 Algorithm:
 First copy every tuple of R to the output.
 Then copy every tuple of S to the output.
 Number of disk I/Os: B(R) + B(S)
 Number of memory buffers required: M = 1 suffices.
[ no need to store tuples; pipelining allows output one tuple at a time ]

Binary operations: Set Union
 Set union: R ∪ S
 R and S are sets. Union keeps only one copy of each tuple that occurs in both R and S.
 Algorithm:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 First read S into main memory buffers and build a main memory data structure on the
search key [ the entire tuple is the search key ].
 Copy all tuples of S to the output.
 Retrieve one block of R at a time into a main memory buffer. For each tuple t of R, check if it
is also in S [ using the main memory data structure on the search key ].
 If t is not in S, copy it to the output.
 If t is in S, don’t copy it.
 An efficient data structure is required for storing S in main memory so that
the check operation can be done efficiently.

Binary operations: Set Union
 Set union: R ∪ S
 Number of disk I/Os required: B(R) + B(S)
 Number of memory buffers required: min(B(R), B(S)) ≤ M-1.
[ At least one of R and S must fit in M-1 buffers ]
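
A minimal sketch of one-pass set union is shown below. It assumes that S is the smaller relation and fits in memory, that both relations are lists of blocks without internal duplicates, and that a Python `set` plays the role of the main-memory search structure. The function name is hypothetical.

```python
def one_pass_set_union(R_blocks, S_blocks):
    """Set union with S held in memory and R streamed block by block."""
    in_memory = set()                  # search structure keyed on whole tuple
    output = []
    for block in S_blocks:             # B(S) disk I/Os
        for t in block:
            in_memory.add(t)
            output.append(t)           # every tuple of S goes to the output
    for block in R_blocks:             # B(R) disk I/Os
        for t in block:
            if t not in in_memory:     # skip tuples already contributed by S
                output.append(t)
    return output


union = one_pass_set_union([[1, 2], [3, 4]], [[3], [5]])
```

Each relation is read exactly once, which matches the total of B(R) + B(S) disk I/Os.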

Binary operations: Set Intersection
 Set Intersection: R ∩ S
 R and S are sets. Intersection keeps only the common tuples of R and S.
 Algorithm:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 Read S into main memory buffers and build a search data structure [key is the full
tuple].
 Read each block of R. For each tuple t of R, check if it is also in S
[using the main memory data structure on the search key to check].
 If t is in S, copy it to the output.
 If t is not in S, don’t copy it.
 Number of disk I/Os required: B(R) + B(S).
 Number of memory buffers required: min(B(R), B(S)) ≤ M-1.
Binary operations: Set Difference
 Set Difference: R − S
 R and S are sets.
 Keeps the tuples of R that are not in S.
 Algorithm:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 Read S into main memory buffers and build a search data structure
[search key is the full tuple].
 Read each block of R. For each tuple t of R, check if it is also in S.
 If t is not in S, copy it to the output.
 If t is in S, don’t copy it to the output.
 Number of disk I/Os required: B(R) + B(S)
 Number of memory buffers required: min(B(R), B(S)) ≤ M-1
[ At least one of R and S must fit in M-1 buffers ]
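
Set intersection and set difference follow the same pattern and differ only in the membership test. The sketch below assumes S fits in memory and is loaded into a Python `set`, with R streamed block by block. The function names are hypothetical.

```python
def load_into_memory(S_blocks):
    """Read S once and build a search structure keyed on the whole tuple."""
    return {t for block in S_blocks for t in block}


def one_pass_intersection(R_blocks, S_blocks):
    """Tuples of R that also occur in S."""
    in_memory = load_into_memory(S_blocks)
    return [t for block in R_blocks for t in block if t in in_memory]


def one_pass_difference(R_blocks, S_blocks):
    """Tuples of R that do not occur in S."""
    in_memory = load_into_memory(S_blocks)
    return [t for block in R_blocks for t in block if t not in in_memory]


R = [[1, 2], [3, 4]]
S = [[3], [5]]
common = one_pass_intersection(R, S)
only_in_R = one_pass_difference(R, S)
```

Note that this version of the difference keeps S in memory. If R were the smaller relation instead, a different bookkeeping would be needed, which the sketch does not cover.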

Binary operations: Bag Intersection
 Bag intersection: R ∩ S
 R and S are bags.
 Bags allow multiple copies of the same tuple.
 Also known as multisets.
 A tuple appears in the intersection the minimum of the number of times it
appears in R and in S.

Binary operations: Bag Intersection
 Algorithm for bag intersection:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 Read S into main memory buffers and build a search data structure [ key is the full tuple ]
along with a count value for each tuple.
 Main memory stores only the unique tuples of S.
 The count is the number of times the tuple occurs in S.
 Read each block of R. For each tuple t of R, check if it is also in S and check the
count:
 If the count is positive, decrease it by one and copy t to the output.
 If the count is zero, or t is not in S, no action is required.

 Number of disk I/Os required: B(R) + B(S)

 Number of memory buffers required: min(B(R), B(S)) ≤ M-1
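
The counting scheme for bag intersection can be sketched with `collections.Counter`. The sketch assumes S is the relation held in memory, with both relations modeled as lists of blocks. The function name and sample data are hypothetical.

```python
from collections import Counter


def one_pass_bag_intersection(R_blocks, S_blocks):
    """Each tuple appears min(count in R, count in S) times in the output."""
    # Store each distinct tuple of S once, together with its count.
    counts = Counter(t for block in S_blocks for t in block)
    output = []
    for block in R_blocks:                 # stream R one block at a time
        for t in block:
            if counts[t] > 0:              # a copy of t in S is still unmatched
                counts[t] -= 1
                output.append(t)
    return output


R = [["a", "a"], ["a", "b"], ["c"]]
S = [["a", "b"], ["a", "b"]]
result = one_pass_bag_intersection(R, S)
```

Here "a" occurs three times in R and twice in S, so it appears twice in the result, while "b" appears once and "c" not at all.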
Binary operations: Product
 Product: R × S
 R and S are sets. Product implements the Cartesian product.
 Algorithm:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 Read S into main memory buffers; no special data structure is required.
 Read each block of R. For each tuple of R:
 Concatenate it with each tuple of S in main memory.
 Send the concatenated tuple to the output.

 Number of disk I/Os required: B(R) + B(S).

 Number of memory buffers required: min(B(R), B(S)) ≤ M-1

Binary operations: Natural Join
 Natural join: R(X, Y) ⋈ S(Y, Z), where Y denotes the attributes common to R and S.
 Algorithm:
 Assume S is the smaller relation, i.e., B(S) ≤ B(R).
 Read S into main memory buffers and build a search data structure on S with search
key = Y.
 Read each block of R into the one remaining memory buffer. For each tuple t of R:
 Find the tuples of S that agree with t on all attributes of Y.
 Concatenate each matching tuple of S with t in main memory.
 Send the concatenated tuple to the output.

 Number of disk I/Os required: B(R) + B(S).

 Number of memory buffers required: min(B(R), B(S)) ≤ M-1
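
A minimal sketch of the one-pass natural join follows. The schemas are assumptions made for the example: R has tuples (x, y), S has tuples (y, z), and y is the single shared attribute. S is loaded into a dictionary keyed on y, and R is streamed one block at a time. The function name is hypothetical.

```python
from collections import defaultdict


def one_pass_natural_join(R_blocks, S_blocks):
    """Join R(x, y) with S(y, z) on y, holding S in memory."""
    s_by_y = defaultdict(list)             # search structure: y -> z values
    for block in S_blocks:                 # B(S) disk I/Os
        for y, z in block:
            s_by_y[y].append(z)
    output = []
    for block in R_blocks:                 # B(R) disk I/Os
        for x, y in block:
            for z in s_by_y.get(y, []):    # all tuples of S agreeing on y
                output.append((x, y, z))
    return output


R = [[(1, "a"), (2, "b")], [(3, "a")]]
S = [[("a", 10), ("c", 30)], [("a", 11)]]
joined = one_pass_natural_join(R, S)
```

The tuple (2, "b") of R has no partner in S and produces no output, while each tuple of R with y = "a" is paired with both matching tuples of S.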
