Lesson 06


ADVANCED DATABASE MANAGEMENT SYSTEMS
ICT3273

Query Processing Part II

Nuwan Laksiri
Department of ICT
Faculty of Technology
University of Ruhuna
Lecture 06
WHAT WE DISCUSS TODAY
• RECAP QUERY PROCESSING PART I
• OVERVIEW OF QUERY PROCESSING PART II
• SORTING
• JOIN OPERATION
• OTHER OPERATIONS
• EVALUATION OF EXPRESSIONS

NEXT WEEK
• QUERY OPTIMIZATION PART I
• INTRODUCTION
• TRANSFORMATION OF RELATIONAL EXPRESSIONS
• EQUIVALENCE RULES
• COST BASED OPTIMIZATION
• HEURISTIC OPTIMIZATION
RECAP
• OVERVIEW
• MEASURES OF QUERY COST
• SELECTION OPERATION
• BASIC ALGORITHMS
• SELECTIONS USING INDICES
• SELECTIONS INVOLVING COMPARISONS
• IMPLEMENTATION OF COMPLEX SELECTIONS
Sorting
• What is Sorting in the context of databases?
• SQL queries can specify that the output be
sorted.
• Several of the relational operations, such as
joins, can be implemented efficiently if the input
relations are sorted.
Sorting
• We may build an index on the relation, and then use the index to read the relation in sorted order. However, this orders the relation only logically, so reading it may lead to one disk block access for each tuple.
• For relations that fit in memory, techniques like
quicksort can be used.
• For relations that don’t fit in memory, external
sort-merge is a good choice.
External Sort Merge

• Let M denote memory size (in pages).

1. Create sorted runs.
Repeatedly do the following till the end of the relation:
a. Read M blocks of the relation into memory
b. Sort the in-memory blocks
c. Write the sorted data to disk as a run
2. Merge the runs (if there are more runs than M − 1, merge them in several passes, M − 1 runs at a time)
[Figure: External sorting using sort-merge]
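The two phases of external sort-merge can be sketched in Python. This is a minimal in-memory simulation, assuming the relation is given as a list of blocks (each block a list of sort keys) and that M counts buffer blocks; the names `create_runs`, `merge_runs` and `external_sort` are hypothetical, and a real system would write the runs to disk instead of keeping them in lists.

```python
import heapq
from typing import List

Block = List[int]  # one disk block, modelled as a list of sort keys


def create_runs(blocks: List[Block], M: int) -> List[List[int]]:
    """Phase 1: read M blocks at a time, sort them in memory, emit one run."""
    runs = []
    for i in range(0, len(blocks), M):
        in_memory = [key for block in blocks[i:i + M] for key in block]
        runs.append(sorted(in_memory))  # stands in for writing run R_i to disk
    return runs


def merge_runs(runs: List[List[int]], M: int) -> List[int]:
    """Phase 2: merge up to M - 1 runs per pass (one block is the output buffer),
    repeating passes until a single sorted run remains."""
    fan_in = max(M - 1, 2)  # at least 2 so every pass makes progress
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []


def external_sort(blocks: List[Block], M: int) -> List[int]:
    return merge_runs(create_runs(blocks, M), M)


relation = [[5, 3], [9, 1], [7, 2], [8, 6], [4, 0]]
print(create_runs(relation, M=2))    # [[1, 3, 5, 9], [2, 6, 7, 8], [0, 4]]
print(external_sort(relation, M=2))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With M = 2 only two runs can be merged per pass, so the three runs above need two merge passes.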
Join Operation
• Most important relational operator
• Potentially very expensive
• Required in most practical queries and
applications
• Often appears in groups of joins
• Many variations with different characteristics,
suited for different situations
Join Operation (Nested Loop Join)
• In its simplest form, a nested loops join
compares each row from one table (known as
the outer table) to each row from the other table
(known as the inner table) looking for rows that
satisfy the join predicate.
Join Operation (Nested Loop Join)
Algorithm
for each tuple tr in r do begin
  for each tuple ts in s do begin
    test pair (tr, ts) to see if they satisfy the join condition θ
    if they do, add tr·ts to the result
  end
end
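The pseudocode above translates almost line for line into Python. This sketch assumes tuples are Python tuples, so `tr + ts` plays the role of the concatenation tr·ts; the column layouts of `student` and `orders` are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple  # a tuple of attribute values


def nested_loop_join(r: List[Row], s: List[Row],
                     theta: Callable[[Row, Row], bool]) -> List[Row]:
    result = []
    for tr in r:                    # outer relation
        for ts in s:                # inner relation, scanned once per outer tuple
            if theta(tr, ts):       # test the join condition θ
                result.append(tr + ts)  # tr·ts: concatenate the two tuples
    return result


# Hypothetical layouts: student(id, name), orders(order_no, student_id)
student = [(1, "Ann"), (2, "Bob")]
orders = [(101, 1), (102, 1), (103, 2)]
print(nested_loop_join(student, orders, lambda tr, ts: tr[0] == ts[1]))
# [(1, 'Ann', 101, 1), (1, 'Ann', 102, 1), (2, 'Bob', 103, 2)]
```

Because θ is an arbitrary function, this form works for any join condition, not only equi-joins.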
Join Operation (Nested Loop Join)
• In the worst case, if there is enough memory only to hold one block of each
relation, the estimated cost is
nr * bs + br block transfers, plus
nr + br seeks
(nr = number of tuples in r; br, bs = number of blocks in r and s)
• If the smaller relation fits entirely in memory, use that as the inner relation.
• This reduces the cost to br + bs block transfers and 2 seeks
• Example: Student and Orders
• No. of records → Student 5000, Orders 10000
• No. of blocks → Student 100, Orders 400
• Assuming worst-case memory availability, the cost estimate is:
• With Student as the outer relation:
• 5000 * 400 + 100 = 2,000,100 block transfers,
• 5000 + 100 = 5100 seeks
• With Orders as the outer relation:
• 10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks

• If the smaller relation (Student) fits entirely in memory, the cost estimate will be 100 + 400 = 500
block transfers.
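The cost figures in this example can be reproduced with two small helper functions. The function names are hypothetical; the formulas are exactly the worst-case and "inner fits in memory" estimates given on this slide.

```python
def nlj_worst_case(n_r: int, b_r: int, b_s: int) -> tuple:
    """One buffer block per relation: returns (block transfers, seeks)."""
    return n_r * b_s + b_r, n_r + b_r


def nlj_inner_fits(b_r: int, b_s: int) -> tuple:
    """Inner relation held entirely in memory: each relation is read once."""
    return b_r + b_s, 2


# Student: 5000 records in 100 blocks; Orders: 10000 records in 400 blocks
print(nlj_worst_case(n_r=5000, b_r=100, b_s=400))   # (2000100, 5100): Student outer
print(nlj_worst_case(n_r=10000, b_r=400, b_s=100))  # (1000400, 10400): Orders outer
print(nlj_inner_fits(b_r=400, b_s=100))             # (500, 2): Student in memory
```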
Join Operation (Block Nested-Loop Join)
• Variant of nested-loop join in which every block of inner relation is
paired with every block of outer relation.

for each block Br of r do begin
  for each block Bs of s do begin
    for each tuple tr in Br do begin
      for each tuple ts in Bs do begin
        check if (tr, ts) satisfy the join condition θ
        if they do, add tr·ts to the result
      end
    end
  end
end
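A minimal Python sketch of the block nested-loop join, assuming each relation is given as a list of blocks (lists of tuples). It also counts how many inner blocks are read, to show that s is read once per outer block rather than once per outer tuple. The block contents are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple
Block = List[Row]


def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block],
                           theta: Callable[[Row, Row], bool]):
    """Return (result, number of inner-block reads)."""
    result, inner_reads = [], 0
    for br in r_blocks:             # each block Br of the outer relation
        for bs in s_blocks:         # each block Bs of the inner relation
            inner_reads += 1        # Bs is read once per OUTER BLOCK
            for tr in br:
                for ts in bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_reads


student_blocks = [[(1, "Ann"), (2, "Bob")], [(3, "Cal")]]
order_blocks = [[(101, 1)], [(102, 3)], [(103, 2), (104, 9)]]
rows, reads = block_nested_loop_join(student_blocks, order_blocks,
                                     lambda tr, ts: tr[0] == ts[1])
print(rows)   # [(1, 'Ann', 101, 1), (2, 'Bob', 103, 2), (3, 'Cal', 102, 3)]
print(reads)  # 6 = 2 outer blocks * 3 inner blocks
```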
Join Operation (Block Nested-Loop Join)
• Worst case estimate: br * bs + br block transfers + 2 * br seeks
• Each block in the inner relation s is read once for each block in the
outer relation
• Best case: br + bs block transfers + 2 seeks.
• Improvements to nested loop and block nested loop algorithms:
• In block nested-loop, use M − 2 disk blocks as the blocking unit for the outer
relation, where M = memory size in blocks; use the remaining two blocks
to buffer the inner relation and the output
• Cost = ⌈br / (M − 2)⌉ * bs + br block transfers +
2 * ⌈br / (M − 2)⌉ seeks
• If the equi-join attribute forms a key on the inner relation, stop the inner loop on
the first match
• Scan inner loop forward and backward alternately, to make use of
the blocks remaining in buffer (with LRU replacement)
• Use index on inner relation if available
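The improved cost formula can be evaluated for different memory sizes. This sketch (function name hypothetical) uses the Student/Orders figures from the earlier example with Student as the outer relation; it shows the formula collapsing to the worst case when M = 3 and to the best case once the outer relation fits in M − 2 blocks.

```python
def bnlj_cost(b_r: int, b_s: int, M: int) -> tuple:
    """Block nested-loop join using M - 2 buffer blocks for the outer relation.
    Returns (block transfers, seeks)."""
    if M < 3:
        raise ValueError("need at least 3 buffer blocks")
    chunks = -(-b_r // (M - 2))      # ceiling of b_r / (M - 2): outer blocking units
    return chunks * b_s + b_r, 2 * chunks


# Student (100 blocks) as outer, Orders (400 blocks) as inner
print(bnlj_cost(100, 400, M=3))    # (40100, 200): worst case, M - 2 = 1
print(bnlj_cost(100, 400, M=52))   # (900, 4)
print(bnlj_cost(100, 400, M=102))  # (500, 2): outer fits, best case
```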
Join Operation (Indexed Nested-Loop Join)
• Index lookups can replace file scans if
• Join is an equi-join or natural join and
• An index is available on the inner relation’s join attribute
• Can construct an index just to compute a join.
• For each tuple tr in the outer relation r, use the index to look
up tuples in s that satisfy the join condition with tuple tr.
• Worst case: buffer has space for only one page of r, and, for
each tuple in r, we perform an index lookup on s.
• Cost of the join: br (tT + tS) + nr * c
• Where tT and tS are the times for one block transfer and one seek, and c is the cost of
traversing the index and fetching all matching s tuples for one tuple of r
• c can be estimated as the cost of a single selection on s using the join
condition.
• If indices are available on join attributes of both r and s,
use the relation with fewer tuples as the outer relation.
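An indexed nested-loop join replaces the inner scan with an index lookup. In this sketch a Python dictionary (a hash index) stands in for the B+-tree on the inner relation's join attribute; that substitution, the helper names, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple


def build_index(s: List[Row], key: Callable[[Row], Any]) -> Dict[Any, List[Row]]:
    """Index on the inner relation's join attribute (a hash map stands in for a B+-tree)."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for ts in s:
        index[key(ts)].append(ts)
    return index


def indexed_nested_loop_join(r: List[Row], s_index: Dict[Any, List[Row]],
                             r_key: Callable[[Row], Any]) -> List[Row]:
    result = []
    for tr in r:                                # scan the outer relation once
        for ts in s_index.get(r_key(tr), []):   # index lookup replaces a scan of s
            result.append(tr + ts)
    return result


student = [(1, "Ann"), (2, "Bob"), (3, "Cal")]
orders = [(101, 1), (102, 1), (103, 2)]
orders_idx = build_index(orders, key=lambda ts: ts[1])
print(indexed_nested_loop_join(student, orders_idx, r_key=lambda tr: tr[0]))
# [(1, 'Ann', 101, 1), (1, 'Ann', 102, 1), (2, 'Bob', 103, 2)]
```

As the slide notes, such an index can even be built just to compute one join, which is what `build_index` does here.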
Example of nested-loop join costs
• Compute student ⋈ orders, with student as the outer relation.
• Let orders have a primary B+-tree index on the attribute id, which
contains 20 entries in each index node.
• Since orders has 10,000 tuples, the height of the tree is 4, and one more
access is needed to find the actual data
• student has 5000 tuples
• Cost of block nested loops join
• 400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
• Assuming worst case memory
• May be significantly less with more memory

• Cost of indexed nested loops join

• 100 + 5000 * 5 = 25,100 block transfers and seeks.
• CPU cost likely to be less than that for block nested loops join
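The arithmetic of this example can be checked directly. The sketch estimates the index height as the smallest h with 20^h ≥ 10,000 (an approximation of the tree height, labelled as such), adds one access for the data block to get c, and then evaluates both cost formulas.

```python
def btree_height(n_entries: int, fanout: int) -> int:
    """Approximate height: smallest h with fanout ** h >= n_entries."""
    height, capacity = 1, fanout
    while capacity < n_entries:
        height += 1
        capacity *= fanout
    return height


n_student, b_student, b_orders = 5000, 100, 400

c = btree_height(10_000, 20) + 1          # index traversal + 1 data access
bnlj = b_student * b_orders + b_student   # block nested loops, worst case
inlj = b_student + n_student * c          # indexed nested loops
print(c, bnlj, inlj)                      # 5 40100 25100
```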
Merge-Join
1. Sort both relations on their join attribute (if not already
sorted on the join attributes).
2. Merge the sorted relations to join them
1. Join step is similar to the merge stage of the sort-merge
algorithm.
2. Main difference is handling of duplicate values in the join
attribute: every pair with the
same value on the join attribute
must be matched

** If interested, please refer to
the detailed algorithm in the
reference book
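The two steps of merge-join can be sketched in Python, assuming both relations fit in memory so that Python's `sorted` can stand in for sorting on disk. The inner `while` loops implement the duplicate handling described above: every r tuple with a given join value is matched with every s tuple that has the same value. Names and data are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple


def merge_join(r: List[Row], s: List[Row],
               r_key: Callable[[Row], Any], s_key: Callable[[Row], Any]) -> List[Row]:
    # Step 1: sort both relations on the join attribute
    r, s = sorted(r, key=r_key), sorted(s, key=s_key)
    result: List[Row] = []
    i = j = 0
    # Step 2: merge the sorted relations
    while i < len(r) and j < len(s):
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Collect the group of s tuples sharing this join value
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == kr:
                j_end += 1
            # Match every r tuple with this value against the whole s group
            while i < len(r) and r_key(r[i]) == kr:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(2, "c"), (1, "a"), (2, "b"), (4, "d")]
s = [(3, "z"), (2, "x"), (4, "w"), (2, "y")]
print(merge_join(r, s, lambda t: t[0], lambda t: t[0]))
```

The join value 2 appears twice in each relation, so it produces 2 × 2 = 4 result tuples, plus one for the value 4.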
Merge-Join
• Can be used only for equi-joins and natural joins
• Each block needs to be read only once (assuming all tuples for any given
value of the join attributes fit in memory)
• Thus the cost of merge join is:
br + bs block transfers + ⌈br / bb⌉ + ⌈bs / bb⌉ seeks, where bb buffer blocks are allocated to each relation
• + the cost of sorting if the relations are unsorted.
• Hybrid merge-join: if one relation is sorted, and the other has a secondary
B+-tree index on the join attribute
• Merge the sorted relation with the leaf entries of the B+-tree.
• Sort the result on the addresses of the unsorted relation’s tuples
• Scan the unsorted relation in physical address order and merge with
previous result, to replace addresses by the actual tuples
• Sequential scan more efficient than random lookup
Hash-Join
• Applicable for equi-joins and natural joins.
• A hash function h is used to partition tuples of both relations
• h maps joinattrs values to {0, 1, ..., n}, where joinattrs denotes the
common attributes of r and s used in the natural join.
• r0, r1, ..., rn denote partitions of r tuples
• Each tuple tr ∈ r is put in partition ri, where i = h(tr[joinattrs]).
• s0, s1, ..., sn denote partitions of s tuples
• Each tuple ts ∈ s is put in partition si, where i = h(ts[joinattrs]).
Hash-Join
• r tuples in ri need only be compared with s
tuples in si; they need not be compared with s tuples in
any other partition, since:
• An r tuple and an s tuple that satisfy the join
condition will have the same value for the join
attributes.
• If that value is hashed to some value i, the r
tuple has to be in ri and the s tuple in si.
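A hedged in-memory sketch of hash-join: both relations are partitioned with the same hash function, and then each pair (ri, si) is joined on its own. The build/probe step inside each partition (an in-memory hash index on si, probed with ri tuples) follows the usual textbook formulation rather than anything spelled out on this slide. Python's built-in `hash` stands in for h, partitions are numbered 0 to `n_partitions` − 1 (so `n_partitions` plays the role of n + 1), and lists stand in for partitions written to disk.

```python
from collections import defaultdict
from typing import Any, Callable, List, Tuple

Row = Tuple


def hash_join(r: List[Row], s: List[Row],
              r_key: Callable[[Row], Any], s_key: Callable[[Row], Any],
              n_partitions: int = 4) -> List[Row]:
    def h(value: Any) -> int:               # partitioning hash function h
        return hash(value) % n_partitions

    # Partitioning phase: each tuple goes to partition h(join attribute value)
    r_parts: List[List[Row]] = [[] for _ in range(n_partitions)]
    s_parts: List[List[Row]] = [[] for _ in range(n_partitions)]
    for tr in r:
        r_parts[h(r_key(tr))].append(tr)
    for ts in s:
        s_parts[h(s_key(ts))].append(ts)

    result = []
    for ri, si in zip(r_parts, s_parts):    # ri is only compared with si
        table = defaultdict(list)           # build: in-memory hash index on si
        for ts in si:
            table[s_key(ts)].append(ts)
        for tr in ri:                       # probe with each tuple of ri
            for ts in table.get(r_key(tr), []):
                result.append(tr + ts)
    return result


student = [(1, "Ann"), (2, "Bob"), (3, "Cal")]
orders = [(101, 1), (102, 1), (103, 2)]
print(sorted(hash_join(student, orders, lambda t: t[0], lambda t: t[1])))
# [(1, 'Ann', 101, 1), (1, 'Ann', 102, 1), (2, 'Bob', 103, 2)]
```

The output order depends on the partitioning, which is why the example sorts the result before printing it.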
Other Operations
• Duplicate elimination
• Can be implemented via hashing or sorting
• Projection
• Perform projection on each tuple followed by
duplicate elimination
• Aggregation
• Can be implemented in a manner similar to
duplicate elimination
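Both ways of eliminating duplicates can be shown side by side, together with projection as "project each tuple, then remove duplicates". This is an in-memory sketch; the EMPLOYEE(ssn, dno) rows and helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple


def project(rows: List[Row], attrs: Sequence[int]) -> List[Row]:
    """Projection step: keep only the listed attribute positions."""
    return [tuple(t[a] for a in attrs) for t in rows]


def distinct_hash(rows: List[Row]) -> List[Row]:
    seen, out = set(), []
    for t in rows:
        if t not in seen:          # hash lookup detects a duplicate
            seen.add(t)
            out.append(t)
    return out


def distinct_sort(rows: List[Row]) -> List[Row]:
    out: List[Row] = []
    for t in sorted(rows):         # after sorting, duplicates are adjacent
        if not out or out[-1] != t:
            out.append(t)
    return out


# Hypothetical EMPLOYEE(ssn, dno) rows; project on dno, then remove duplicates
employee = [(11, 5), (12, 4), (13, 5), (14, 1)]
print(distinct_hash(project(employee, [1])))   # [(5,), (4,), (1,)]
print(distinct_sort(project(employee, [1])))   # [(1,), (4,), (5,)]
```

The hashing version keeps first-seen order, while the sorting version returns the result in sorted order as a side effect.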
Other Operations (Set Operations)
• r ∪ s (Union)
[Figure: Union]
• r ∩ s (Intersection)
[Figure: Intersection]
• r − s (Set difference)
[Figure: Set difference]
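The diagrams for these set operations did not survive extraction. As a hedged sketch, assuming relations small enough to fit in memory and set (duplicate-free) semantics, each operation can be implemented with hashing, much like hash-based duplicate elimination: build a hash table on s and probe it with the tuples of r. The function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def hash_union(r: Sequence[Hashable], s: Sequence[Hashable]) -> List[Hashable]:
    seen, out = set(), []
    for t in list(r) + list(s):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


def hash_intersection(r: Sequence[Hashable], s: Sequence[Hashable]) -> List[Hashable]:
    s_table = set(s)                 # in-memory hash index on s
    seen, out = set(), []
    for t in r:                      # probe with each r tuple
        if t in s_table and t not in seen:
            seen.add(t)
            out.append(t)
    return out


def hash_difference(r: Sequence[Hashable], s: Sequence[Hashable]) -> List[Hashable]:
    s_table = set(s)
    seen, out = set(), []
    for t in r:                      # keep r tuples that have no match in s
        if t not in s_table and t not in seen:
            seen.add(t)
            out.append(t)
    return out


r = [(1,), (2,), (3,)]
s = [(2,), (4,)]
print(hash_union(r, s))         # [(1,), (2,), (3,), (4,)]
print(hash_intersection(r, s))  # [(2,)]
print(hash_difference(r, s))    # [(1,), (3,)]
```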
Aggregate Operations
• MAX(),MIN()
• Can be computed by a table scan or by using an
appropriate index
• Eg: SELECT MAX(SALARY) FROM EMPLOYEE;

• COUNT(), AVG(), and SUM()

• A dense index can be used, computing the aggregate over the index entries instead of the data records
Aggregate Operations
If GROUP BY clause is included
• The table must first be partitioned into subsets of
tuples
• Each partition (group) has the same value for the
grouping attributes
• Eg: SELECT DNO, AVG(SALARY)
FROM EMPLOYEE
GROUP BY DNO
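The GROUP BY query above can be evaluated by hashing on the grouping attribute and keeping a running sum and count per group, so that the groups never have to be materialized in full. This is one possible in-memory sketch (sorting on DNO would work too); the EMPLOYEE rows are hypothetical. The last line shows MAX() computed by a plain table scan.

```python
from typing import Any, Dict, Iterable, List


def group_avg(rows: Iterable[dict], group_attr: str, value_attr: str) -> Dict[Any, float]:
    """SELECT group_attr, AVG(value_attr) ... GROUP BY group_attr, via hashing."""
    partial: Dict[Any, List[float]] = {}    # group value -> [running sum, count]
    for t in rows:
        acc = partial.setdefault(t[group_attr], [0, 0])
        acc[0] += t[value_attr]
        acc[1] += 1
    return {g: total / count for g, (total, count) in partial.items()}


# Hypothetical EMPLOYEE rows
employee = [
    {"DNO": 5, "SALARY": 30000},
    {"DNO": 5, "SALARY": 40000},
    {"DNO": 4, "SALARY": 25000},
]
print(group_avg(employee, "DNO", "SALARY"))   # {5: 35000.0, 4: 25000.0}
print(max(t["SALARY"] for t in employee))     # 40000: MAX(SALARY) by a table scan
```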
Evaluation of Expressions
[Figure: Operator tree]
Evaluation of Expressions
• So far: we have seen algorithms for individual
operations
• Alternatives for evaluating an entire expression tree
• Materialization: generate results of an expression
whose inputs are relations or are already computed,
materialize (store) it on disk. Repeat.
• Pipelining: pass on tuples to parent operations
even as an operation is being executed
Materialization
• Materialized evaluation: evaluate one operation at a time,
starting at the lowest-level. Use intermediate results
materialized into temporary relations to evaluate next-level
operations.
• Ex: In the operator tree shown earlier, compute and store
σ building="Watson" (department)
then compute and store its join with instructor,
and finally compute the projection on name.
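Materialized evaluation of Π name(σ building="Watson"(department) ⋈ instructor) can be sketched by writing each intermediate result to a temporary file and reading it back for the next level. The join attribute `dept_name`, the JSON file format, and the sample rows are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List

Row = Dict[str, str]


def materialize(rows: List[Row], directory: str, name: str) -> str:
    """Store an intermediate result as a temporary relation on disk."""
    path = os.path.join(directory, name + ".json")
    with open(path, "w") as f:
        json.dump(rows, f)
    return path


def read_back(path: str) -> List[Row]:
    with open(path) as f:
        return json.load(f)


def materialized_plan(department: List[Row], instructor: List[Row]) -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        # 1. compute and store σ building="Watson"(department)
        temp1 = materialize([d for d in department if d["building"] == "Watson"],
                            tmp, "temp1")
        # 2. read it back, then compute and store its join with instructor
        temp2 = materialize([{**d, **i} for d in read_back(temp1)
                             for i in instructor if d["dept_name"] == i["dept_name"]],
                            tmp, "temp2")
        # 3. read it back and project on name (a set, so duplicates are removed)
        return sorted({t["name"] for t in read_back(temp2)})


department = [{"dept_name": "Physics", "building": "Watson"},
              {"dept_name": "History", "building": "Painter"},
              {"dept_name": "Biology", "building": "Watson"}]
instructor = [{"name": "Ada", "dept_name": "Physics"},
              {"name": "Cy", "dept_name": "History"},
              {"name": "Ben", "dept_name": "Biology"}]
print(materialized_plan(department, instructor))  # ['Ada', 'Ben']
```

Every `materialize`/`read_back` pair is an extra write and read of a temporary relation, which is exactly the cost that the operation cost formulas leave out.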
Materialization
• Materialized evaluation is always applicable
• Cost of writing results to disk and reading them back can be quite
high
• Our cost formulas for operations ignore cost of writing results to
disk, so
• Overall cost = sum of costs of individual operations +
cost of writing intermediate results to disk
• Double buffering: use two output buffers for each operation, when
one is full write it to disk while the other is getting filled
• Allows overlap of disk writes with computation and reduces
execution time
Pipelining
• Pipelined evaluation : evaluate several operations
simultaneously, passing the results of one operation on to the
next.
• Ex: In the previous expression tree, don’t store the result of
σ building="Watson" (department)
• Instead, pass tuples directly to the join. Similarly, don’t
store the result of the join; pass tuples directly to the projection.
• Much cheaper than materialization: no need to store a
temporary relation to disk.
• Pipelining may not always be possible – e.g., sort, hash-join.
• For pipelining to be effective, use evaluation algorithms that
generate output tuples even as tuples are received for inputs to
the operation.
• Pipelines can be executed in two ways: demand driven and
producer driven
Pipelining
• In demand driven or lazy evaluation
• System repeatedly requests next tuple from top level operation
• Each operation requests next tuple from children operations as
required, in order to output its next tuple
• In between calls, operation has to maintain “state” so it knows what to
return next

• In producer-driven or eager pipelining

• Operators produce tuples eagerly and pass them up to their parents
• Buffer maintained between operators, child puts tuples in buffer, parent
removes tuples from buffer
• If buffer is full, child waits till there is space in the buffer, and then
generates more tuples
• System schedules operations that have space in output buffer and can
process more input tuples

• Alternative name: pull and push models of pipelining
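Both pipelining styles can be sketched in Python. In the pull model each operator is a generator: asking the top operator for a tuple makes it ask its children, and the generator itself keeps the "state" between calls. In the push model a producer thread puts tuples into a bounded `queue.Queue` and a consumer thread removes them; `put` waits while the buffer is full, as described above. Using threads and the OS scheduler in place of a DBMS operator scheduler, the join attribute `dept_name`, and the sample rows are all assumptions of this sketch.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


# ---- Demand-driven (pull / lazy): each operator is a generator ----
def scan(relation: Iterable[Row]) -> Iterator[Row]:
    for t in relation:
        yield t


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for t in child:                  # requests the next tuple from the child
        if pred(t):
            yield t                  # generator keeps its state between calls


def join(outer: Iterator[Row], inner: List[Row], attr: str) -> Iterator[Row]:
    for tr in outer:
        for ts in inner:
            if tr[attr] == ts[attr]:
                yield {**tr, **ts}


def project(child: Iterator[Row], attr: str) -> Iterator[str]:
    seen = set()                     # streaming duplicate elimination
    for t in child:
        if t[attr] not in seen:
            seen.add(t[attr])
            yield t[attr]


# ---- Producer-driven (push / eager): operators linked by a bounded buffer ----
_END = object()                      # end-of-stream marker


def select_producer(relation: Iterable[Row], pred: Callable[[Row], bool],
                    out_buf: "queue.Queue") -> None:
    for t in relation:
        if pred(t):
            out_buf.put(t)           # waits while the buffer is full
    out_buf.put(_END)


def project_consumer(in_buf: "queue.Queue", attr: str, result: List[str]) -> None:
    while True:
        t = in_buf.get()             # waits while the buffer is empty
        if t is _END:
            return
        result.append(t[attr])       # no duplicate elimination, for brevity


def run_push(relation: Iterable[Row], pred: Callable[[Row], bool],
             attr: str, buffer_size: int = 2) -> List[str]:
    buf: "queue.Queue" = queue.Queue(maxsize=buffer_size)
    result: List[str] = []
    workers = [threading.Thread(target=select_producer, args=(relation, pred, buf)),
               threading.Thread(target=project_consumer, args=(buf, attr, result))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result


department = [{"dept_name": "Physics", "building": "Watson"},
              {"dept_name": "History", "building": "Painter"},
              {"dept_name": "Biology", "building": "Watson"}]
instructor = [{"name": "Ada", "dept_name": "Physics"},
              {"name": "Cy", "dept_name": "History"},
              {"name": "Ben", "dept_name": "Biology"}]
in_watson = lambda d: d["building"] == "Watson"

pull_plan = project(join(select(scan(department), in_watson), instructor, "dept_name"), "name")
print(list(pull_plan))                               # ['Ada', 'Ben']
print(run_push(department, in_watson, "dept_name"))  # ['Physics', 'Biology']
```

No temporary relation is written in either style; tuples flow from operator to operator as they are produced.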

Evaluation Algorithms for Pipelining
• Some algorithms are not able to output results
even as they get input tuples
• E.g. merge join, or hash join
• Intermediate results are written to disk and then read
back

[Figure: blocking operations vs. pipelined operations]
HOME WORK
• Find more details about the concepts discussed in
the class by referring to the reference books
SUMMARY

• RECAP QUERY PROCESSING PART I
• SORTING
• JOIN OPERATION
• OTHER OPERATIONS
• EVALUATION OF EXPRESSIONS
REFERENCES

• Fundamentals of Database Systems (6th edition), by Ramez Elmasri and Shamkant B. Navathe.

• Database Management Systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke, McGraw Hill, 2003.

• Advanced Database Management Systems, by Rini Chakrabarti and Shibhadra Dasgupta.
THANK YOU
