QueryProcessing Lect 3

This document discusses distributed query processing. It explains that a distributed query processor must decompose high-level queries into data manipulation commands and consider communication costs to optimize query plans. Two example query plans over distributed relations are provided to demonstrate how minimizing network traffic can reduce query processing costs. The objectives, components, and optimization techniques of distributed query processing are outlined.

Uploaded by

ally


4. Distributed Query Processing

Lecture 3

Overview of Query Processing

2019-2020 3rd Sem2 NW
Morning/Evening/Dr. Salma

Query Processing

High-level user query → Query Processor → Low-level data manipulation commands

Query Processing Components
● Query language that is used
⬥ SQL (Structured Query Language)
● Query execution methodology
⬥ The steps that the system goes through in executing
high-level (declarative) user queries
● Query optimization
⬥ How to determine the “best” execution plan?

Query Language
● Tuple calculus: { t | F(t) }
where t is a tuple variable and F(t) is a well-formed formula
● Example:
⬥ Get the numbers and names of all programmers:
{ t[ENO, ENAME] | t ∈ EMP ∧ t[TITLE] = "Programmer" }

Query Language (cont.)
● Domain calculus: { x1, x2, ..., xn | F(x1, x2, ..., xn) }
where each xi is a domain variable and F(x1, x2, ..., xn) is a well-formed formula
● Example:
{ x, y | EMP(x, y, "Programmer") }

Variables are position sensitive!

Query Language (cont.)
● SQL is a tuple calculus language.

SELECT ENO, ENAME
FROM EMP
WHERE TITLE = 'Programmer'

End users express queries in non-procedural (declarative) languages.

Query Processing Objectives & Problems

● Query processor transforms queries into procedural operations to access data in an optimal way.

● Distributed query processor has to deal with query decomposition and data localization.

DB example

Figure 3.3

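Centralized Query Processing Alternatives
SELECT ENAME
FROM EMP E, ASG G
WHERE E.ENO = G.ENO AND RESP = 'manager'

● Strategy 1: Π ENAME (σ RESP="manager" ∧ EMP.ENO=ASG.ENO (EMP × ASG))

● Strategy 2: Π ENAME (EMP ⋈ ENO σ RESP="manager" (ASG))

Strategy 2 avoids the Cartesian product, so it is "better".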

Distributed Query Processing
● Query processor must consider the communication cost and select the best site.
● The same query example, but relations EMP (E) and ASG (G) are fragmented and distributed across sites.

Distributed Query Processing Plans

● By centralized optimization,

● Two distributed query processing plans

Distributed Query Plan I
Plan I: Transport all fragments to the query site (Site 5) and execute the query there.

Site 1 → Site 5: ASG1
Site 2 → Site 5: ASG2
Site 3 → Site 5: EMP1
Site 4 → Site 5: EMP2
Site 5: Result = (EMP1 ∪ EMP2) ⋈ ENO σ RESP="manager" (ASG1 ∪ ASG2)

This causes too much network traffic and is very costly.

Distributed Query Plan II
Plan II (Optimized):

Site 1: ASG1' = σ RESP="manager" (ASG1), sent to Site 3
Site 2: ASG2' = σ RESP="manager" (ASG2), sent to Site 4
Site 3: EMP1' = EMP1 ⋈ ENO ASG1', sent to Site 5
Site 4: EMP2' = EMP2 ⋈ ENO ASG2', sent to Site 5
Site 5: Result = EMP1' ∪ EMP2'

Costs of the Two Plans
● Assume:
⬥ size(EMP) = 400, size(ASG) = 1000, 20 tuples with RESP = "manager"
⬥ tuple access cost = 1 unit; tuple transfer cost = 10 units
⬥ ASG and EMP are locally clustered on attributes RESP and ENO, respectively.
● Plan I
⬥ Transfer EMP to site 5: 400 * tuple transfer cost = 4,000
⬥ Transfer ASG to site 5: 1000 * tuple transfer cost = 10,000
⬥ Produce ASG': 1000 * tuple access cost = 1,000
⬥ Join EMP and ASG': 400 * 20 * tuple access cost = 8,000
Total cost = 23,000
● Plan II
⬥ Produce ASG': (10 + 10) * tuple access cost = 20
⬥ Transfer ASG' to the sites of EMP: (10 + 10) * tuple transfer cost = 200
⬥ Produce EMP': (10 + 10) * tuple access cost * 2 = 40
⬥ Transfer EMP' to result site: (10 + 10) * tuple transfer cost = 200
Total cost = 460
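The arithmetic for the two plans can be checked with a short script. This is only a sketch of the slide's cost model: the constant and function names are made up for illustration, and the even 10 + 10 split of the manager tuples across the two ASG sites is taken from the slide.

```python
# Figures taken from the slide; all names here are illustrative.
EMP_SIZE = 400        # tuples in EMP
ASG_SIZE = 1000       # tuples in ASG
MANAGER_TUPLES = 20   # ASG tuples with RESP = "manager"
ACCESS = 1            # cost units to access one tuple
TRANSFER = 10         # cost units to transfer one tuple


def plan_1_cost():
    """Ship every fragment to the query site, then select and join there."""
    transfer_emp = EMP_SIZE * TRANSFER
    transfer_asg = ASG_SIZE * TRANSFER
    produce_asg = ASG_SIZE * ACCESS              # scan all of ASG
    join = EMP_SIZE * MANAGER_TUPLES * ACCESS    # compare every EMP tuple with every selected tuple
    return transfer_emp + transfer_asg + produce_asg + join


def plan_2_cost():
    """Select at the ASG sites, join at the EMP sites, ship only the results."""
    selected = 10 + 10                           # 10 manager tuples at each ASG site
    produce_asg = selected * ACCESS              # clustered on RESP: touch only matches
    transfer_asg = selected * TRANSFER
    produce_emp = selected * ACCESS * 2          # clustered on ENO: probe per selected tuple
    transfer_emp = selected * TRANSFER
    return produce_asg + transfer_asg + produce_emp + transfer_emp


print(plan_1_cost(), plan_2_cost())
```

Under these assumptions Plan II is 50 times cheaper, and almost all of the saving comes from shipping 20 tuples instead of 1,400.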
Query Optimization Objectives
● Minimize a cost function
I/O cost + CPU cost + communication cost
● These might have different weights in different distributed
environments

● Can also maximize throughput
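As a rough illustration of how the weights can change which plan wins, the sketch below evaluates a weighted cost function for two made-up plans. The plan figures and the weight values are assumptions chosen for the example and do not come from the lecture.

```python
def total_cost(io_cost, cpu_cost, comm_cost, w_io=1, w_cpu=1, w_comm=1):
    """Weighted sum of the three cost components."""
    return w_io * io_cost + w_cpu * cpu_cost + w_comm * comm_cost


# Hypothetical plans: A does more local work, B ships more data.
PLAN_A = {"io_cost": 300, "cpu_cost": 100, "comm_cost": 20}
PLAN_B = {"io_cost": 100, "cpu_cost": 50, "comm_cost": 60}

# Assumed weights: communication is as cheap as local work on a fast network,
# and ten times more expensive on a slow one.
fast_a = total_cost(**PLAN_A, w_comm=1)
fast_b = total_cost(**PLAN_B, w_comm=1)
slow_a = total_cost(**PLAN_A, w_comm=10)
slow_b = total_cost(**PLAN_B, w_comm=10)

print("fast network:", "A" if fast_a < fast_b else "B")
print("slow network:", "A" if slow_a < slow_b else "B")
```

With equal weights plan B is cheaper; once communication is weighted heavily, plan A becomes the better choice.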

Communication Cost
● Wide area network
⬥ Communication cost will dominate
– Low bandwidth
– Low speed
– High protocol overhead
⬥ Most algorithms ignore all other cost components

● Local area network
⬥ Communication cost is not that dominant
⬥ Total cost function should be considered

Types of Query Optimization
• Query optimization aims at choosing the "best" point in the solution space of all possible execution strategies.
● Exhaustive search
▪ Search the solution space exhaustively, predict the cost of each strategy, and select the strategy with minimum cost.
▪ Cost-based
▪ Optimal
▪ Combinatorial complexity in the number of relations: the problem becomes worse as the number of relations or fragments increases (e.g., becomes greater than 5 or 6).
▪ Workable for small solution spaces
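A minimal sketch of exhaustive search over join orders is shown below. The PROJ relation, its cardinality, the join selectivities, and the toy cost model (sum of intermediate result sizes for a left-deep plan) are all assumptions made for illustration; only the EMP and ASG sizes come from the slides.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Hypothetical statistics; PROJ and the selectivities are not from the slides.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 50),
}  # a pair with no entry has no join predicate, i.e. a Cartesian product


def plan_cost(order):
    """Toy cost: total size of all intermediate results of a left-deep plan."""
    joined = [order[0]]
    size = Fraction(CARD[order[0]])
    cost = Fraction(0)
    for rel in order[1:]:
        sel = Fraction(1)
        for prev in joined:
            sel *= SEL.get(frozenset((prev, rel)), 1)
        size = size * CARD[rel] * sel
        cost += size
        joined.append(rel)
    return cost


def exhaustive_search(relations):
    """Cost every permutation and return one with minimum cost."""
    return min(permutations(relations), key=plan_cost)


best = exhaustive_search(["EMP", "ASG", "PROJ"])
print(best, plan_cost(best))
print("orders to cost for 6 relations:", factorial(6))
```

Three relations give only 6 orders, but the count grows factorially (720 for six relations), which is why the slide calls the approach workable only for small solution spaces.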

Types of Query Optimization (cont.)

❖ Heuristics
• Not optimal
• Restrict the solution space so that only a few strategies are considered
• Perform unary operations (selection and projection) first
• Reorder operations to reduce intermediate relation size
• Replace a join by a series of semijoins to minimize data communication

Query Optimization Granularity
● Single query at a time
⬥ Cannot use common intermediate results

● Multiple queries at a time

⬥ Efficient if many similar queries
⬥ Decision space is much larger

Query Optimization Timing
● Static
⬥ Optimize at compilation time using statistics; appropriate for exhaustive search
⬥ Optimized once but executed many times, so the optimization cost can be amortized over many executions
⬥ Difficult to estimate the size of the intermediate results

● Dynamic
⬥ Optimize at execution time, when the sizes of the intermediate results are known accurately
⬥ Repeated for every execution, so it is expensive

Query Optimization Timing (cont.)
● Hybrid
⬥ Compile using a static algorithm
⬥ If the error in the estimated sizes exceeds a threshold, re-optimize at run time
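The hybrid rule can be sketched as a simple check on the relative estimation error. The function name, the use of relative error, and the default threshold of 0.5 are assumptions for illustration; the slides do not specify how the error is measured.

```python
def needs_reoptimization(estimated_size, actual_size, threshold=0.5):
    """Return True when the relative estimation error exceeds the threshold."""
    if estimated_size <= 0:
        # No usable estimate: re-optimize as soon as any tuples show up.
        return actual_size > 0
    error = abs(actual_size - estimated_size) / estimated_size
    return error > threshold


# The static plan assumed 100 intermediate tuples.
print(needs_reoptimization(100, 120))   # small error, keep the static plan
print(needs_reoptimization(100, 400))   # large error, re-optimize at run time
```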

Statistics
● Relation
⬥ Cardinality
⬥ Size of a tuple
⬥ Fraction of tuples participating in a join with another relation
● Attributes
⬥ Cardinality of the domain
⬥ Actual number of distinct values
● Common assumptions
⬥ Independence between different attribute values
⬥ Uniform distribution of attribute values within their domain
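The two common assumptions lead to a simple cardinality estimate: under uniformity an equality predicate selects 1/(number of distinct values) of the tuples, and under independence the selectivities of several predicates multiply. In the sketch below the distinct-value counts are hypothetical; 50 distinct RESP values was chosen only because it reproduces the 20 manager tuples of the earlier cost example.

```python
from fractions import Fraction


def eq_selectivity(distinct_values):
    """Selectivity of `attribute = constant` under the uniform distribution assumption."""
    return Fraction(1, distinct_values)


def estimate_cardinality(cardinality, *selectivities):
    """Estimated result size under the independence assumption."""
    estimate = Fraction(cardinality)
    for sel in selectivities:
        estimate *= sel
    return estimate


ASG_CARD = 1000       # from the cost example
RESP_DISTINCT = 50    # hypothetical
DUR_DISTINCT = 4      # hypothetical second attribute

managers = estimate_cardinality(ASG_CARD, eq_selectivity(RESP_DISTINCT))
both = estimate_cardinality(
    ASG_CARD, eq_selectivity(RESP_DISTINCT), eq_selectivity(DUR_DISTINCT)
)
print(managers, both)
```

Real data is often skewed or correlated, so estimates like these can be far off, which is one reason static optimization has trouble predicting intermediate result sizes.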

Decision Sites
● Query optimization may be performed by
⬥ Single site – centralized approach
– Single site determines the best schedule
– Simple
– Needs knowledge about the entire distributed database
⬥ All the sites involved – distributed approach
– Cooperation among sites to determine the schedule
– Needs only local information
– Incurs the cost of cooperation
⬥ Hybrid – one site makes the major decisions in cooperation with other sites making local decisions
– One site determines the global schedule
– Each site optimizes the local subqueries

Network Topology

● Wide Area Network (WAN) – point-to-point

⬥ Characteristics
– Low bandwidth
– Low speed
– High protocol overhead
⬥ Communication cost will dominate; ignore all other cost
factors
⬥ Global schedule to minimize communication cost
⬥ Local schedules according to centralized query
optimization

Network Topology (cont.)

● Local Area Network (LAN)

⬥ Communication cost is not that dominant
⬥ Total cost function should be considered
⬥ Broadcasting can be exploited
⬥ Special algorithms exist for star networks

Other Information to Exploit

● Using replication to minimize communication costs

● Using semijoins to reduce the size of operand relations and cut down communication costs when the overhead is not significant.

Semijoin: a technique for processing a join between two tables that are stored at different sites. The basic idea is to reduce the transfer cost by first sending only the projected join column(s) of one relation to the other site, where they are joined with the second relation.
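The semijoin idea can be sketched with in-memory lists standing in for relations at two sites. Everything here is illustrative: the function name, the sample rows, and the choice to count transferred items are assumptions, and no real network transfer takes place.

```python
def semijoin_join(emp, asg):
    """Join EMP (stored at site A) with ASG (stored at site B) on ENO via a semijoin."""
    # Step 1 (site B): project the join column of ASG and ship it to site A.
    keys_sent = {row["ENO"] for row in asg}
    # Step 2 (site A): keep only EMP tuples with a matching key, ship them to site B.
    emp_reduced = [row for row in emp if row["ENO"] in keys_sent]
    # Step 3 (site B): join the reduced EMP with ASG locally.
    result = [{**e, **a} for e in emp_reduced for a in asg if e["ENO"] == a["ENO"]]
    return result, len(keys_sent), len(emp_reduced)


# Hypothetical sample data.
EMP = [{"ENO": f"E{i}", "ENAME": f"name{i}"} for i in range(1, 6)]
ASG = [{"ENO": "E2", "RESP": "manager"}, {"ENO": "E4", "RESP": "manager"}]

result, n_keys, n_tuples = semijoin_join(EMP, ASG)
print(result)
print(f"shipped {n_keys} key values and {n_tuples} EMP tuples instead of {len(EMP)} EMP tuples")
```

The saving depends on the join being selective. If nearly every EMP tuple has a match, the extra round of key values is pure overhead, which is the caveat the slide attaches to semijoins.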
