Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez
© 2020, M.T. Özsu & P. Valduriez
Outline
◼ Introduction
◼ Distributed and parallel database design
◼ Distributed data control
◼ Distributed Query Processing
◼ Distributed Transaction Processing
◼ Data Replication
◼ Database Integration – Multidatabase Systems
◼ Parallel Database Systems
◼ Peer-to-Peer Data Management
◼ Big Data Processing
◼ NoSQL, NewSQL and Polystores
◼ Web Data Management
Outline
◼ Distributed Query Processing
❑ Query Decomposition and Localization
❑ Join Ordering
❑ Distributed Query Optimization
❑ Adaptive Query Processing
Query Processing in a DDBMS
◼ Generally, a query in a distributed DBMS requires data from multiple sites; this data must be transmitted over the network, which incurs communication cost.
◼ Query processing in a distributed DBMS therefore differs from query processing in a centralized DBMS because of the communication cost of transferring data over the network.
◼ Transmission cost is low when the sites are connected by a high-speed network, but can be quite significant over slower networks.
Query Processing in a DDBMS
◼ In distributed query processing, the data transfer cost consists of:
❑ the cost of transferring intermediate files to other sites for processing, and
❑ the cost of transferring the final result file to the site where the result is required
Distributed DBMS Environment
• Suppose site s1 issues a query that needs data from sites s2 and s3.
• It is decided to execute the query at s3.
• The first communication cost is transferring the data from s2 to s3; s3 then executes the query and produces the result.
• The second communication cost is transferring the result from s3 to s1.
Query Processing in a DDBMS
High-level user query → Query Processor → Low-level data manipulation commands for the distributed DBMS
Query Processing Components
◼ Query language
❑ SQL: “intergalactic dataspeak”
◼ Query execution
❑ The steps that one goes through in executing high-level
(declarative) user queries.
◼ Query optimization
❑ How do we determine the “best” execution plan?
◼ We assume a homogeneous D-DBMS
Selecting Alternatives
Find the names of employees who are managing a project.
Strategy 1 (SQL formulation)
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND RESP = "Manager"
Strategy 1 (equivalent relational algebra)
Π_ENAME(σ_{RESP="Manager" ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Selecting Alternatives
Strategy 2 (SQL formulation)
SELECT ENAME
FROM EMP NATURAL JOIN ASG
WHERE RESP = "Manager"
Strategy 2 (equivalent relational algebra)
Π_ENAME(EMP ⋈_ENO σ_{RESP="Manager"}(ASG))
Strategy 2 avoids the Cartesian product and consumes fewer computing resources, so it may be "better".
Selecting Alternatives
In a distributed system,
◼ Relational algebra is not enough to express execution
strategies. It must be supplemented with operators for
exchanging data between sites.
◼ The distributed query processor must also select the best sites to process data, and possibly the way data should be transformed.
What is the Problem?
The query Π_ENAME(EMP ⋈_ENO σ_{RESP="Manager"}(ASG)) is posed against a fragmented database:
❑ Site 1 stores ASG1 = σ_{ENO ≤ "E3"}(ASG)
❑ Site 2 stores ASG2 = σ_{ENO > "E3"}(ASG)
❑ Site 3 stores EMP1 = σ_{ENO ≤ "E3"}(EMP)
❑ Site 4 stores EMP2 = σ_{ENO > "E3"}(EMP)
❑ Site 5 is the result site
Two execution strategies are compared:
❑ Strategy A: apply the selection at the ASG sites, ship the reduced fragments (ASG′) to the corresponding EMP sites, perform the joins there, and ship the results (EMP′) to Site 5
❑ Strategy B: ship EMP and ASG entirely to Site 5 and execute the whole query there
Cost of Alternatives
Assume:
◼ size(EMP) = 400 tuples, size(ASG) = 1,000 tuples
◼ tuple access cost = 1 unit
◼ tuple transfer cost = 10 units
◼ there are 20 managers in relation ASG
◼ data is uniformly distributed among the sites (so each ASG fragment holds 10 "Manager" tuples)
Cost of Alternatives
◼ Strategy A
❑ produce ASG′: (10 + 10) × tuple access cost = 20
❑ transfer ASG′ to the sites of EMP: (10 + 10) × tuple transfer cost = 200
❑ produce EMP′: (10 + 10) × tuple access cost × 2 = 40
❑ transfer EMP′ to the result site: (10 + 10) × tuple transfer cost = 200
Total cost = 460
◼ Strategy B
❑ transfer EMP to Site 5: 400 × tuple transfer cost = 4,000
❑ transfer ASG to Site 5: 1,000 × tuple transfer cost = 10,000
❑ produce ASG′ (apply the selection): 1,000 × tuple access cost = 1,000
❑ join EMP and ASG′: 400 × 20 × tuple access cost = 8,000
Total cost = 23,000
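The arithmetic behind these totals can be checked with a short script. The following is a minimal sketch (Python, not part of the original slides) that recomputes both totals from the stated assumptions; the figure of 10 matching tuples per fragment follows from the uniform-distribution assumption.

# Recompute the cost of both strategies under the slide's assumptions.
TUPLE_ACCESS = 1        # cost units per tuple access
TUPLE_TRANSFER = 10     # cost units per tuple transferred
EMP_SIZE, ASG_SIZE = 400, 1000
MANAGERS = 20           # ASG tuples with RESP = "Manager" (10 per fragment)

# Strategy A: select at the ASG sites, join at the EMP sites, ship results
strategy_a = (
    MANAGERS * TUPLE_ACCESS         # produce ASG': (10 + 10) accesses
    + MANAGERS * TUPLE_TRANSFER     # ship ASG' to the EMP sites
    + MANAGERS * 2 * TUPLE_ACCESS   # produce EMP': 2 accesses per ASG' tuple
    + MANAGERS * TUPLE_TRANSFER     # ship EMP' to the result site
)

# Strategy B: ship both relations to Site 5 and evaluate everything there
strategy_b = (
    EMP_SIZE * TUPLE_TRANSFER       # ship EMP to Site 5
    + ASG_SIZE * TUPLE_TRANSFER     # ship ASG to Site 5
    + ASG_SIZE * TUPLE_ACCESS       # produce ASG' by scanning ASG
    + EMP_SIZE * MANAGERS * TUPLE_ACCESS   # join EMP with ASG'
)

print(strategy_a, strategy_b)       # 460 23000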
Query Optimization Objectives
◼ Minimize a cost function
❑ I/O cost + CPU cost + communication cost
❑ These components may have different weights in different distributed environments (a weighted form is sketched after this list)
◼ Wide area networks
❑ Communication cost may dominate, and can vary widely depending on:
◼ Bandwidth
◼ Speed
◼ Protocol overhead
◼ Local area networks
❑ Communication cost not that dominant, so total cost function
should be considered
◼ Can also maximize throughput
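The "different weights" bullet above can be made concrete with a weighted-sum cost function of the kind commonly used in distributed query optimization; the coefficient names below are illustrative, not taken from the slides:

Total_cost = C_CPU × #instructions + C_I/O × #disk I/Os + C_MSG × #messages + C_TR × #bytes transferred

Over a wide area network the C_MSG and C_TR terms usually dominate, so the other terms are often ignored; over a local area network all four terms should be considered.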
Complexity of Relational Operations
◼ Assume
❑ relations of cardinality n
❑ sequential scan
◼ Operation → Complexity
❑ Select, Project (without duplicate elimination): O(n)
❑ Project (with duplicate elimination), Group: O(n log n)
❑ Join, Semi-join, Division, Set operators: O(n log n)
❑ Cartesian product: O(n²)
Types Of Optimizers
◼ Exhaustive search
❑ Cost-based
❑ Optimal
❑ Combinatorial complexity in the number of relations
◼ Heuristics
❑ Not optimal
❑ Regroup common sub-expressions
❑ Perform selection, projection first
❑ Replace a join by a series of semijoins (see the example after this list)
❑ Reorder operations to reduce intermediate relation size
❑ Optimize individual operations
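As an illustration of the semijoin heuristic mentioned above (using the EMP and ASG relations from the earlier example; this is the standard semijoin rewriting, not a new strategy):

EMP ⋈_ENO ASG = (EMP ⋉_ENO ASG) ⋈_ENO ASG

The semijoin EMP ⋉_ENO ASG can be computed by shipping only Π_ENO(ASG) to the site of EMP, so that only the EMP tuples that actually have a matching ASG tuple are transferred for the final join. Communication cost usually decreases, at the price of extra local processing.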
Optimization Granularity
◼ Single query at a time
❑ Cannot use common intermediate results
◼ Multiple queries at a time
❑ Efficient if many similar queries
❑ Decision space is much larger
Optimization Timing
◼ Static : optimization is done at query compilation time
❑ Compilation ➔ optimize prior to the execution
❑ Difficult to estimate the sizes of intermediate results ➔ error propagation
❑ Can amortize over many executions
◼ Dynamic: proceeds at query execution time
❑ Run time optimization
❑ Exact information on the intermediate relation sizes
❑ Have to re-optimize for multiple executions
◼ Hybrid: tradeoff between both
❑ Compile using a static algorithm
❑ If the error in estimated sizes exceeds a threshold, re-optimize at run time (see the sketch after this list)
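A minimal sketch of the hybrid check in the last bullet (Python; the function name and the threshold value are illustrative assumptions, not part of the slides):

def should_reoptimize(estimated_size, actual_size, threshold=0.5):
    # Hybrid timing: the statically compiled plan is kept as long as the
    # optimizer's cardinality estimates stay close to the observed sizes;
    # if the relative error for an intermediate result exceeds `threshold`,
    # the remainder of the plan is re-optimized at run time.
    if estimated_size <= 0:
        return actual_size > 0
    return abs(actual_size - estimated_size) / estimated_size > threshold

# Example: the estimate said 100 tuples but the operator produced 400
assert should_reoptimize(100, 400)       # large error: re-optimize at run time
assert not should_reoptimize(100, 120)   # estimate close enough: keep the plan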
Statistics
◼ Relation
❑ Cardinality
❑ Size of a tuple
❑ Fraction of tuples participating in a join with another relation
◼ Attribute
❑ Cardinality of domain
❑ Actual number of distinct values
◼ Simplifying assumptions
❑ Independence between different attribute values
❑ Uniform distribution of attribute values within their domain
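A minimal sketch (Python, illustrative only) of how these statistics and the two simplifying assumptions are typically used to estimate intermediate-result sizes; the sample statistics are assumptions chosen to match the earlier EMP/ASG example:

# Standard cardinality estimates under the uniformity and independence assumptions.
def eq_selectivity(distinct_values):
    # selectivity of "attribute = value", assuming values are uniformly distributed
    return 1.0 / distinct_values

def selection_card(card, selectivity):
    return card * selectivity

def join_card(card_r, card_s, distinct_r, distinct_s):
    # equijoin R.A = S.B; a common estimate divides by the larger distinct count
    return card_r * card_s / max(distinct_r, distinct_s)

# Assumed statistics: card(ASG) = 1000 with 50 distinct RESP values,
# card(EMP) = 400 with ENO a key (400 distinct ENO values in both relations).
managers = selection_card(1000, eq_selectivity(50))   # ≈ 20 tuples
emp_join = join_card(400, 1000, 400, 400)             # ≈ 1000 tuples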
Optimization Decision Sites
◼ Centralized
❑ Single site determines the “best” schedule
❑ Simple
❑ Need knowledge about the entire distributed database
◼ Distributed
❑ Cooperation among sites to determine the schedule
❑ Need only local information
❑ Cost of cooperation
◼ Hybrid
❑ One site determines the global schedule
❑ Each site optimizes the local subqueries
Network Topology
◼ Wide area networks (WAN) – point-to-point
❑ Characteristics
◼ Relatively low bandwidth (compared to local CPU/IO)
◼ High protocol overhead
❑ Communication cost may dominate; ignore all other cost factors
❑ Global schedule to minimize communication cost
❑ Local schedules according to centralized query optimization
◼ Local area networks (LAN)
❑ Communication cost not that dominant
❑ Total cost function should be considered
❑ Broadcasting can be exploited (joins)
❑ Special algorithms exist for star networks
Questions?