Distributed Query Processing
Donald Kossmann, University of Heidelberg, [email protected]
Agenda
Query Processing 101
centralized query processing
distributed query processing
Middleware
SQL and XML data integration
The Role of Web Services
Problem Statement
Input: Query
How many times has the moon circled around the earth in the last twenty years?
Output: Answer
240!
Objectives:
response time, throughput, first answers, little IO, ...
Centralized vs. Distributed Query Processing
same problem, but different parameters and objectives
Query Processing 101
Input: Declarative Query
SQL, OQL, XQuery, ...
Step 1: Translate Query into Algebra
Tree of operators
Step 2: Optimize Query (physical and logical)
Tree of operators (Compilation)
Step 3: Interpretation
Query result
Algebra
SELECT A.d FROM A, B WHERE A.a = B.b AND A.c = 35
[Figure: canonical operator tree for the query — projection A.d over a selection (A.a = B.b, A.c = 35) over the cross product A x B]
relational algebra for SQL: very well understood
algebra for OQL: fairly well understood
algebra for XQuery: work in progress
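For concreteness, the sample query above can be evaluated with a hand-rolled relational algebra over tiny in-memory relations. This is only a sketch; the table contents and the dict-based tuple representation are made up.

```python
# Toy relational algebra for: SELECT A.d FROM A, B WHERE A.a = B.b AND A.c = 35
# Relations are lists of dicts; the contents below are illustrative.
A = [{"a": 1, "c": 35, "d": "x"}, {"a": 2, "c": 40, "d": "y"}]
B = [{"b": 1}, {"b": 3}]

def select(rel, pred):          # sigma: filter tuples
    return [t for t in rel if pred(t)]

def join(left, right, pred):    # theta-join via nested loops
    return [{**l, **r} for l in left for r in right if pred(l, r)]

def project(rel, cols):         # pi: keep only the listed columns
    return [{c: t[c] for c in cols} for t in rel]

result = project(
    join(select(A, lambda t: t["c"] == 35), B, lambda l, r: l["a"] == r["b"]),
    ["d"])
```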
Query Optimization
[Figure: plan comparison — the canonical plan (selection over A x B) vs. an optimized plan with an index scan on A.c, a hash join on A.a = B.b, and projection A.d]
no-brainers (e.g., push down cheap predicates)
enumerate alternative plans, apply a cost model
use search heuristics to find the cheapest plan
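The enumerate-and-cost step can be sketched as follows. The cardinalities and join selectivities are invented, the cost model (sum of intermediate result sizes over left-deep join orders) is deliberately crude.

```python
from itertools import permutations

# Toy cost-based enumeration of left-deep join orders. Cardinalities and
# join selectivities are invented; cost = sum of intermediate result sizes.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset({"A", "B"}): 0.01,
       frozenset({"B", "C"}): 0.1,
       frozenset({"A", "C"}): 1.0}   # 1.0 = cross product (no join predicate)

def plan_cost(order):
    size, cost, seen = card[order[0]], 0, {order[0]}
    for rel in order[1:]:
        # crude estimate: apply the most selective predicate to any joined relation
        s = min(sel[frozenset({rel, t})] for t in seen)
        size = size * card[rel] * s
        cost += size
        seen.add(rel)
    return cost

best = min(permutations(card), key=plan_cost)
```

Even in this toy setting the cheapest order starts with the small relations and avoids the cross product, which is exactly what a real enumerator with a cost model is after.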
Query Execution
[Figure: execution of the optimized plan — the index scan on A.c yields (John, 35, CS) and (Mary, 35, EE); the scan of B yields (Edinburgh, CS, 5.0) and (Edinburgh, AS, 6.0); the hash join on B.b matches (John, 35, CS), and the projection A.d outputs John]
library of operators (hash join, merge join, ...)
pipelining (iterator model)
lazy evaluation
exploit indexes and clustering in the database
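The iterator model behind pipelining can be sketched in a few lines: every operator implements open/next/close, and next() hands back one tuple at a time, so plans evaluate lazily without materializing intermediate results. Operator names and the tuple data are illustrative.

```python
# Minimal iterator (Volcano-style) model: Scan produces tuples, Select
# filters them; both pipeline one tuple at a time through next().
class Scan:
    def __init__(self, rows): self.rows = rows
    def open(self): self.i = 0
    def next(self):
        if self.i < len(self.rows):
            self.i += 1
            return self.rows[self.i - 1]
        return None                      # end of stream
    def close(self): pass

class Select:
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self): self.child.close()

plan = Select(Scan([("John", 35), ("Mary", 40)]), lambda t: t[1] == 35)
plan.open()
out = []
while (t := plan.next()) is not None:
    out.append(t)
plan.close()
```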
Summary: Centralized Queries
Basic SQL (SPJG, nesting) well understood
Very good extensibility
nearest neighbor search, spatial joins, time series, UDF, roll-up, cube, ...
Current problems
statistics, cost model for optimization
physical database design is expensive
Trends
interactiveness during execution
approximate answers
more and more functionality, powerful models (XML)
Distributed Query Processing 101
Idea:
This is just an extension of centralized query processing. (System R* et al. in the early 80s)
What is different?
extend physical algebra: send & receive operators
resource vectors, network interconnect matrix
caching and replication
optimize for response time
less predictability in cost model (adaptive algorithms)
heterogeneity in data formats and data models
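The send & receive pair that extends the physical algebra can be sketched as follows: "send" ships tuples from a producer site, "receive" turns them back into a local tuple stream at the consumer. Here the two sites are simulated with a thread and a queue; the channel abstraction is an assumption of this sketch.

```python
import queue, threading

END = object()  # end-of-stream marker shipped after the last tuple

def send(rows, channel):
    # producer side: ship every tuple, then signal end of stream
    for t in rows:
        channel.put(t)
    channel.put(END)

def receive(channel):
    # consumer side: yield tuples until the end-of-stream marker arrives
    while (t := channel.get()) is not END:
        yield t

chan = queue.Queue()
producer = threading.Thread(target=send, args=([1, 2, 3], chan))
producer.start()
got = list(receive(chan))
producer.join()
```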
Distributed Query Plan
[Figure: distributed query plan — an index scan on A.c at one site and a scan of B at another, each feeding a send operator; receive operators at the query site feed the hash join on B.b and the projection A.d]
Cost
Total Cost = Sum of Cost of Ops
[Figure: the distributed plan annotated with per-operator costs (1, 8, 1, 1, 6, 6, 2, 5, 10); Total Cost = 40]
Response Time
independent, pipelined parallelism
[Figure: the distributed plan annotated with (first tuple, last tuple) times per operator; Total Cost = 40, but the first result tuple arrives at time 25 and the last at 33; at the leaves, first tuple = 0 and last tuple = 10]
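The contrast between the two metrics can be sketched as follows: total cost sums the work of every operator, while response time (with independent parallelism across sites) only pays the slowest branch at each level. The plan shape and per-operator costs here are illustrative, not the numbers from the figure.

```python
# Nodes are (name, own_cost, children); two sites feed a join in parallel.
plan = ("hashjoin", 8,
        [("receive", 1, [("send", 1, [("scan_A", 6, [])])]),
         ("receive", 1, [("send", 1, [("scan_B", 6, [])])])])

def total_cost(node):
    # every operator's work is paid for, regardless of where it runs
    _, cost, children = node
    return cost + sum(total_cost(c) for c in children)

def response_time(node):
    # branches run in parallel, so only the slowest child is on the critical path
    _, cost, children = node
    return cost + max((response_time(c) for c in children), default=0)
```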
Adaptive Algorithms
Deal with unpredictable events at run time
delays in arrival of data, burstiness of the network
autonomy of nodes, changes in policies
Example: double pipelined hash joins
build a hash table for each of the two input streams
read the inputs in separate threads
good for bursty arrival of data
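The double pipelined (symmetric) hash join can be sketched as follows: each arriving tuple is inserted into its own side's hash table and immediately probes the other side's table, so results flow as soon as data arrives. The event-stream encoding and key function are assumptions of this sketch.

```python
from collections import defaultdict

def symmetric_hash_join(stream, key):
    """stream yields ('L', tuple) or ('R', tuple) in arrival order."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    other = {"L": "R", "R": "L"}
    for side, t in stream:
        k = key(t)
        tables[side][k].append(t)              # build this side's hash table
        for match in tables[other[side]][k]:   # probe the other side
            yield (t, match) if side == "L" else (match, t)

# bursty, interleaved arrival of left and right tuples:
events = [("L", (1, "a")), ("R", (1, "x")), ("R", (2, "y")), ("L", (2, "b"))]
pairs = list(symmetric_hash_join(events, key=lambda t: t[0]))
```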
re-optimization at run time
monitor the execution of the query
adjust the estimates of the cost model
re-optimize if the delta is too large
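A minimal version of the "re-optimize if the delta is too large" test: compare the optimizer's cardinality estimate against the observed count and trigger re-optimization when the relative error exceeds a factor. The 2x threshold is an arbitrary choice for this sketch.

```python
def should_reoptimize(estimated, observed, factor=2.0):
    """Re-optimize when the estimate is off by more than `factor` in either direction."""
    if estimated <= 0:
        return observed > 0
    ratio = observed / estimated
    return ratio > factor or ratio < 1.0 / factor
```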
Heterogeneity
Use wrappers to hide heterogeneity
Wrappers take care of data format, packaging
Wrappers map from local to global schema
Wrappers carry out caching
connections, cursors, data, ...
Wrappers map queries into the local dialect
Wrappers participate in query planning!!!
define the subset of queries that can be handled
give cost information, statistics
capability-based rewrite (HKWY, VLDB 1997)
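The capability-based idea can be sketched as follows: each wrapper declares which columns its source can filter on; the middleware pushes those predicates down and compensates for the rest locally. The predicate triples and the capability set are hypothetical, and this is far simpler than the full rewrite machinery of HKWY.

```python
def split_predicates(predicates, capabilities):
    """Partition predicates into pushdown (source handles) vs local compensation."""
    pushdown = [p for p in predicates if p[0] in capabilities]
    local = [p for p in predicates if p[0] not in capabilities]
    return pushdown, local

# a source that can only filter on "id" and "name":
preds = [("id", "=", 7), ("name", "like", "K%"), ("salary", ">", 50000)]
push, comp = split_predicates(preds, capabilities={"id", "name"})
```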
Data Cleaning
Are two objects the same?
Is D. A. Kossman the same as Kossmann?
Is the object that was at position x 10 min. ago the same as the object at position y now?
Approaches (combination of)
statistical methods
domain knowledge
human inspection
Very Expensive
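The statistical side of this can be sketched with a similarity threshold: declare two name strings the same entity when their string similarity is high enough. The 0.8 cutoff is arbitrary, and as the slide says, real systems combine such scores with domain knowledge and human inspection.

```python
from difflib import SequenceMatcher

def same_entity(a, b, threshold=0.8):
    # ratio() is 2*M/T over matching characters; case-fold before comparing
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```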
Summary
Theory very well understood
extend traditional (centralized) query processing
add some bells and whistles
heterogeneity needs manual work and wrappers
Problems in Practice
cost model, statistics
architectures are not fit for adaptivity, heterogeneity
optimizers do not scale to 10,000s of sites
autonomy of sites; systems not built for asynchronous communication
data cleaning
Middleware
Two kinds of middleware
data warehouses
virtual integration
Data Warehouses
good: query response times
good: materializes the results of data cleaning
bad: high resource requirements in the middleware
bad: staleness of data
Virtual Integration
the opposite trade-offs
caching possible to improve response times
Virtual Integration
Query
[Figure: virtual integration architecture — the query enters the middleware (query decomposition, result composition), which sends subqueries through wrappers to DB1 and DB2]
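What the middleware box in the figure does can be sketched as query decomposition: one subquery per source, restricted to the columns that source exports. The source names and schemas here are made up.

```python
def decompose(query_cols, sources):
    """Map each source to the subset of requested columns it can provide."""
    return {name: [c for c in query_cols if c in cols]
            for name, cols in sources.items()}

sources = {"DB1": {"id", "name"}, "DB2": {"id", "salary"}}
subqueries = decompose(["id", "name", "salary"], sources)
```

Result composition would then join the partial results on a shared key (here "id") — the part the middleware, not the sources, has to pay for.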
IBM Data Joiner
[Figure: IBM Data Joiner — an SQL query is decomposed by Data Joiner into SQL subqueries that are shipped through wrappers to SQL DB1 and SQL DB2]
Adding XML
[Figure: XML publishing on top of SQL middleware — an XML query is translated by an XML publishing layer into SQL; the middleware ships subqueries through wrappers to DB1 and DB2]
XML Data Integration
[Figure: native XML middleware — the XML query is decomposed into XML subqueries that are shipped through wrappers to DB1 and DB2]
XML Data Integration
Example: BEA Liquid Data
Advantage
availability of XML wrappers for all major databases
Problems
the XML-SQL mapping is very difficult
XML is not always the right language (e.g., for decision-support-style queries)
Summary
Middleware looks like a homogeneous, centralized database
location transparency
data model transparency
Middleware provides global schema
data sources map local schemas to global schema
Various kinds of middleware (SQL, OQL, XML)
Stacks of middleware possible
Data cleaning requires special attention
A Note on Web Services
Idea: Encapsulate Data Source
provide a WSDL interface to access the data
works very well if the query pattern is known
Problem: Exploit Capability of Source
WSDL limits the capabilities of the data source
good optimization requires a white box
example: access by id, access by name, full scan
should all combinations be listed in the WSDL?
Solution: WSDL for Query Planning
Details ???