Query Processing in Distributed Database Systems
Issue No. 03 - May 1979 (vol. 5)
ISSN: 0098-5589
pp: 177-187
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.1979.234179
S.B. Yao
A.R. Hevner, Department of Computer Science, Purdue University
ABSTRACT
Query processing in a distributed system requires the transmission of data between
computers in a network. The arrangement of data transmissions and local data
processing is known as a distribution strategy for a query. Two cost measures, response
time and total time, are used to judge the quality of a distribution strategy. Simple
algorithms are presented that derive distribution strategies which have minimal response
time and minimal total time, for a special class of queries. These optimal algorithms are
used as a basis to develop a general query processing algorithm. Distributed query
examples are presented and the complexity of the general algorithm is analyzed. The
integration of a query processing subsystem into a distributed database management
system is discussed.
INDEX TERMS
system modeling, computer network, database, distributed database systems,
distributed processing, distribution strategy, heuristic algorithms, query processing,
redundant data, relational data model
o The total cost that will be incurred in processing the query. It is the sum of
all times incurred in processing the operations of the query at various sites
and in intersite communication.
o The response time of the query. This is the elapsed time for executing the
query. Since operations can be executed in parallel at different sites, the
response time of a query may be significantly less than its total cost (see the
sketch after this list).
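As a rough illustration of the difference between the two measures, the sketch below uses a hypothetical linear transmission-cost model (the numbers and function names are illustrative only, not from the paper): total time adds up every transmission, while response time is driven by the longest branch that cannot be overlapped.

```python
# Illustrative sketch with made-up numbers: total time sums all work,
# response time only counts the longest of the parallel branches.

def transfer_time(nbytes, startup=1.0, rate=0.01):
    """Hypothetical linear cost model: fixed startup cost plus per-byte cost."""
    return startup + rate * nbytes

# Two sites send intermediate results to the query site in parallel.
transmissions = [2000, 5000]   # bytes sent from site 1 and site 2

total_time = sum(transfer_time(b) for b in transmissions)     # every transfer counted
response_time = max(transfer_time(b) for b in transmissions)  # parallel transfers overlap

print(f"total time    = {total_time:.1f}")    # 72.0
print(f"response time = {response_time:.1f}") # 51.0
```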
Query Decomposition
The first layer decomposes the calculus query into an algebraic query on global relations. The
information needed for this transformation is found in the global conceptual schema describing the
global relations. However, the information about data distribution is not used here but in the next
layer. Thus the techniques used by this layer are those of a centralized DBMS.
Query decomposition can be viewed as four successive steps. First, the calculus query is rewritten in
a normalized form that is suitable for subsequent manipulation. Normalization of a query generally
involves the manipulation of the query quantifiers and of the query qualification by applying logical
operator priority.
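For concreteness, the small sketch below shows one common choice of normal form, conjunctive normal form, using sympy's boolean rewriting; the tool and the predicate names are assumptions made for illustration, not something the text prescribes.

```python
# Minimal sketch: normalizing a qualification into conjunctive normal form.
# sympy and the predicate names are assumptions used only for illustration.
from sympy import symbols
from sympy.logic.boolalg import to_cnf

p1, p2, p3 = symbols("p1 p2 p3")

# A qualification such as p1 OR (p2 AND p3) is not yet conjunctive;
# distributing OR over AND normalizes it.
qualification = p1 | (p2 & p3)
print(to_cnf(qualification))   # (p1 | p2) & (p1 | p3)
```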
Second, the normalized query is analyzed semantically so that incorrect queries are detected and
rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of
relational calculus. Typically, they use some sort of graph that captures the semantics of the query.
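One such graph-based check, sketched below under the assumption that the query graph is given as an adjacency list (relations as nodes, join predicates as edges), rejects a query whose graph is disconnected, since a disconnected relation is referenced without any join predicate linking it to the rest of the query.

```python
# Sketch: detect one class of semantically incorrect queries by testing
# whether the query graph is connected.  The representation and the
# relation names are hypothetical.
from collections import deque

def is_connected(graph):
    """Breadth-first search over an adjacency-list query graph."""
    if not graph:
        return True
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(graph)

# EMP joins PROJ, but PAY is referenced with no join predicate at all.
query_graph = {"EMP": ["PROJ"], "PROJ": ["EMP"], "PAY": []}
if not is_connected(query_graph):
    print("reject: disconnected query graph (missing join predicate)")
```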
Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a
query is to eliminate redundant predicates. Note that redundant predicates are likely to arise when a
query is the result of system transformations applied to the user query. Such transformations are used
for performing semantic data control (views, protection, and semantic integrity control).
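A small worked example of such a simplification is given below, using a hypothetical predicate and sympy's boolean simplifier (neither is mandated by the text).

```python
# Sketch: a predicate made redundant by, e.g., merging a view definition
# into the user query can be removed by boolean simplification.
from sympy import symbols
from sympy.logic.boolalg import simplify_logic

p, q = symbols("p q")
print(simplify_logic(p & (p | q)))   # p  -- the disjunct (p | q) is redundant
```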
Fourth, the calculus query is restructured as an algebraic query. Several algebraic queries can
be derived from the same calculus query, and some algebraic queries are “better” than others.
The quality of an algebraic query is defined in terms of expected performance. The traditional way to
do this transformation toward a “better” algebraic specification is to start with an initial algebraic query
and transform it in order to find a “good” one. The initial algebraic query is derived immediately from
the calculus query by translating the predicates and the target statement into relational operators as
they appear in the query. This directly translated algebra query is then restructured through
transformation rules. The algebraic query generated by this layer is good in the sense that the worst
executions are typically avoided. For instance, a relation will be accessed only once, even if there are
several select predicates. However, this query is generally far from providing an optimal execution,
since information about data distribution and fragment allocation is not used at this layer.
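The sketch below illustrates one typical transformation rule of this kind, pushing a selection below a join so that it is applied before the more expensive operator; the expression-tree encoding and the relation names are assumptions made for illustration.

```python
# Sketch of one restructuring rule: push a selection below a join when its
# predicate mentions only one join operand.  The encoding is hypothetical.

def push_select(tree):
    """Rewrite select[p on R](R join S) into (select[p] R) join S."""
    if tree[0] == "select":
        _, pred, pred_rel, child = tree
        if child[0] == "join":
            _, left, right = child
            if left == ("rel", pred_rel):
                return ("join", ("select", pred, pred_rel, left), right)
            if right == ("rel", pred_rel):
                return ("join", left, ("select", pred, pred_rel, right))
    return tree

initial = ("select", "budget > 200000", "PROJ",
           ("join", ("rel", "EMP"), ("rel", "PROJ")))
print(push_select(initial))
# ('join', ('rel', 'EMP'), ('select', 'budget > 200000', 'PROJ', ('rel', 'PROJ')))
```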
Data Localization
The input to the second layer is an algebraic query on global relations. The main role of the second
layer is to localize the query’s data using data distribution information in the fragment schema. We
saw that relations are fragmented and stored in disjoint subsets, called fragments, each being stored
at a different site. This layer determines which fragments are involved in the query and transforms the
distributed query into a query on fragments. Fragmentation is defined by fragmentation predicates
that can be expressed through relational operators. A global relation can be reconstructed by applying
the fragmentation rules, and then deriving a program, called a localization program, of relational
algebra operators, which then act on fragments. Generating a query on fragments is done in two
steps. First, the query is mapped into a fragment query by substituting each relation by its
reconstruction program (also called materialization program). Second, the fragment query is simplified
and restructured to produce another “good” query. Simplification and restructuring may be done
according to the same rules used in the decomposition layer. As in the decomposition layer, the final
fragment query is generally far from optimal because information regarding fragments is not utilized.
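As a small illustration of this mapping and simplification, the sketch below assumes a relation EMP horizontally fragmented on department number (the fragmentation and the query predicate are hypothetical): after substituting EMP by its reconstruction program, the union can be reduced to the single fragment whose fragmentation predicate is compatible with the query.

```python
# Sketch: simplifying a localized query on a horizontally fragmented
# relation.  Fragment definitions and the query are hypothetical.

# EMP is fragmented by department number; the reconstruction program is
# the union of the two fragments.
fragments = {
    "EMP1": range(1, 11),     # fragmentation predicate: 1 <= dno <= 10
    "EMP2": range(11, 100),   # fragmentation predicate: 11 <= dno < 100
}

def localize(query_dno):
    """Fragments whose predicate is compatible with dno = query_dno."""
    return [name for name, dnos in fragments.items() if query_dno in dnos]

# Query: ... WHERE EMP.dno = 5  ->  the union over {EMP1, EMP2} reduces to EMP1.
print(localize(5))   # ['EMP1']
```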
Query Optimization
Query optimization consists of finding the “best” ordering of operators in the query, including
communication operators that minimize a cost function. The cost function, often defined in terms of
time units, refers to computing resources such as disk space, disk I/Os, buffer space, CPU cost,
communication cost, and so on. Generally, it is a weighted combination of I/O, CPU, and
communication costs. Nevertheless, a typical simplification made by the early distributed DBMSs, as
we mentioned before, was to consider communication cost as the most significant factor. This used to
be valid for wide area networks, where the limited bandwidth made communication much more costly
than local processing. This is no longer true today, and communication cost can be lower than I/O
cost. To select the ordering of operators it is necessary to predict execution costs of alternative
candidate orderings. Determining execution costs before query execution (i.e., static optimization) is
based on fragment statistics and the formulas for estimating the cardinalities of results of relational
operators. Thus the optimization decisions depend on the allocation of fragments and available
statistics on fragments, which are recorded in the allocation schema.
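A minimal sketch of such a weighted cost function is given below; the weights and the estimated resource figures are invented for illustration and would in practice come from the fragment statistics just mentioned.

```python
# Sketch of a weighted cost function combining CPU, I/O, and communication
# components.  All weights and plan estimates are hypothetical.

def total_cost(cpu_insts, disk_ios, messages, nbytes,
               w_cpu=1e-6, w_io=1e-2, w_msg=1.0, w_byte=1e-4):
    """Weighted combination of CPU, disk I/O, and communication costs."""
    return w_cpu * cpu_insts + w_io * disk_ios + w_msg * messages + w_byte * nbytes

# Static optimization: compare estimated costs of two candidate orderings.
plan_a = total_cost(cpu_insts=5e6, disk_ios=400, messages=2, nbytes=1e5)
plan_b = total_cost(cpu_insts=2e6, disk_ios=900, messages=6, nbytes=2e4)
print("choose plan A" if plan_a <= plan_b else "choose plan B")
```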
An important aspect of query optimization is join ordering, since permutations of the joins within the
query may lead to improvements of orders of magnitude. One basic technique for optimizing a
sequence of distributed join operators is through the semijoin operator. The main value of the semijoin
in a distributed system is to reduce the size of the join operands and thus the communication cost.
However, techniques which consider local processing costs as well as communication costs may not
use semijoins because they might increase local processing costs. The output of the query
optimization layer is an optimized algebraic query with communication operators included on
fragments. It is typically represented and saved (for future executions) as a distributed query
execution plan.
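The sketch below illustrates the semijoin idea on two hypothetical relations stored at different sites: instead of shipping all of EMP, the site holding PROJ first sends only the join-column values, EMP is reduced by a semijoin, and only the reduced operand is transferred.

```python
# Sketch of a semijoin-based reduction (relation contents are hypothetical).

emp = [  # stored at site 1
    {"eno": 1, "pno": "P1"}, {"eno": 2, "pno": "P7"},
    {"eno": 3, "pno": "P1"}, {"eno": 4, "pno": "P9"},
]
proj = [{"pno": "P1", "budget": 100}]   # stored at site 2

# Site 2 -> site 1: project PROJ on the join attribute (a small message).
proj_pnos = {t["pno"] for t in proj}

# Site 1: EMP semijoin PROJ -- keep only the EMP tuples that can match.
emp_reduced = [t for t in emp if t["pno"] in proj_pnos]

# Site 1 -> site 2: ship the reduced operand and join locally at site 2.
result = [{**e, **p} for e in emp_reduced for p in proj if e["pno"] == p["pno"]]
print(f"{len(emp)} -> {len(emp_reduced)} EMP tuples shipped; result = {result}")
```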