Distributed Query Processing
Definition:
Distributed Query Processing refers to the process of formulating, optimizing, and executing a query in a Distributed Database System (DDS), where the data is stored across multiple, geographically separated sites.
The goal is to process user queries efficiently, minimizing communication cost (often the dominant cost when sites are linked by a wide-area network), processing time, and resource usage, while ensuring correct results.
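The trade-off between communication and local processing is often expressed as a weighted sum of CPU, I/O, message, and transmission costs. The following is a minimal Python sketch of such a cost model. The class names, field names, and unit costs are hypothetical and chosen only to show how shipping less data can outweigh extra local work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Hypothetical per-unit costs; a real system would calibrate these."""
    cpu_per_instruction: float
    io_per_page: float
    msg_per_message: float  # fixed cost of sending and receiving one message
    tr_per_byte: float      # cost of transmitting one byte between sites


@dataclass(frozen=True)
class PlanProfile:
    """Resource usage estimated for one candidate execution plan."""
    instructions: int
    page_ios: int
    messages: int
    bytes_shipped: int


def total_cost(c: UnitCosts, p: PlanProfile) -> float:
    """Weighted sum of local processing cost and communication cost."""
    local = c.cpu_per_instruction * p.instructions + c.io_per_page * p.page_ios
    communication = c.msg_per_message * p.messages + c.tr_per_byte * p.bytes_shipped
    return local + communication


# Illustrative numbers only: ship a whole relation vs. reduce it first, then ship.
costs = UnitCosts(cpu_per_instruction=1e-6, io_per_page=1e-3,
                  msg_per_message=0.01, tr_per_byte=1e-5)
ship_everything = PlanProfile(instructions=1_000_000, page_ios=100,
                              messages=2, bytes_shipped=1_000_000)
reduce_then_ship = PlanProfile(instructions=2_000_000, page_ios=150,
                               messages=4, bytes_shipped=50_000)

best = min([ship_everything, reduce_then_ship], key=lambda p: total_cost(costs, p))
print(best is reduce_then_ship)  # True: shipping less data outweighs the extra local work
```

With these assumed weights, transmission dominates, so the plan that does more local work but ships far fewer bytes wins.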
1. Query Decomposition
• Breaks the original query into smaller components (relational algebra operations on global relations) that are easier to manage and optimize.
Main tasks:
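• Restructuring: Transform the relational algebra form of the query into an equivalent but cheaper one (e.g., push selections and projections toward the base relations, simplify joins).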
Goal: Convert the query into an equivalent, efficient representation for further processing.
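Selection pushdown is the classic restructuring rule. A selection that mentions only one join input can be applied before the join, so fewer tuples reach the join. The following is a minimal Python sketch over a toy algebra tree. The classes `Relation`, `Select`, and `Join`, the relations EMP/ASG, and their attributes are hypothetical. Attribute names are assumed to be qualified by relation (e.g. `E.title`) so they never collide, and predicates are carried as text plus the set of attributes they reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Relation:
    name: str
    attributes: FrozenSet[str]  # qualified names, e.g. "E.title"


@dataclass(frozen=True)
class Select:
    attrs: FrozenSet[str]  # attributes the predicate references
    predicate: str         # predicate text, kept for display only
    child: "Node"


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"
    condition: str


Node = Union[Relation, Select, Join]


def attributes(node: Node) -> FrozenSet[str]:
    """All attributes available in the output of `node`."""
    if isinstance(node, Relation):
        return node.attributes
    if isinstance(node, Select):
        return attributes(node.child)
    return attributes(node.left) | attributes(node.right)


def push_selections(node: Node) -> Node:
    """Move each selection below a join when it only needs one join input."""
    if isinstance(node, Relation):
        return node
    if isinstance(node, Join):
        return Join(push_selections(node.left), push_selections(node.right), node.condition)
    child = push_selections(node.child)
    if isinstance(child, Join):
        if node.attrs <= attributes(child.left):
            pushed = push_selections(Select(node.attrs, node.predicate, child.left))
            return Join(pushed, child.right, child.condition)
        if node.attrs <= attributes(child.right):
            pushed = push_selections(Select(node.attrs, node.predicate, child.right))
            return Join(child.left, pushed, child.condition)
    # The predicate spans both inputs (or no join is directly below): keep it here.
    return Select(node.attrs, node.predicate, child)


emp = Relation("EMP", frozenset({"E.eno", "E.ename", "E.title"}))
asg = Relation("ASG", frozenset({"G.eno", "G.pno", "G.resp"}))
query = Select(frozenset({"E.title"}), "E.title = 'Elect. Eng.'",
               Join(emp, asg, "E.eno = G.eno"))
# After rewriting, the selection on E.title sits directly above EMP, below the join.
print(push_selections(query))
```

The rewritten tree is equivalent to the original, but the join now reads only the EMP tuples that pass the filter.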
2. Data Localization (Query Fragmentation)
• Identifies where the required data is physically located across different sites.
• Translates the global query into local queries for execution at relevant sites.
Steps involved:
Goal: Ensure each sub-query accesses only the data it needs from the right location.
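For a horizontally fragmented relation, localization treats the global relation as the union of its fragments. It then drops any fragment whose fragmentation predicate contradicts the query predicate. The following is a minimal Python sketch. It assumes EMP is fragmented into half-open ranges on a single integer attribute (`salary`). The fragment names, site names, the `localize` function, and the generated SQL text are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str   # fragment table name at its site
    site: str
    low: int    # inclusive lower bound on the fragmentation attribute
    high: int   # exclusive upper bound


def localize(fragments, attr, q_low, q_high):
    """Map a range query on the global relation to local queries on overlapping fragments.

    A fragment whose range [low, high) does not overlap the query range
    [q_low, q_high) cannot contribute rows, so it is pruned rather than queried.
    """
    local_queries = {}
    for f in fragments:
        if f.low < q_high and q_low < f.high:  # half-open interval overlap test
            sql = f"SELECT * FROM {f.name} WHERE {attr} >= {q_low} AND {attr} < {q_high}"
            local_queries.setdefault(f.site, []).append(sql)
    return local_queries


# Hypothetical horizontal fragmentation of EMP on salary.
emp_fragments = [
    Fragment("EMP1", "site_A", 0, 20_000),
    Fragment("EMP2", "site_B", 20_000, 40_000),
    Fragment("EMP3", "site_C", 40_000, 1_000_000),
]
# Global query: SELECT * FROM EMP WHERE salary >= 30000 AND salary < 45000
# site_A is never contacted: EMP1 cannot hold salaries in that range.
for site, queries in localize(emp_fragments, "salary", 30_000, 45_000).items():
    print(site, queries)
```

The final answer is the union of the local results, assembled at the query site.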
3. Global Optimization
• Generates an efficient execution plan for the entire distributed query.
Steps involved:
Goal: Find a globally efficient plan to execute all sub-queries and combine the results.
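One decision global optimization makes is where to execute each join. The following is a minimal Python sketch that makes this choice. It assumes a single two-relation join and known sizes for both inputs and the result, and it uses a cost model that counts only bytes shipped between sites, ignoring local processing. The relation names, site names, and the `choose_join_site` function are hypothetical. Real optimizers also weigh join ordering and reductions such as semijoins.

```python
def choose_join_site(sizes, locations, result_size, query_site):
    """Pick the site for a two-relation join that minimizes bytes shipped.

    sizes:       relation name -> size in bytes
    locations:   relation name -> site holding that relation
    result_size: estimated size of the join result in bytes
    query_site:  site where the user expects the final answer
    """
    candidates = sorted(set(locations.values()) | {query_site})

    def bytes_shipped(site):
        # Every input not already stored at `site` must be sent there.
        shipped = sum(size for rel, size in sizes.items() if locations[rel] != site)
        # The result must travel to the query site unless it is computed there.
        if site != query_site:
            shipped += result_size
        return shipped

    costs = {site: bytes_shipped(site) for site in candidates}
    best = min(candidates, key=costs.__getitem__)
    return best, costs


best, costs = choose_join_site(
    sizes={"R": 10_000, "S": 200_000},
    locations={"R": "site1", "S": "site2"},
    result_size=5_000,
    query_site="site3",
)
print(best, costs)  # site2 {'site1': 205000, 'site2': 15000, 'site3': 210000}
```

The winning plan moves the small relation to the large one and ships only the join result to the query site.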
4. Local Optimization
• Each site optimizes its own sub-query using local database resources and statistics.
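Local optimization is essentially centralized optimization at each site. A typical decision is the access path for a selection, based on that site's own statistics. The following is a minimal Python sketch that makes this choice. It assumes page reads are the only cost and uses textbook-style approximations for full and index scans, not the formulas of any particular DBMS. The `choose_access_path` function and the statistics passed to it are hypothetical.

```python
import math


def choose_access_path(num_pages, num_tuples, selectivity, index_height, clustered):
    """Compare a full scan with an index scan using page reads as the only cost.

    Assumed approximations:
      full scan          = every page of the table
      clustered index    = index traversal + the fraction of pages that match
      unclustered index  = index traversal + one page read per matching tuple
    """
    full_scan = num_pages
    if clustered:
        index_scan = index_height + math.ceil(selectivity * num_pages)
    else:
        index_scan = index_height + math.ceil(selectivity * num_tuples)
    if index_scan < full_scan:
        return "index_scan", index_scan
    return "full_scan", full_scan


# Hypothetical local statistics for one fragment at one site.
print(choose_access_path(1_000, 100_000, 0.001, 3, clustered=False))  # few matches: index wins
print(choose_access_path(1_000, 100_000, 0.05, 3, clustered=False))   # 5% unclustered: full scan wins
```

The same predicate can get different plans at different sites, because each site decides using its own fragment sizes and indexes.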
Steps involved:
Diagram (Textual)
User Query (SQL)