In the context of distributed database systems, query processing involves the transformation of
high-level queries (e.g., SQL) into efficient execution strategies that can be run across multiple,
physically distributed database sites. The main objectives of query processing in such systems are:
1. Correctness
• Ensure that the semantics of the query remain consistent across the distributed environment.
• Results should be the same as if the query were executed on a centralized database, even if
the data is partitioned (fragmented) or replicated.
2. Efficiency / Performance
• Minimize response time and resource usage (CPU, memory, I/O, and network bandwidth).
• Efficiently manage data transfer between sites, which is often the biggest performance
bottleneck in distributed systems.
3. Minimizing Data Transfer
• Reduce the amount of data sent over the network, since inter-site communication is costly.
• Use strategies such as fragment queries and query shipping (sending the query to the site that
holds the data instead of moving the data) to process data locally as much as possible.
4. Parallelism
• Take advantage of parallel execution across sites to improve query performance.
• Distribute subqueries or fragments of the query to different sites to execute in parallel.
5. Site Autonomy
• Respect the independence of each site, especially in loosely coupled systems.
• Query plans should minimize interference with local transactions and operations.
6. Optimization
• Choose the execution plan with the lowest estimated cost, using statistics, cost models, and
knowledge of how the data is distributed.
• The optimizer must account for data fragmentation (horizontal, vertical, or mixed) and replication.
7. Transparency
• Provide location transparency: users should not need to know where data resides.
• Provide fragmentation transparency: the query must run correctly regardless of how the
data is split across sites.
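To make the correctness objective (item 1) concrete, the following Python sketch shows why partial results from fragments must be combined carefully. It is a minimal illustration, assuming a hypothetical EMP relation horizontally fragmented across two simulated sites: averaging the per-site averages gives the wrong AVG, while having each site return a partial (SUM, COUNT) pair reproduces the centralized answer.

```python
from typing import Dict, List, Tuple

# Hypothetical relation EMP(name, dept, salary), horizontally fragmented by site.
Row = Tuple[str, str, int]

FRAGMENTS: Dict[str, List[Row]] = {
    "site_A": [("ann", "sales", 50), ("bob", "sales", 70)],
    "site_B": [("cho", "eng", 90), ("dee", "eng", 110), ("eve", "sales", 60)],
}


def centralized_avg_salary(rows: List[Row]) -> float:
    """Reference answer: AVG(salary) over the whole relation in one place."""
    return sum(r[2] for r in rows) / len(rows)


def naive_distributed_avg(fragments: Dict[str, List[Row]]) -> float:
    """WRONG: averages the per-site averages and ignores fragment sizes."""
    site_avgs = [sum(r[2] for r in rows) / len(rows) for rows in fragments.values()]
    return sum(site_avgs) / len(site_avgs)


def correct_distributed_avg(fragments: Dict[str, List[Row]]) -> float:
    """Each site returns a partial (SUM, COUNT); the coordinator combines them."""
    partials = [(sum(r[2] for r in rows), len(rows)) for rows in fragments.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    all_rows = [r for rows in FRAGMENTS.values() for r in rows]
    print(centralized_avg_salary(all_rows))            # 76.0
    print(correct_distributed_avg(FRAGMENTS))          # 76.0
    print(round(naive_distributed_avg(FRAGMENTS), 2))  # 73.33, not the centralized answer
```

The same idea applies to other aggregates: SUM and COUNT decompose directly over fragments, whereas AVG has to be rebuilt from them.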
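The difference between moving the data and moving the query, from the data-transfer objective (item 3), can be sketched by counting the rows that leave a site. The `Site` class, the ORDERS-like rows, and the `big_orders` predicate are hypothetical, and "rows sent" is a stand-in for network cost under that assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = Tuple[int, str, int]  # hypothetical ORDERS(order_id, region, amount)


@dataclass
class Site:
    """A simulated remote site that holds one fragment and counts rows it sends."""
    name: str
    rows: List[Row]
    rows_sent: int = 0

    def fetch_all(self) -> List[Row]:
        # Data shipping: the whole fragment crosses the network.
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def run_filter(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Query shipping: the predicate runs locally; only matching rows are sent.
        result = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(result)
        return result


def big_orders(row: Row) -> bool:
    return row[2] > 900


def data_shipping(site: Site) -> List[Row]:
    """Pull everything to the coordinator, then filter there."""
    return [r for r in site.fetch_all() if big_orders(r)]


def query_shipping(site: Site) -> List[Row]:
    """Push the filter to the site, then pull only the result."""
    return site.run_filter(big_orders)


if __name__ == "__main__":
    rows = [(i, "EU", i) for i in range(1000)]  # amounts 0..999
    s1, s2 = Site("s1", rows), Site("s2", rows)
    assert data_shipping(s1) == query_shipping(s2)  # same answer either way
    print(s1.rows_sent, s2.rows_sent)  # 1000 99
```

Both strategies return identical rows; only the volume crossing the network differs, which is exactly what the objective targets.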
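For the parallelism objective (item 4), the sketch below sends one subquery per site concurrently and merges the partial counts at the coordinator. Threads stand in for remote sites (an assumption; in a real system each call would be a network request), and the data and the `response_vs_total` helper are hypothetical. That helper uses the simplification that response time is governed by the slowest site while total work is the sum over all sites, ignoring merge and communication overhead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical fragments of one relation: the values stored at each site.
SITE_DATA: Dict[str, List[int]] = {
    "site_A": [3, 8, 1],
    "site_B": [10, 4],
    "site_C": [7, 2, 9, 5],
}


def local_subquery(values: List[int]) -> int:
    """Subquery run at one site: partial COUNT(*) WHERE value > 4."""
    return sum(1 for v in values if v > 4)


def parallel_count(site_data: Dict[str, List[int]]) -> int:
    """Dispatch one subquery per site concurrently, then add up the partial counts."""
    # Threads simulate remote sites; with real network calls the waits overlap
    # instead of running one after another.
    with ThreadPoolExecutor(max_workers=len(site_data)) as pool:
        partials = list(pool.map(local_subquery, site_data.values()))
    return sum(partials)


def response_vs_total(site_seconds: Dict[str, float]) -> Tuple[float, float]:
    """Simplified timing: response time = slowest site, total work = sum of all sites.

    Merge and communication overhead at the coordinator are ignored.
    """
    return max(site_seconds.values()), sum(site_seconds.values())


if __name__ == "__main__":
    print(parallel_count(SITE_DATA))  # 5
    print(response_vs_total({"site_A": 2.0, "site_B": 1.5, "site_C": 3.0}))  # (3.0, 6.5)
```

The timing example shows the trade-off behind this objective: parallel execution shortens response time (3.0 s instead of 6.5 s) without reducing the total work done across sites.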
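The optimization objective (item 6) can be illustrated with a toy cost-based choice among three ways to join two relations stored at different sites. This sketch assumes a cost model that counts only bytes shipped between sites (local CPU and I/O are ignored, reflecting the assumption that transfer dominates). The relation sizes, join selectivity, and site names in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Relation:
    name: str
    site: str
    tuples: int
    tuple_bytes: int

    @property
    def size(self) -> int:
        return self.tuples * self.tuple_bytes


def ship(nbytes: int, src: str, dst: str) -> int:
    """Communication cost in bytes; data that stays at its own site is free."""
    return 0 if src == dst else nbytes


def join_plans(r: Relation, s: Relation, query_site: str,
               selectivity: float, result_tuple_bytes: int) -> Dict[str, int]:
    """Estimate bytes transferred by three candidate plans for R JOIN S."""
    # Estimated result cardinality: selectivity * |R| * |S|.
    result_bytes = round(selectivity * r.tuples * s.tuples) * result_tuple_bytes
    return {
        f"ship {r.name} to {s.site}, join there":
            ship(r.size, r.site, s.site) + ship(result_bytes, s.site, query_site),
        f"ship {s.name} to {r.site}, join there":
            ship(s.size, s.site, r.site) + ship(result_bytes, r.site, query_site),
        f"ship both to {query_site}, join there":
            ship(r.size, r.site, query_site) + ship(s.size, s.site, query_site),
    }


def cheapest(plans: Dict[str, int]) -> str:
    """Pick the plan with the lowest estimated transfer cost."""
    return min(plans, key=plans.get)


if __name__ == "__main__":
    emp = Relation("EMP", "site1", 10_000, 100)  # 1,000,000 bytes
    dept = Relation("DEPT", "site2", 100, 50)    # 5,000 bytes
    plans = join_plans(emp, dept, "site3", selectivity=0.01, result_tuple_bytes=150)
    for plan, cost in plans.items():
        print(f"{cost:>10,}  {plan}")
    print("chosen:", cheapest(plans))  # chosen: ship both to site3, join there
```

With these numbers, shipping the small DEPT relation is cheap, but the estimated join result (1,500,000 bytes) is larger than EMP, so joining at the query site wins. A different query site or selectivity can flip the choice, which is why the optimizer depends on accurate statistics.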
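Location and fragmentation transparency (item 7) are commonly provided through a global catalog that maps relation names to fragments and sites. The sketch below assumes a hypothetical EMP relation horizontally fragmented on `dept`; the `CATALOG`, `LOCAL_DATA`, and `run_query` names are illustrative only. The caller names just the relation, and `run_query` looks up its fragments, prunes those whose predicate cannot match, and reports which sites it contacted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]  # hypothetical EMP(name, dept)


@dataclass(frozen=True)
class Fragment:
    site: str
    dept: str  # fragmentation predicate: this fragment holds rows with this dept


# Hypothetical global catalog: relation name -> where its fragments live.
CATALOG: Dict[str, List[Fragment]] = {
    "EMP": [Fragment("site_A", "sales"), Fragment("site_B", "eng")],
}

# Simulated local storage at each site, keyed by (site, relation).
LOCAL_DATA: Dict[Tuple[str, str], List[Row]] = {
    ("site_A", "EMP"): [("ann", "sales"), ("bob", "sales")],
    ("site_B", "EMP"): [("cho", "eng"), ("dee", "eng")],
}


def run_query(relation: str, dept: Optional[str] = None) -> Tuple[List[Row], List[str]]:
    """Evaluate SELECT * FROM <relation> [WHERE dept = ?] without the caller naming sites.

    Returns the result rows and the sites that were actually contacted.
    """
    rows: List[Row] = []
    contacted: List[str] = []
    for frag in CATALOG[relation]:
        # Fragment pruning: skip fragments whose predicate contradicts the query.
        if dept is not None and frag.dept != dept:
            continue
        contacted.append(frag.site)
        rows.extend(r for r in LOCAL_DATA[(frag.site, relation)]
                    if dept is None or r[1] == dept)
    return rows, contacted


if __name__ == "__main__":
    print(run_query("EMP", dept="eng"))  # ([('cho', 'eng'), ('dee', 'eng')], ['site_B'])
    print(run_query("EMP"))              # all four rows, from both sites
```

Because the catalog hides placement, moving or re-fragmenting EMP would require updating only `CATALOG` and the local storage, not the caller's query.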