Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez
© 2020, M.T. Özsu & P. Valduriez
Outline
◼ Introduction
◼ Distributed and parallel database design
◼ Distributed data control
◼ Distributed Query Processing
◼ Distributed Transaction Processing
◼ Data Replication
◼ Database Integration – Multidatabase Systems
◼ Parallel Database Systems
◼ Peer-to-Peer Data Management
◼ Big Data Processing
◼ NoSQL, NewSQL and Polystores
◼ Web Data Management
Outline
◼ Distributed Query Processing
❑ Query Decomposition and Localization
❑ Join Ordering
❑ Distributed Query Optimization
❑ Adaptive Query Processing
Query Processing in a DDBMS
◼ Generally, a query in a distributed DBMS requires data from multiple sites; this data must be transmitted over the network, which incurs communication cost.
◼ Query processing in a distributed DBMS therefore differs from query processing in a centralized DBMS because of the communication cost of transferring data over the network.
◼ Transmission cost is low when the sites are connected by a high-speed network, but can be quite significant over slower networks.
Query Processing in a DDBMS
◼ In distributed query processing, the data transfer cost consists of:
❑ the cost of transferring intermediate files to other sites for processing, and
❑ the cost of transferring the final result file to the site where the result is required
Distributed DBMS Environment
• Suppose site s1 issues a query that needs data from sites s2 and s3.
• It is decided to execute the query at s3.
• The first communication cost is transferring the data from s2 to s3; s3 then executes the query and produces the result.
• The second communication cost is transferring the result from s3 to s1.
Query Processing in a DDBMS
High-level user query → Query Processor → Low-level data manipulation commands for the distributed DBMS
Query Processing Components
◼ Query language
❑ SQL: “intergalactic dataspeak”
◼ Query execution
❑ The steps that one goes through in executing high-level
(declarative) user queries.
◼ Query optimization
❑ How do we determine the “best” execution plan?
◼ We assume a homogeneous D-DBMS
Selecting Alternatives
Find the names of employees who are managing a project.
Strategy 1 (SQL formulation)
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND RESP = "Manager"
Strategy 1 (equivalent relational algebra)
Π_ENAME(σ_{RESP="Manager" ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Selecting Alternatives
Strategy 2 (SQL formulation)
SELECT ENAME
FROM EMP NATURAL JOIN ASG
WHERE RESP = "Manager"
Strategy 2 (equivalent relational algebra)
Π_ENAME(EMP ⋈_ENO σ_{RESP="Manager"}(ASG))
Strategy 2 avoids the Cartesian product and consumes fewer computing resources, so it may be "better".
Selecting Alternatives
In a distributed system,
◼ Relational algebra is not enough to express execution
strategies. It must be supplemented with operators for
exchanging data between sites.
◼ The distributed query processor must also select the best sites to process data, and possibly the way data should be transformed.
What is the Problem?
The query Π_ENAME(EMP ⋈_ENO σ_{RESP="Manager"}(ASG)) is posed against a fragmented database:
❑ Site 1 stores ASG1 = σ_{ENO ≤ "E3"}(ASG)
❑ Site 2 stores ASG2 = σ_{ENO > "E3"}(ASG)
❑ Site 3 stores EMP1 = σ_{ENO ≤ "E3"}(EMP)
❑ Site 4 stores EMP2 = σ_{ENO > "E3"}(EMP)
❑ Site 5 is the result site
Two execution strategies are compared:
❑ Strategy A: apply the selection at the ASG sites, ship the reduced fragments (ASG′) to the corresponding EMP sites, perform the joins there, and ship the results (EMP′) to Site 5
❑ Strategy B: ship EMP and ASG entirely to Site 5 and execute the whole query there
Cost of Alternatives
Assume:
◼ size(EMP) = 400 tuples, size(ASG) = 1,000 tuples
◼ tuple access cost = 1 unit
◼ tuple transfer cost = 10 units
◼ there are 20 managers in relation ASG
◼ data is uniformly distributed among the sites (so each ASG fragment holds 10 "Manager" tuples)
Cost of Alternatives
◼ Strategy A
❑ produce ASG′: (10 + 10) × tuple access cost = 20
❑ transfer ASG′ to the sites of EMP: (10 + 10) × tuple transfer cost = 200
❑ produce EMP′: (10 + 10) × tuple access cost × 2 = 40
❑ transfer EMP′ to the result site: (10 + 10) × tuple transfer cost = 200
Total cost = 460
◼ Strategy B
❑ transfer EMP to Site 5: 400 × tuple transfer cost = 4,000
❑ transfer ASG to Site 5: 1,000 × tuple transfer cost = 10,000
❑ produce ASG′ (apply the selection): 1,000 × tuple access cost = 1,000
❑ join EMP and ASG′: 400 × 20 × tuple access cost = 8,000
Total cost = 23,000
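The arithmetic behind these totals can be checked with a short script. The following is a minimal sketch (Python, not part of the original slides) that recomputes both totals from the stated assumptions; the figure of 10 matching tuples per fragment follows from the uniform-distribution assumption.

# Recompute the cost of both strategies under the slide's assumptions.
TUPLE_ACCESS = 1        # cost units per tuple access
TUPLE_TRANSFER = 10     # cost units per tuple transferred
EMP_SIZE, ASG_SIZE = 400, 1000
MANAGERS = 20           # ASG tuples with RESP = "Manager" (10 per fragment)

# Strategy A: select at the ASG sites, join at the EMP sites, ship results
strategy_a = (
    MANAGERS * TUPLE_ACCESS         # produce ASG': (10 + 10) accesses
    + MANAGERS * TUPLE_TRANSFER     # ship ASG' to the EMP sites
    + MANAGERS * 2 * TUPLE_ACCESS   # produce EMP': 2 accesses per ASG' tuple
    + MANAGERS * TUPLE_TRANSFER     # ship EMP' to the result site
)

# Strategy B: ship both relations to Site 5 and evaluate everything there
strategy_b = (
    EMP_SIZE * TUPLE_TRANSFER       # ship EMP to Site 5
    + ASG_SIZE * TUPLE_TRANSFER     # ship ASG to Site 5
    + ASG_SIZE * TUPLE_ACCESS       # produce ASG' by scanning ASG
    + EMP_SIZE * MANAGERS * TUPLE_ACCESS   # join EMP with ASG'
)

print(strategy_a, strategy_b)       # 460 23000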
Query Optimization Objectives
◼ Minimize a cost function
❑ I/O cost + CPU cost + communication cost
❑ These components may have different weights in different distributed environments (a weighted form is sketched after this list)
◼ Wide area networks
❑ Communication cost may dominate, and can vary widely depending on:
◼ Bandwidth
◼ Speed
◼ Protocol overhead
◼ Local area networks
❑ Communication cost not that dominant, so total cost function
should be considered
◼ Can also maximize throughput
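The "different weights" bullet above can be made concrete with a weighted-sum cost function of the kind commonly used in distributed query optimization; the coefficient names below are illustrative, not taken from the slides:

Total_cost = C_CPU × #instructions + C_I/O × #disk I/Os + C_MSG × #messages + C_TR × #bytes transferred

Over a wide area network the C_MSG and C_TR terms usually dominate, so the other terms are often ignored; over a local area network all four terms should be considered.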
Complexity of Relational Operations
◼ Assume
❑ relations of cardinality n
❑ sequential scan
◼ Operation → Complexity
❑ Select, Project (without duplicate elimination): O(n)
❑ Project (with duplicate elimination), Group: O(n log n)
❑ Join, Semi-join, Division, Set operators: O(n log n)
❑ Cartesian product: O(n²)
Types Of Optimizers
◼ Exhaustive search
❑ Cost-based
❑ Optimal
❑ Combinatorial complexity in the number of relations
◼ Heuristics
❑ Not optimal
❑ Regroup common sub-expressions
❑ Perform selection, projection first
❑ Replace a join by a series of semijoins (see the example after this list)
❑ Reorder operations to reduce intermediate relation size
❑ Optimize individual operations
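As an illustration of the semijoin heuristic mentioned above (using the EMP and ASG relations from the earlier example; this is the standard semijoin rewriting, not a new strategy):

EMP ⋈_ENO ASG = (EMP ⋉_ENO ASG) ⋈_ENO ASG

The semijoin EMP ⋉_ENO ASG can be computed by shipping only Π_ENO(ASG) to the site of EMP, so that only the EMP tuples that actually have a matching ASG tuple are transferred for the final join. Communication cost usually decreases, at the price of extra local processing.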
Optimization Granularity
◼ Single query at a time
❑ Cannot use common intermediate results
◼ Multiple queries at a time
❑ Efficient if many similar queries
❑ Decision space is much larger
Optimization Timing
◼ Static : optimization is done at query compilation time
❑ Compilation ➔ optimize prior to the execution
❑ Difficult to estimate the sizes of intermediate results ➔ error propagation
❑ Can amortize over many executions
◼ Dynamic: proceeds at query execution time
❑ Run time optimization
❑ Exact information on the intermediate relation sizes
❑ Have to re-optimize for multiple executions
◼ Hybrid: tradeoff between both
❑ Compile using a static algorithm
❑ If the error in estimated sizes exceeds a threshold, re-optimize at run time (see the sketch after this list)
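A minimal sketch of the hybrid check in the last bullet (Python; the function name and the threshold value are illustrative assumptions, not part of the slides):

def should_reoptimize(estimated_size, actual_size, threshold=0.5):
    # Hybrid timing: the statically compiled plan is kept as long as the
    # optimizer's cardinality estimates stay close to the observed sizes;
    # if the relative error for an intermediate result exceeds `threshold`,
    # the remainder of the plan is re-optimized at run time.
    if estimated_size <= 0:
        return actual_size > 0
    return abs(actual_size - estimated_size) / estimated_size > threshold

# Example: the estimate said 100 tuples but the operator produced 400
assert should_reoptimize(100, 400)       # large error: re-optimize at run time
assert not should_reoptimize(100, 120)   # estimate close enough: keep the plan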
Statistics
◼ Relation
❑ Cardinality
❑ Size of a tuple
❑ Fraction of tuples participating in a join with another relation
◼ Attribute
❑ Cardinality of domain
❑ Actual number of distinct values
◼ Simplifying assumptions
❑ Independence between different attribute values
❑ Uniform distribution of attribute values within their domain
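A minimal sketch (Python, illustrative only) of how these statistics and the two simplifying assumptions are typically used to estimate intermediate-result sizes; the sample statistics are assumptions chosen to match the earlier EMP/ASG example:

# Standard cardinality estimates under the uniformity and independence assumptions.
def eq_selectivity(distinct_values):
    # selectivity of "attribute = value", assuming values are uniformly distributed
    return 1.0 / distinct_values

def selection_card(card, selectivity):
    return card * selectivity

def join_card(card_r, card_s, distinct_r, distinct_s):
    # equijoin R.A = S.B; a common estimate divides by the larger distinct count
    return card_r * card_s / max(distinct_r, distinct_s)

# Assumed statistics: card(ASG) = 1000 with 50 distinct RESP values,
# card(EMP) = 400 with ENO a key (400 distinct ENO values in both relations).
managers = selection_card(1000, eq_selectivity(50))   # ≈ 20 tuples
emp_join = join_card(400, 1000, 400, 400)             # ≈ 1000 tuples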
Optimization Decision Sites
◼ Centralized
❑ Single site determines the “best” schedule
❑ Simple
❑ Need knowledge about the entire distributed database
◼ Distributed
❑ Cooperation among sites to determine the schedule
❑ Need only local information
❑ Cost of cooperation
◼ Hybrid
❑ One site determines the global schedule
❑ Each site optimizes the local subqueries
Network Topology
◼ Wide area networks (WAN) – point-to-point
❑ Characteristics
◼ Relatively low bandwidth (compared to local CPU/IO)
◼ High protocol overhead
❑ Communication cost may dominate; ignore all other cost factors
❑ Global schedule to minimize communication cost
❑ Local schedules according to centralized query optimization
◼ Local area networks (LAN)
❑ Communication cost not that dominant
❑ Total cost function should be considered
❑ Broadcasting can be exploited (joins)
❑ Special algorithms exist for star networks
Questions?