04. Distributed Data Management and Processing
23D020: Big Data Management for Data Science
Barcelona School of Economics
Knowledge objectives
1. Give a definition of Distributed System
2. Enumerate the 6 challenges of a Distributed System
3. Give a definition of a Distributed Database
4. Explain the different transparency layers in DDBMS
5. Identify the requirements that distribution imposes on the ANSI/SPARC architecture
6. Draw a classical reference functional architecture for DDBMS
7. Enumerate the 8 main features of Cloud Databases
8. Explain the difficulties of Cloud Database providers to have multiple tenants
9. Enumerate the 4 main problems tenants/users need to tackle in Cloud Databases
10. Distinguish the cost of sequential and random access
11. Explain the difference between the cost of sequential and random access
12. Distinguish vertical and horizontal fragmentation
13. Recognize the complexity and benefits of data allocation
14. Explain the benefits of replication
15. Discuss the alternatives of a distributed catalog
16. Explain the CAP theorem
17. Identify the 3 configuration alternatives given by the CAP theorem
18. Explain the 4 synchronization protocols we can have
19. Explain what eventual consistency means
20. Enumerate the phases of distributed query processing
21. Explain the difference between data shipping and query shipping
22. Explain the meaning of “reconstruction” and “reduction” in syntactic optimization
23. Enumerate the 4 different cost factors in distributed query processing
24. Explain the different kinds of parallelism
25. Identify the impact of fragmentation in intra-operator parallelism
26. Explain the impact of tree topologies (i.e., linear and bushy) in inter-operator parallelism
23D020 3
Distributed System
Distributed DBMS
Cloud DBMS
Distributed Systems
23D020 4
Distributed system
“One in which components located at networked computers communicate
and coordinate their actions only by passing messages.”
G. Coulouris et al.
• Characteristics:
• Concurrency of components
• Independent failures of components
• Lack of a global clock
23D020 5
Challenges of distributed systems
• Openness
• Scalability
• Quality of service
  • Performance/Efficiency
  • Reliability/Availability
  • Confidentiality
• Concurrency
• Transparency
• Heterogeneity of components
23D020 6
Scalability
Cope with large workloads
• Scale up
• Scale out
• Use:
• Automatic load-balancing
• Avoid:
• Bottlenecks
• Unnecessary communication
• Peer-to-peer
23D020 7
Performance/Efficiency
Efficient processing
• Minimize latencies
• Maximize throughput
• Use
• Parallelism
• Network optimization
• Specific techniques
23D020 8
Reliability/Availability
a) Keep consistency
b) Keep the system running
• Even in the case of failures
• Use
• Replication
• Flexible routing
• Heartbeats
• Automatic recovery
23D020 9
Concurrency
Share resources as much as possible
• Use
• Consensus Protocols
• Avoid
• Interferences
• Deadlocks
23D020 10
Transparency
a) Hide implementation (i.e., physical) details from the users
b) Make transparent to the user all the mechanisms used to solve the other challenges
23D020 11
Further objectives
• Use
• Platform-independent software
• Avoid
• Complex configurations
• Specific hardware/software
23D020 12
Distributed System
Distributed DBMS
Cloud DBMS
23D020 13
Distributed database
“A Distributed DataBase (DDB) is an integrated collection of databases that is physically
distributed across sites in a computer network. A Distributed DataBase Management
System (DDBMS) is the software system that manages a distributed database such that
the distribution aspects are transparent to the users.”
Encyclopedia of Database Systems
23D020 14
Transparency layers (I)
Fragmentation Transparency
Replication Transparency
Update Transparency
Network Transparency
Name Transparency
Location Transparency
Data Independence
23D020 15
Transparency layers (II)
• Fragmentation transparency
  • The user must not be aware of the existence of different fragments
• Replication transparency
  • The user must not be aware of the existing replicas
• Network transparency
  • Data access must be independent of where the data is located (location transparency)
  • Each data object must have a unique name (name transparency)
• Data independence
  • Data independence at the logical and physical level must be guaranteed
  • Inherited from centralized DBMSs (ANSI/SPARC)
23D020 16
Classification According to Degree of Autonomy
23D020 17
Extended ANSI-SPARC Architecture of Schemas
[Figure: Query Manager, Execution Manager, Scheduler]
23D020 19
Distributed DBMS Functional Architecture
[Figure: DDBMS functional architecture]
• One coordinator: Global Query Manager (View Manager, Security Manager, Constraint Checker, Query Optimizer), Global Execution Manager and Global Scheduler, supported by the GLOBAL CATALOG (External Schemas, Global Conceptual Schema, Fragment Schema, Allocation Schema).
• Many workers: each runs a Local Query Manager, Recovery Manager (with its Log), Data Manager and Operating system, supported by its LOCAL CATALOG (Local Conceptual Schema, Internal Schema).
23D020 20
Distributed System
Distributed DBMS
Cloud DBMS
Cloud Databases
23D020 21
Parallel database architectures
23D020 22
Key Features of Cloud Databases
• Scalability
a) Ability to horizontally scale (scale out)
• Quality of service
• Performance/Efficiency
b) Fragmentation: Replication & Distribution
c) Indexing: Distributed indexes and RAM
• Reliability/Availability
• Concurrency
d) Weaker concurrency model than ACID
• Transparency
e) Simple call level interface or protocol
• No declarative query language
• Further objectives
f) Flexible schema
• Ability to dynamically add new attributes
g) Quick/Cheap set up
h) Multi-tenancy
23D020 23
Multi-tenancy platform problems (provider side)
• Difficulty: Unpredictable load characteristics
• Variable popularity
• Flash crowds
• Variable resource requirements
• Requirement: Support thousands of tenants
a) Maintain metadata about tenants (e.g., activated features)
b) Self-managing
c) Tolerating failures
d) Scale-out is necessary (sooner or later)
• Rolling upgrades one server at a time
e) Elastic load balancing
• Dynamic partitioning of databases
23D020 24
Data management problems (tenant side)
I. (Distributed) data design
• Data fragmentation
• Data allocation
• Data replication
II. (Distributed) catalog management
• Metadata fragmentation
• Metadata allocation
• Metadata replication
III. (Distributed) transaction management
• Enforcement of ACID properties
• Distributed recovery system
• Distributed concurrency control system
• Replica consistency
• Latency & Availability vs. Update performance
23D020 25
(Distributed) Data Design
Challenge I
23D020 26
DDB Design
• Given a DB and its workload, how should the DB be split and allocated to sites so as to optimize certain objective functions?
  • Minimize resource consumption for query processing
23D020 27
Data Fragmentation
• Usefulness
• An application typically accesses only a subset of data
• Different subsets are (naturally) needed at different sites
• The degree of concurrency is enhanced
• Facilitates parallelism
• Fragments can even be defined dynamically (i.e., at query time, not at design time)
• Difficulties
• Complicates the catalog management
• May lead to poorer performance when multiple fragments need to be joined
• Fragments likely to be used jointly can be colocated to minimize communication overhead
• Costly to enforce the dependency between attributes in different fragments
23D020 28
Fragmentation Correctness
• Completeness
• Every datum in the relation must be assigned to a fragment
• Disjointness
• There is no redundancy and every datum is assigned to only one fragment
• The decision to replicate data is in the allocation phase
• Reconstruction
• The original relation can be reconstructed from the fragments
• Union for horizontal fragmentation
• Join for vertical fragmentation
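As a minimal sketch (assuming horizontal fragments represented as Python sets of primary-key values; the data is invented for illustration), the three correctness criteria can be checked mechanically:

```python
# Hypothetical illustration: horizontal fragments of a relation,
# represented as sets of primary-key values.
relation = {1, 2, 3, 4, 5, 6}
fragments = [{1, 2, 3}, {4, 5}, {6}]

def is_complete(relation, fragments):
    # Completeness: every datum belongs to some fragment.
    return set().union(*fragments) >= relation

def is_disjoint(fragments):
    # Disjointness: no datum is assigned to more than one fragment.
    total = sum(len(f) for f in fragments)
    return total == len(set().union(*fragments))

def reconstructs(relation, fragments):
    # Reconstruction: the union of the fragments yields the original relation
    # (for vertical fragmentation this would be a join on the key instead).
    return set().union(*fragments) == relation

assert is_complete(relation, fragments)
assert is_disjoint(fragments)
assert reconstructs(relation, fragments)
```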
23D020 29
Finding the best fragmentation strategy
• Consider it per table
• Finding the optimal fragmentation is NP-hard
• Needed information
• Workload
• Frequency of each query
• Access plan and cost of each query
• Take intermediate results and repetitive access into account
• Value distribution and selectivity of predicates
• Work in three phases
1. Determine primary partitions (i.e., attribute subsets often accessed together)
2. Generate a disjoint and covering combination of primary partitions
3. Evaluate the cost of all combinations generated in the previous phase
23D020 30
Data Allocation
• Given a set of fragments and a set of sites on which a number of applications are running, allocate each fragment such that some optimization criterion is met (subject to certain constraints)
• It is known to be an NP-hard problem
• The optimal solution depends on many factors
• Location in which the query originates
• The query processing strategies (e.g., join methods)
• Furthermore, in a dynamic environment the workload and access patterns may change
• The problem is typically simplified with certain assumptions
• E.g., only communication cost considered
• Typical approaches build cost models and any optimization algorithm can be
adapted to solve it
• Sub-optimal solutions
• Heuristics are also available
• E.g., best-fit for non-replicated fragments
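As an illustrative sketch of the best-fit heuristic mentioned above (the fragment/site access-frequency matrix is invented), each non-replicated fragment is simply placed on the site that accesses it most frequently:

```python
# Hypothetical access frequencies: accesses[fragment][site] = how often
# applications at `site` access `fragment` (numbers invented for illustration).
accesses = {
    "F1": {"site1": 40, "site2": 5,  "site3": 10},
    "F2": {"site1": 2,  "site2": 30, "site3": 25},
    "F3": {"site1": 8,  "site2": 8,  "site3": 50},
}

def best_fit_allocation(accesses):
    # Best-fit heuristic: allocate each (non-replicated) fragment to the
    # single site with the highest access frequency for it.
    return {frag: max(freqs, key=freqs.get) for frag, freqs in accesses.items()}

print(best_fit_allocation(accesses))
# {'F1': 'site1', 'F2': 'site2', 'F3': 'site3'}
```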
23D020 31
Data Replication
• Generalization of Allocation (for more than one location)
• Provides execution alternatives
• Improves availability
• Generates consistency problems
• Especially useful for read-only workloads
• No synchronization required
23D020 32
(Distributed) Catalog
Management
Challenge II
23D020 33
DDBMS Catalog Characteristics
• Fragmentation
  • Global metadata
    • External schemas
    • Global conceptual schema
    • Fragment schema
    • Allocation schema
  • Local metadata
    • Local conceptual schema
    • Physical schema
• Allocation
  • Global metadata in the coordinator node
  • Local metadata in the workers
• Replication
  a) Single-copy (Coordinator node)
    • Single point of failure
    • Poor performance (potential bottleneck)
  b) Multi-copy (Mirroring, Secondary node)
    • Requires synchronization
23D020 34
(Distributed) Transaction
Management
Challenge III
23D020 35
CAP theorem
“We can only achieve two of Consistency, system Availability, and
tolerance to network Partition.”
Eric Brewer
[Figure: Write(X) reaches only one of two replicas of X; alternative b) answers “Ok”, leaving the replicas inconsistent]
23D020 36
Configuration alternatives
a) Strong consistency (give away availability)
  • Replicas are synchronously modified and guarantee consistent query answering
  • The whole system is declared unavailable in case of a network partition
b) Eventually consistent (give away consistency)
  • Changes are asynchronously propagated to replicas, so the answer to the same query may depend on the replica being used
  • In case of a network partition, changes are simply delayed
c) Non-distributed data (give away partition tolerance)
  • Connectivity cannot be lost
  • We can have strong consistency without affecting availability
23D020 37
Managing replicas
• Replicating fragments improves query latency and availability
• Requires dealing with consistency and update (a.k.a., synchronization) performance
• Replication protocol characteristics
  • Primary copy vs. distributed versioning
  • Eager vs. lazy replication
23D020 38
Eventual consistency
23D020 39
Replication management configuration
• Definitions
  • N: #replicas
  • W: #replicas that have to be written before commit
  • R: #replicas read that need to coincide before giving a response
• Named situations
  • Inconsistency window: W < N
  • Strong consistency: R + W > N
  • Eventual consistency: R + W <= N
    • The sets of machines read (R) and written (W) may not overlap
  • Potential conflict: W < (N+1)/2
    • The sets of writing machines (W) may not overlap
• Typical configurations (see the sketch below)
  • Fault-tolerant system: N=3; W=2; R=2
  • Massive replication for read scaling: R=1
  • Read One-Write All (ROWA): R=1; W=N (R+W = 1+N > N, i.e., strong consistency)
    • Fast reads
    • Slow writes (low probability of succeeding)
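A minimal sketch (assuming the N/W/R definitions above; the classification simply applies the named situations) of how a replica configuration could be checked:

```python
def classify_quorum(n, w, r):
    """Classify a replica configuration given N replicas,
    a write quorum W and a read quorum R (as defined above)."""
    properties = []
    if w < n:
        properties.append("inconsistency window (some replicas lag behind)")
    if r + w > n:
        properties.append("strong consistency (read and write quorums overlap)")
    else:
        properties.append("eventual consistency (quorums may not overlap)")
    if w < (n + 1) / 2:
        properties.append("potential write-write conflicts")
    return properties

# Typical configurations from the slide:
print(classify_quorum(3, 2, 2))   # fault-tolerant system
print(classify_quorum(5, 5, 1))   # ROWA: R=1, W=N -> strong consistency
print(classify_quorum(3, 1, 1))   # massive replication for read scaling
```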
23D020 40
Visual Guide to NOSQL Systems
[Figure: CAP triangle — “Pick Two”]
• Data models: Relational, Key-Value, Column-Oriented/Tabular, Document-Oriented
• Availability (A): each client can always read and write
• CA systems: RDBMSs (MySQL, Postgres, …), Aster Data, Greenplum, Vertica
• AP systems: Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB, CouchDB, Riak
23D020 42
Challenges in distributed query processing
• Communication cost (data shipping)
  • Not that critical for LAN networks
  • Assuming a high enough I/O cost
• Fragmentation / Replication
  • Metadata and statistics about fragments (and replicas) in the global catalog
• Join optimization
  • Join order
  • Semi-join strategy
• Recall: a centralized optimizer minimizes the number of accesses to disk
23D020 43
The main scenarios in data processing
• Data shipping
  • The data is retrieved from the site where it is stored to the site executing the query
  • Avoids bottlenecks on frequently used data
  • Too much data may be moved – bandwidth intensive!
• Query shipping
  • The evaluation of the query is delegated to the site where the data is stored
  • Avoids transferring large amounts of data
  • Overloads the machines containing the data!
• Hybrid strategy
  • Dynamically decide between data and query shipping (see the sketch below)
[Figure: Data Shipping vs. Query Shipping between sites S1/S2 and data fragments D1/D2]
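A hedged sketch of how the hybrid strategy could decide between the two scenarios (the size estimates and the decision rule are illustrative assumptions, not the course's algorithm):

```python
def choose_shipping(estimated_result_bytes, base_data_bytes):
    """Toy decision rule for the hybrid strategy: ship whichever is
    expected to be smaller, the raw data or the query result."""
    if base_data_bytes <= estimated_result_bytes:
        return "data shipping"    # cheaper to move the data to the query site
    return "query shipping"       # cheaper to evaluate remotely and ship results

# Selective query over a large fragment -> query shipping
print(choose_shipping(estimated_result_bytes=10_000, base_data_bytes=5_000_000))
# Query touching a tiny, frequently used fragment -> data shipping may pay off
print(choose_shipping(estimated_result_bytes=400_000, base_data_bytes=50_000))
```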
23D020 44
Phases of distributed query processing
23D020 45
Syntactic optimizer
• Ordering operators
  • Left or right deep (i.e., linear) trees
  • Bushy trees (see the sketch below)
• Added difficulties
  • Consider multi-way joins
  • Consider the size of the datasets
    • Especially the size of the intermediate join results
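As an illustration of the two topologies (a sketch with hypothetical relation names R, S, T, U, where a nested pair stands for a join of its two children):

```python
# Left-deep (linear) tree: (((R ⋈ S) ⋈ T) ⋈ U)
left_deep = ((("R", "S"), "T"), "U")

# Bushy tree: ((R ⋈ S) ⋈ (T ⋈ U)) — the two inner joins are independent
# and can be executed in parallel on different sites.
bushy = (("R", "S"), ("T", "U"))

def depth(tree):
    """Height of a join tree; bushy trees are shallower, which shortens
    the longest chain of dependent operators."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + max(depth(left), depth(right))

print(depth(left_deep))  # 3
print(depth(bushy))      # 2
```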
23D020 46
Physical optimizer
• Transforms an internal query representation into an efficient plan
• Replaces the logical query operators by
1) Specific algorithms (plan operators)
2) Access methods
• Decides in which order to execute them
• Parallelism (!)
• Selects where to execute them (exploit Data Location)
• More difficult for joins (multi-way joins)
• This is done by…
• Enumerating alternative but equivalent plans
• Estimating their costs
• Searching for the best solution
• Using available statistics regarding the physical state of the system
23D020 47
Criteria to choose the access plan
• Usually with the goal of optimizing response time
• Time needed to execute a query (i.e., latency or response time)
• Benefits from parallelism
• Cost Model
• Sum of local cost and communication cost
• Local cost
• Cost of CPU processing (#cycles)
• Unit cost of an I/O operation (#I/O ops)
• Communication cost (commonly assumed to be linear in the number of bytes transmitted)
• Cost of initiating and sending a message (#messages)
• Cost of transmitting one byte (#bytes)
• Knowledge required
• Size of elementary data units processed
• Selectivity of operations to estimate intermediate results
23D020 48
Cost model example
• Parameters:
  • Local processing:
    • Average CPU time to process an instance (T_CPU)
    • Number of instances processed (#inst)
    • I/O time per operation (T_I/O)
    • Number of I/O operations (#I/Os)
  • Global processing:
    • Message time (T_Msg)
    • Number of messages issued (#msgs)
    • Transfer time to send a byte from one site to another (T_TR)
    • Number of bytes transferred (#bytes)
• Note: the statistics are not difficult to collect; the problem is that, to estimate the response time, we need to know a priori what is going to be executed in parallel and what is going to be executed sequentially!
• Calculations (see the sketch below):
  Resources Used = W_CPU * T_CPU * #inst + W_I/O * T_I/O * #I/Os + W_Msg * T_Msg * #msgs + W_TR * T_TR * #bytes
  Response Time = T_CPU * seq#inst + T_I/O * seq#I/Os + T_Msg * seq#msgs + T_TR * seq#bytes
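A minimal sketch of these two formulas in code (the weights W_* and the toy workload numbers are hypothetical, and the sequential counts must be supplied by whoever already knows the parallel/sequential split):

```python
def resources_used(t_cpu, n_inst, t_io, n_ios, t_msg, n_msgs, t_tr, n_bytes,
                   w_cpu=1.0, w_io=1.0, w_msg=1.0, w_tr=1.0):
    # Total work done anywhere in the system, weighted per resource.
    return (w_cpu * t_cpu * n_inst + w_io * t_io * n_ios
            + w_msg * t_msg * n_msgs + w_tr * t_tr * n_bytes)

def response_time(t_cpu, seq_inst, t_io, seq_ios, t_msg, seq_msgs, t_tr, seq_bytes):
    # Latency seen by the user: only the *sequential* part of the work counts,
    # which is why the parallel/sequential split must be known a priori.
    return (t_cpu * seq_inst + t_io * seq_ios
            + t_msg * seq_msgs + t_tr * seq_bytes)

# Toy numbers: 1M instances and 10k I/Os in total, but only half of each
# on the critical (sequential) path because two sites work in parallel.
print(resources_used(1e-6, 1_000_000, 1e-3, 10_000, 1e-3, 200, 1e-7, 5_000_000))
print(response_time(1e-6, 500_000, 1e-3, 5_000, 1e-3, 200, 1e-7, 5_000_000))
```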
23D020 49
The problem of parallelism
[Figure: “Theory” vs. “Practice” cartoon by Samuel Yee]
23D020 50
Bulk Synchronous Parallel Model
[Figure: ideal vs. real execution in the Bulk Synchronous Parallel model — in the real case, computing time is wasted waiting at the synchronization barrier (source: SAILING lab slides)]
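A minimal sketch of a single BSP superstep (local computation followed by a barrier); the uneven per-worker durations are invented to mimic the "real" case, where faster workers sit idle at the barrier:

```python
import threading
import time

durations = [0.1, 0.3, 0.2]          # hypothetical uneven local computation times
results = [None] * len(durations)

def local_compute(i, d):
    time.sleep(d)                     # local computation phase of the superstep
    results[i] = d

threads = [threading.Thread(target=local_compute, args=(i, d))
           for i, d in enumerate(durations)]
for t in threads:
    t.start()
for t in threads:
    t.join()                          # barrier: everyone waits for the slowest worker

# Wasted time per worker = slowest worker's time minus its own time.
print([max(durations) - d for d in durations])
```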
23D020 51
Kinds of parallelism
• Inter-query: different queries executed in parallel
• Intra-query
• Intra-operator (i.e., within one operator)
• Unary (e.g., selection)
• Static partitioning
• Binary (e.g., join)
• Static or dynamic partitioning
• Inter-operator
• Independent
• Pipelined
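A minimal sketch of intra-operator parallelism for a unary operator (a selection) over statically partitioned fragments, using Python's standard library; the fragment contents and the predicate are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# Statically partitioned (horizontally fragmented) relation: one list per site.
fragments = [
    [{"id": 1, "price": 10}, {"id": 2, "price": 80}],
    [{"id": 3, "price": 55}, {"id": 4, "price": 5}],
    [{"id": 5, "price": 90}],
]

def select_fragment(fragment, predicate):
    # The same selection runs independently on each fragment (intra-operator).
    return [row for row in fragment if predicate(row)]

with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
    predicate = lambda row: row["price"] > 50
    partial_results = pool.map(select_fragment, fragments,
                               [predicate] * len(fragments))
    # Reconstruction of the global result is the union of the partial results.
    result = [row for part in partial_results for row in part]

print(result)   # rows with price > 50 from every fragment
```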
23D020 52
Closing
23D020 55
Summary
• Distributed Systems
• Distributed Database Systems
• Distributed Database Systems Architectures
• Distributed Transactions
• CAP Theorem
• Strong and Eventual Consistency
23D020 56
References
• D. DeWitt & J. Gray. Parallel Database Systems: The future of High Performance
Database Processing. Communications of the ACM, June 1992
• N. J. Gunther. A Simple Capacity Model of Massively Parallel Transaction
Systems. CMG National Conference, 1993
• L. Liu, M.T. Özsu (Eds.). Encyclopedia of Database Systems. Springer, 2009
• M. T. Özsu & P. Valduriez. Principles of Distributed Database Systems, 3rd Ed.
Springer, 2011
• G. Coulouris et al. Distributed Systems: Concepts and Design, 5th Ed. Addison-Wesley, 2012
• G. Graefe. Query Evaluation Techniques. In ACM Computing Surveys, 25(2),
1993
• L. G. Valiant. A bridging model for parallel computation. Commun. ACM.
August 1990
23D020 57