04. Distributed Data Management and Processing
23D020: Big Data Management for Data Science
Barcelona School of Economics
Knowledge objectives
1. Give a definition of Distributed System
2. Enumerate the 6 challenges of a Distributed System
3. Give a definition of a Distributed Database
4. Explain the different transparency layers in DDBMS
5. Identify the requirements that distribution imposes on the ANSI/SPARC architecture
6. Draw a classical reference functional architecture for DDBMS
7. Enumerate the 8 main features of Cloud Databases
8. Explain the difficulties of Cloud Database providers to have multiple tenants
9. Enumerate the 4 main problems tenants/users need to tackle in Cloud Databases
10. Distinguish the cost of sequential and random access
11. Explain the difference between the cost of sequential and random access
12. Distinguish vertical and horizontal fragmentation
13. Recognize the complexity and benefits of data allocation
14. Explain the benefits of replication
15. Discuss the alternatives of a distributed catalog
16. Explain the CAP theorem
17. Identify the 3 configuration alternatives given by the CAP theorem
18. Explain the 4 synchronization protocols we can have
19. Explain what eventual consistency means
20. Enumerate the phases of distributed query processing
21. Explain the difference between data shipping and query shipping
22. Explain the meaning of “reconstruction” and “reduction” in syntactic optimization
23. Enumerate the 4 different cost factors in distributed query processing
24. Explain the different kinds of parallelism
25. Identify the impact of fragmentation in intra-operator parallelism
26. Explain the impact of tree topologies (i.e., linear and bushy) in inter-operator parallelism
23D020 3
Distributed System
Distributed DBMS
Cloud DBMS
Distributed Systems
23D020 4
Distributed system
“One in which components located at networked computers communicate
and coordinate their actions only by passing messages.”
G. Coulouris et al.
• Characteristics:
• Concurrency of components
• Independent failures of components
• Lack of a global clock
23D020 5
Challenges of distributed systems
• Openness
• Scalability
• Quality of service
  • Performance/Efficiency
  • Reliability/Availability
  • Confidentiality
• Concurrency
• Transparency
• Heterogeneity of components
23D020 6
Scalability
Cope with large workloads
• Scale up
• Scale out
• Use:
• Automatic load-balancing
• Avoid:
• Bottlenecks
• Unnecessary communication
• Peer-to-peer
23D020 7
Performance/Efficiency
Efficient processing
• Minimize latencies
• Maximize throughput
• Use
• Parallelism
• Network optimization
• Specific techniques
23D020 8
Reliability/Availability
a) Keep consistency
b) Keep the system running
• Even in the case of failures
• Use
• Replication
• Flexible routing
• Heartbeats
• Automatic recovery
23D020 9
Concurrency
Share resources as much as possible
• Use
• Consensus Protocols
• Avoid
• Interferences
• Deadlocks
23D020 10
Transparency
a) Hide implementation (i.e., physical) details from the users
b) Make transparent to the user all the mechanisms used to solve the other challenges
23D020 11
Further objectives
• Use
• Platform-independent software
• Avoid
• Complex configurations
• Specific hardware/software
23D020 12
Distributed System
Distributed DBMS
Cloud DBMS
23D020 13
Distributed database
“A Distributed DataBase (DDB) is an integrated collection of databases that is physically
distributed across sites in a computer network. A Distributed DataBase Management
System (DDBMS) is the software system that manages a distributed database such that
the distribution aspects are transparent to the users.”
Encyclopedia of Database Systems
23D020 14
Transparency layers (I)
Fragmentation Transparency
Replication Transparency
Update Transparency
Network Transparency
Name Transparency
Location Transparency
Data Independence
23D020 15
Transparency layers (II)
• Fragmentation transparency
  • The user must not be aware of the existence of different fragments
• Replication transparency
  • The user must not be aware of the existing replicas
• Network transparency
  • Data access must be independent of where the data is located (location transparency)
  • Each data object must have a unique name (name transparency)
• Data independence
  • Data independence at the logical and physical level must be guaranteed
  • Inherited from centralized DBMSs (ANSI/SPARC)
23D020 16
Classification According to Degree of Autonomy
23D020 17
Extended ANSI-SPARC Architecture of Schemas
[Figure: Query Manager, Execution Manager, Scheduler]
23D020 19
Distributed DBMS Functional Architecture
[Figure: DDBMS functional architecture]
• One coordinator: Global Query Manager (View Manager, Security Manager, Constraint Checker, Query Optimizer), Global Execution Manager and Global Scheduler, supported by the GLOBAL CATALOG (External Schemas, Global Conceptual Schema, Fragment Schema, Allocation Schema).
• Many workers: each runs a Local Query Manager, Recovery Manager (with its Log), Data Manager and Operating system, supported by its LOCAL CATALOG (Local Conceptual Schema, Internal Schema).
23D020 20
Distributed System
Distributed DBMS
Cloud DBMS
Cloud Databases
23D020 21
Parallel database architectures
23D020 22
Key Features of Cloud Databases
• Scalability
a) Ability to horizontally scale (scale out)
• Quality of service
• Performance/Efficiency
b) Fragmentation: Replication & Distribution
c) Indexing: Distributed indexes and RAM
• Reliability/Availability
• Concurrency
d) Weaker concurrency model than ACID
• Transparency
e) Simple call level interface or protocol
• No declarative query language
• Further objectives
f) Flexible schema
• Ability to dynamically add new attributes
g) Quick/Cheap set up
h) Multi-tenancy
23D020 23
Multi-tenancy platform problems (provider side)
• Difficulty: Unpredictable load characteristics
• Variable popularity
• Flash crowds
• Variable resource requirements
• Requirement: Support thousands of tenants
a) Maintain metadata about tenants (e.g., activated features)
b) Self-managing
c) Tolerating failures
d) Scale-out is necessary (sooner or later)
• Rolling upgrades one server at a time
e) Elastic load balancing
• Dynamic partitioning of databases
23D020 24
Data management problems (tenant side)
I. (Distributed) data design
• Data fragmentation
• Data allocation
• Data replication
II. (Distributed) catalog management
• Metadata fragmentation
• Metadata allocation
• Metadata replication
III. (Distributed) transaction management
• Enforcement of ACID properties
• Distributed recovery system
• Distributed concurrency control system
• Replica consistency
• Latency & Availability vs. Update performance
23D020 25
(Distributed) Data Design
Challenge I
23D020 26
DDB Design
• Given a DB and its workload, how should the DB be split and allocated to sites so as to optimize certain objective functions?
  • Minimize resource consumption for query processing
23D020 27
Data Fragmentation
• Usefulness
• An application typically accesses only a subset of data
• Different subsets are (naturally) needed at different sites
• The degree of concurrency is enhanced
• Facilitates parallelism
• Fragments can even be defined dynamically (i.e., at query time, not at design time)
• Difficulties
• Complicates the catalog management
• May lead to poorer performance when multiple fragments need to be joined
• Fragments likely to be used jointly can be colocated to minimize communication overhead
• Costly to enforce the dependency between attributes in different fragments
23D020 28
Fragmentation Correctness
• Completeness
• Every datum in the relation must be assigned to a fragment
• Disjointness
• There is no redundancy and every datum is assigned to only one fragment
• The decision to replicate data is in the allocation phase
• Reconstruction
• The original relation can be reconstructed from the fragments
• Union for horizontal fragmentation
• Join for vertical fragmentation
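As a minimal sketch (assuming horizontal fragments represented as Python sets of primary-key values; the data is invented for illustration), the three correctness criteria can be checked mechanically:

```python
# Hypothetical illustration: horizontal fragments of a relation,
# represented as sets of primary-key values.
relation = {1, 2, 3, 4, 5, 6}
fragments = [{1, 2, 3}, {4, 5}, {6}]

def is_complete(relation, fragments):
    # Completeness: every datum belongs to some fragment.
    return set().union(*fragments) >= relation

def is_disjoint(fragments):
    # Disjointness: no datum is assigned to more than one fragment.
    total = sum(len(f) for f in fragments)
    return total == len(set().union(*fragments))

def reconstructs(relation, fragments):
    # Reconstruction: the union of the fragments yields the original relation
    # (for vertical fragmentation this would be a join on the key instead).
    return set().union(*fragments) == relation

assert is_complete(relation, fragments)
assert is_disjoint(fragments)
assert reconstructs(relation, fragments)
```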
23D020 29
Finding the best fragmentation strategy
• Consider it per table
• Finding the optimal fragmentation is NP-hard
• Needed information
• Workload
• Frequency of each query
• Access plan and cost of each query
• Take intermediate results and repetitive access into account
• Value distribution and selectivity of predicates
• Work in three phases
1. Determine primary partitions (i.e., attribute subsets often accessed together)
2. Generate a disjoint and covering combination of primary partitions
3. Evaluate the cost of all combinations generated in the previous phase
23D020 30
Data Allocation
• Given a set of fragments and a set of sites on which a number of applications are running, allocate each fragment such that some optimization criterion is met (subject to certain constraints)
• It is known to be an NP-hard problem
• The optimal solution depends on many factors
• Location in which the query originates
• The query processing strategies (e.g., join methods)
• Furthermore, in a dynamic environment the workload and access patterns may change
• The problem is typically simplified with certain assumptions
• E.g., only communication cost considered
• Typical approaches build cost models and any optimization algorithm can be
adapted to solve it
• Sub-optimal solutions
• Heuristics are also available
• E.g., best-fit for non-replicated fragments
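As an illustrative sketch of the best-fit heuristic mentioned above (the fragment/site access-frequency matrix is invented), each non-replicated fragment is simply placed on the site that accesses it most frequently:

```python
# Hypothetical access frequencies: accesses[fragment][site] = how often
# applications at `site` access `fragment` (numbers invented for illustration).
accesses = {
    "F1": {"site1": 40, "site2": 5,  "site3": 10},
    "F2": {"site1": 2,  "site2": 30, "site3": 25},
    "F3": {"site1": 8,  "site2": 8,  "site3": 50},
}

def best_fit_allocation(accesses):
    # Best-fit heuristic: allocate each (non-replicated) fragment to the
    # single site with the highest access frequency for it.
    return {frag: max(freqs, key=freqs.get) for frag, freqs in accesses.items()}

print(best_fit_allocation(accesses))
# {'F1': 'site1', 'F2': 'site2', 'F3': 'site3'}
```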
23D020 31
Data Replication
• Generalization of Allocation (for more than one location)
• Provides execution alternatives
• Improves availability
• Generates consistency problems
• Especially useful for read-only workloads
• No synchronization required
23D020 32
(Distributed) Catalog
Management
Challenge II
23D020 33
DDBMS Catalog Characteristics
• Fragmentation
  • Global metadata
    • External schemas
    • Global conceptual schema
    • Fragment schema
    • Allocation schema
  • Local metadata
    • Local conceptual schema
    • Physical schema
• Allocation
  • Global metadata in the coordinator node
  • Local metadata in the workers
• Replication
  a) Single-copy (Coordinator node)
    • Single point of failure
    • Poor performance (potential bottleneck)
  b) Multi-copy (Mirroring, Secondary node)
    • Requires synchronization
23D020 34
(Distributed) Transaction
Management
Challenge III
23D020 35
CAP theorem
“We can only achieve two of Consistency, system Availability, and
tolerance to network Partition.”
Eric Brewer
[Figure: Write(X) reaches only one of two replicas of X; alternative b) answers “Ok”, leaving the replicas inconsistent]
23D020 36
Configuration alternatives
a) Strong consistency (give away availability)
  • Replicas are synchronously modified and guarantee consistent query answering
  • The whole system is declared unavailable in case of a network partition
b) Eventually consistent (give away consistency)
  • Changes are asynchronously propagated to replicas, so the answer to the same query may depend on the replica being used
  • In case of a network partition, changes are simply delayed
c) Non-distributed data (give away partition tolerance)
  • Connectivity cannot be lost
  • We can have strong consistency without affecting availability
23D020 37
Managing replicas
• Replicating fragments improves query latency and availability
• Requires dealing with consistency and update (a.k.a., synchronization) performance
• Replication protocol characteristics
  • Primary copy vs. distributed versioning
  • Eager vs. lazy replication
23D020 38
Eventual consistency
23D020 39
Replication management configuration
• Definitions
  • N: #replicas
  • W: #replicas that have to be written before commit
  • R: #replicas read that need to coincide before giving a response
• Named situations
  • Inconsistency window: W < N
  • Strong consistency: R + W > N
  • Eventual consistency: R + W <= N
    • The sets of machines read (R) and written (W) may not overlap
  • Potential conflict: W < (N+1)/2
    • The sets of writing machines (W) may not overlap
• Typical configurations (see the sketch below)
  • Fault-tolerant system: N=3; W=2; R=2
  • Massive replication for read scaling: R=1
  • Read One-Write All (ROWA): R=1; W=N (R+W = 1+N > N, i.e., strong consistency)
    • Fast reads
    • Slow writes (low probability of succeeding)
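A minimal sketch (assuming the N/W/R definitions above; the classification simply applies the named situations) of how a replica configuration could be checked:

```python
def classify_quorum(n, w, r):
    """Classify a replica configuration given N replicas,
    a write quorum W and a read quorum R (as defined above)."""
    properties = []
    if w < n:
        properties.append("inconsistency window (some replicas lag behind)")
    if r + w > n:
        properties.append("strong consistency (read and write quorums overlap)")
    else:
        properties.append("eventual consistency (quorums may not overlap)")
    if w < (n + 1) / 2:
        properties.append("potential write-write conflicts")
    return properties

# Typical configurations from the slide:
print(classify_quorum(3, 2, 2))   # fault-tolerant system
print(classify_quorum(5, 5, 1))   # ROWA: R=1, W=N -> strong consistency
print(classify_quorum(3, 1, 1))   # massive replication for read scaling
```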
23D020 40
Visual Guide to NOSQL Systems
[Figure: CAP triangle — “Pick Two”]
• Data models: Relational, Key-Value, Column-Oriented/Tabular, Document-Oriented
• Availability (A): each client can always read and write
• CA systems: RDBMSs (MySQL, Postgres, …), Aster Data, Greenplum, Vertica
• AP systems: Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB, CouchDB, Riak
23D020 42
Challenges in distributed query processing
• Communication cost (data shipping)
  • Not that critical for LAN networks
  • Assuming a high enough I/O cost
• Fragmentation / Replication
  • Metadata and statistics about fragments (and replicas) in the global catalog
• Join optimization
  • Join order
  • Semi-join strategy
• Recall: a centralized optimizer minimizes the number of accesses to disk
23D020 43
The main scenarios in data processing
• Data shipping
  • The data is retrieved from the site where it is stored to the site executing the query
  • Avoids bottlenecks on frequently used data
  • Too much data may be moved – bandwidth intensive!
• Query shipping
  • The evaluation of the query is delegated to the site where the data is stored
  • Avoids transferring large amounts of data
  • Overloads the machines containing the data!
• Hybrid strategy
  • Dynamically decide between data and query shipping (see the sketch below)
[Figure: Data Shipping vs. Query Shipping between sites S1/S2 and data fragments D1/D2]
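A hedged sketch of how the hybrid strategy could decide between the two scenarios (the size estimates and the decision rule are illustrative assumptions, not the course's algorithm):

```python
def choose_shipping(estimated_result_bytes, base_data_bytes):
    """Toy decision rule for the hybrid strategy: ship whichever is
    expected to be smaller, the raw data or the query result."""
    if base_data_bytes <= estimated_result_bytes:
        return "data shipping"    # cheaper to move the data to the query site
    return "query shipping"       # cheaper to evaluate remotely and ship results

# Selective query over a large fragment -> query shipping
print(choose_shipping(estimated_result_bytes=10_000, base_data_bytes=5_000_000))
# Query touching a tiny, frequently used fragment -> data shipping may pay off
print(choose_shipping(estimated_result_bytes=400_000, base_data_bytes=50_000))
```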
23D020 44
Phases of distributed query processing
23D020 45
Syntactic optimizer
• Ordering operators
  • Left or right deep (i.e., linear) trees
  • Bushy trees (see the sketch below)
• Added difficulties
  • Consider multi-way joins
  • Consider the size of the datasets
    • Especially the size of the intermediate join results
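As an illustration of the two topologies (a sketch with hypothetical relation names R, S, T, U, where a nested pair stands for a join of its two children):

```python
# Left-deep (linear) tree: (((R ⋈ S) ⋈ T) ⋈ U)
left_deep = ((("R", "S"), "T"), "U")

# Bushy tree: ((R ⋈ S) ⋈ (T ⋈ U)) — the two inner joins are independent
# and can be executed in parallel on different sites.
bushy = (("R", "S"), ("T", "U"))

def depth(tree):
    """Height of a join tree; bushy trees are shallower, which shortens
    the longest chain of dependent operators."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + max(depth(left), depth(right))

print(depth(left_deep))  # 3
print(depth(bushy))      # 2
```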
23D020 46
Physical optimizer
• Transforms an internal query representation into an efficient plan
• Replaces the logical query operators by
1) Specific algorithms (plan operators)
2) Access methods
• Decides in which order to execute them
• Parallelism (!)
• Selects where to execute them (exploit Data Location)
• More difficult for joins (multi-way joins)
• This is done by…
• Enumerating alternative but equivalent plans
• Estimating their costs
• Searching for the best solution
• Using available statistics regarding the physical state of the system
23D020 47
Criteria to choose the access plan
• Usually with the goal of optimizing response time
• Time needed to execute a query (i.e., latency or response time)
• Benefits from parallelism
• Cost Model
• Sum of local cost and communication cost
• Local cost
• Cost of CPU processing (#cycles)
• Unit cost of an I/O operation (#I/O ops)
• Communication cost (commonly assumed to be linear in the number of bytes transmitted)
• Cost of initiating and sending a message (#messages)
• Cost of transmitting one byte (#bytes)
• Knowledge required
• Size of elementary data units processed
• Selectivity of operations to estimate intermediate results
23D020 48
Cost model example
• Parameters:
  • Local processing:
    • Average CPU time to process an instance (T_CPU)
    • Number of instances processed (#inst)
    • I/O time per operation (T_I/O)
    • Number of I/O operations (#I/Os)
  • Global processing:
    • Message time (T_Msg)
    • Number of messages issued (#msgs)
    • Transfer time to send a byte from one site to another (T_TR)
    • Number of bytes transferred (#bytes)
• Note: the statistics are not difficult to collect; the problem is that, to estimate the response time, we need to know a priori what is going to be executed in parallel and what is going to be executed sequentially!
• Calculations (see the sketch below):
  Resources Used = W_CPU * T_CPU * #inst + W_I/O * T_I/O * #I/Os + W_Msg * T_Msg * #msgs + W_TR * T_TR * #bytes
  Response Time = T_CPU * seq#inst + T_I/O * seq#I/Os + T_Msg * seq#msgs + T_TR * seq#bytes
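A minimal sketch of these two formulas in code (the weights W_* and the toy workload numbers are hypothetical, and the sequential counts must be supplied by whoever already knows the parallel/sequential split):

```python
def resources_used(t_cpu, n_inst, t_io, n_ios, t_msg, n_msgs, t_tr, n_bytes,
                   w_cpu=1.0, w_io=1.0, w_msg=1.0, w_tr=1.0):
    # Total work done anywhere in the system, weighted per resource.
    return (w_cpu * t_cpu * n_inst + w_io * t_io * n_ios
            + w_msg * t_msg * n_msgs + w_tr * t_tr * n_bytes)

def response_time(t_cpu, seq_inst, t_io, seq_ios, t_msg, seq_msgs, t_tr, seq_bytes):
    # Latency seen by the user: only the *sequential* part of the work counts,
    # which is why the parallel/sequential split must be known a priori.
    return (t_cpu * seq_inst + t_io * seq_ios
            + t_msg * seq_msgs + t_tr * seq_bytes)

# Toy numbers: 1M instances and 10k I/Os in total, but only half of each
# on the critical (sequential) path because two sites work in parallel.
print(resources_used(1e-6, 1_000_000, 1e-3, 10_000, 1e-3, 200, 1e-7, 5_000_000))
print(response_time(1e-6, 500_000, 1e-3, 5_000, 1e-3, 200, 1e-7, 5_000_000))
```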
23D020 49
The problem of parallelism
[Figure: “Theory” vs. “Practice” cartoon by Samuel Yee]
23D020 50
Bulk Synchronous Parallel Model
[Figure: ideal vs. real execution in the Bulk Synchronous Parallel model — in the real case, computing time is wasted waiting at the synchronization barrier (source: SAILING lab slides)]
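A minimal sketch of a single BSP superstep (local computation followed by a barrier); the uneven per-worker durations are invented to mimic the "real" case, where faster workers sit idle at the barrier:

```python
import threading
import time

durations = [0.1, 0.3, 0.2]          # hypothetical uneven local computation times
results = [None] * len(durations)

def local_compute(i, d):
    time.sleep(d)                     # local computation phase of the superstep
    results[i] = d

threads = [threading.Thread(target=local_compute, args=(i, d))
           for i, d in enumerate(durations)]
for t in threads:
    t.start()
for t in threads:
    t.join()                          # barrier: everyone waits for the slowest worker

# Wasted time per worker = slowest worker's time minus its own time.
print([max(durations) - d for d in durations])
```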
23D020 51
Kinds of parallelism
• Inter-query: different queries executed in parallel
• Intra-query
• Intra-operator (i.e., within one operator)
• Unary (e.g., selection)
• Static partitioning
• Binary (e.g., join)
• Static or dynamic partitioning
• Inter-operator
• Independent
• Pipelined
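A minimal sketch of intra-operator parallelism for a unary operator (a selection) over statically partitioned fragments, using Python's standard library; the fragment contents and the predicate are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

# Statically partitioned (horizontally fragmented) relation: one list per site.
fragments = [
    [{"id": 1, "price": 10}, {"id": 2, "price": 80}],
    [{"id": 3, "price": 55}, {"id": 4, "price": 5}],
    [{"id": 5, "price": 90}],
]

def select_fragment(fragment, predicate):
    # The same selection runs independently on each fragment (intra-operator).
    return [row for row in fragment if predicate(row)]

with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
    predicate = lambda row: row["price"] > 50
    partial_results = pool.map(select_fragment, fragments,
                               [predicate] * len(fragments))
    # Reconstruction of the global result is the union of the partial results.
    result = [row for part in partial_results for row in part]

print(result)   # rows with price > 50 from every fragment
```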
23D020 52
Closing
23D020 55
Summary
• Distributed Systems
• Distributed Database Systems
• Distributed Database Systems Architectures
• Distributed Transactions
• CAP Theorem
• Strong and Eventual Consistency
23D020 56
References
• D. DeWitt & J. Gray. Parallel Database Systems: The future of High Performance
Database Processing. Communications of the ACM, June 1992
• N. J. Gunther. A Simple Capacity Model of Massively Parallel Transaction
Systems. CMG National Conference, 1993
• L. Liu, M.T. Özsu (Eds.). Encyclopedia of Database Systems. Springer, 2009
• M. T. Özsu & P. Valduriez. Principles of Distributed Database Systems, 3rd Ed.
Springer, 2011
• G. Coulouris et al. Distributed Systems: Concepts and Design, 5th Ed. Addison-Wesley, 2012
• G. Graefe. Query Evaluation Techniques. In ACM Computing Surveys, 25(2),
1993
• L. G. Valiant. A bridging model for parallel computation. Commun. ACM.
August 1990
23D020 57